self driving technology

Technology review – How do self-driving cars see?

From Leonardo da Vinci’s sketch of a self-propelled cart in 1478 to today’s self-driving vehicles from companies such as General Motors, Audi, Volkswagen Group, Toyota, Volvo, Tesla, Uber, Microsoft, and Honda, autonomous vehicle technology has kept improving through extensive test drives, with the aim of delivering a driverless experience and, above all, safer roads. According to research, road traffic accidents are the 10th leading cause of death worldwide, with an estimated 1.2 million people killed in road crashes each year.

We no longer see this technology as a sci-fi vision from movies like Minority Report; the future is happening NOW.

Imagine concentrating on other tasks while your car autonomously drives itself from point to point.

For these types of vehicles to drive better than humans, they must be able to see well.

self driving car sensors

Source: Popular Science

TL;DR

Self-driving vehicles are edging towards reality. The technology relies on the Global Positioning System (GPS) as well as advanced sensing systems that can detect lane boundaries, signs, and signals, and avoid obstacles. Experts predict that self-driving cars will be on the road in large numbers by 2020.

Putting these vehicles on public roads could improve road safety and save thousands of lives. According to the US National Highway Traffic Safety Administration, 90% of all traffic fatalities are due to human error.

To drive better than humans, how do self-driving vehicles see and perceive their surroundings compared with what humans normally see?

Self-driving cars’ ‘superhuman’ vision

car sensors description

Source: ArcInsight Partners

From mapping and localization to obstacle avoidance and path planning, autonomous vehicles have much to offer. They use a deliberative architecture to make intelligent decisions and are equipped with a GPS unit, an inertial navigation system, and a range of sensors, including laser rangefinders, radar, and video, which let them build a 3D image of their environment. Together, these features enable smooth navigation along the road.

Waymo’s 360 degree demo ride

Source: Engadget

How do self-driving vehicles visualize?


Autonomous vehicles use input devices such as cameras, radar, and lasers, combined with machine learning and AI (computer vision), to perceive the world around them and build a digital map of it.

Object Detection

Autonomous vehicles split this process into two parts: first image classification, then image localization.

Through image classification, the vehicle determines which objects appear in the image (e.g., cars, traffic lights, and pedestrians). For this it uses a Convolutional Neural Network (CNN).

On its own, however, a CNN classifier works well only on images where a single object takes up a sizable portion of the frame.

To solve this, we can use a sliding window algorithm, which moves a fixed-size window across the image and classifies each crop.

sliding window algorithm

Source: Towards Data Science

But what if an object is much larger or smaller than the window? It would NOT be detected.

So we need multiple window sizes, each slid over the image, which quickly becomes computationally expensive.
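A minimal sketch of the multi-size sliding window idea is shown here. It only enumerates window positions; the classifier that would score each crop is left out. The function name `sliding_windows`, the `stride` parameter, and the toy image size are illustrative assumptions, not taken from any particular library.

```python
from typing import Iterator, Sequence, Tuple

# A window is (x, y, width, height) in pixels; this layout is an assumption.
Window = Tuple[int, int, int, int]


def sliding_windows(image_w: int, image_h: int,
                    sizes: Sequence[Tuple[int, int]],
                    stride: int) -> Iterator[Window]:
    """Yield every window position, for each window size in turn."""
    for win_w, win_h in sizes:
        # Stop early enough that the window never leaves the image.
        for y in range(0, image_h - win_h + 1, stride):
            for x in range(0, image_w - win_w + 1, stride):
                yield (x, y, win_w, win_h)


# Toy example: an 8x6 image scanned with a small and a full-size window.
windows = list(sliding_windows(8, 6, sizes=[(4, 4), (8, 6)], stride=2))
print(len(windows), "crops would be sent to the classifier")
```

Even on this tiny image two window sizes already produce several crops, each needing its own pass through the classifier, which is why the cost grows quickly on real camera frames.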

This is where another algorithm, called YOLO (You Only Look Once), comes in.

YOLO splits the entire image into a grid and runs the whole image through a convolutional neural network once, producing a class probability map. This map gives the probability that each cell contains a “specific object.”

YOLO algorithm

Source: Towards Data Science
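To make the grid and the class probability map more concrete, here is a small sketch. It does not run a neural network: the probabilities in `prob_map` are made-up numbers standing in for a network's output, and the helper names `cell_for_point` and `best_class_per_cell` are hypothetical. A real YOLO model also predicts bounding boxes and confidences per cell, which this sketch leaves out.

```python
from typing import Dict, List, Tuple


def cell_for_point(x: float, y: float,
                   image_w: int, image_h: int, grid: int) -> Tuple[int, int]:
    """Return the (row, col) of the grid cell that contains pixel (x, y)."""
    col = min(int(x * grid / image_w), grid - 1)
    row = min(int(y * grid / image_h), grid - 1)
    return row, col


def best_class_per_cell(prob_map: List[List[Dict[str, float]]]) -> List[List[str]]:
    """Pick the most probable class label for every grid cell."""
    return [[max(cell, key=cell.get) for cell in row] for row in prob_map]


# Made-up 2x2 class probability map (assumed values, not real model output).
prob_map = [
    [{"car": 0.7, "person": 0.2, "background": 0.1},
     {"car": 0.1, "person": 0.1, "background": 0.8}],
    [{"car": 0.2, "person": 0.6, "background": 0.2},
     {"car": 0.05, "person": 0.05, "background": 0.9}],
]

labels = best_class_per_cell(prob_map)
cell = cell_for_point(300, 100, image_w=400, image_h=400, grid=2)
print(labels, cell)
```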

Image localization then provides the specific location of these objects, represented by bounding boxes.

To settle on a final location for each object, it uses an algorithm called non-max suppression.

While training the network, the bounding boxes predicted by the CNN are compared with the actual (ground-truth) bounding boxes. The score used for this comparison is the area of intersection divided by the area of union of the two bounding boxes. The closer this number, called IoU (intersection over union), is to one (1), the better the prediction is.

IoU calculation

Source: Towards Data Science
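The IoU calculation can be sketched in a few lines. This version assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`; that box layout and the function name `iou` are choices made for this example.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap; zero if the boxes do not overlap.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (1.0, 1.0, 3.0, 3.0)
actual = (0.0, 0.0, 2.0, 2.0)
print(iou(predicted, actual))  # overlap area 1, union area 7
```

Two identical boxes give an IoU of 1, and boxes that do not touch give 0.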

Parts of the same object may also fall in multiple grid cells, resulting in multiple bounding boxes for one object. This calls for non-max suppression.

In non-max suppression, bounding boxes from grid cells whose probability of containing the object is below a certain threshold, usually 0.5 or 0.6, are discarded first. Next, the box with the highest prediction value is kept, and any remaining box whose IoU with it exceeds another threshold, usually 0.5 or 0.6, is discarded (suppressed). The process then repeats with the next-highest remaining box.
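The two steps of non-max suppression can be sketched as follows. This is a simplified, single-class version under stated assumptions: boxes are `(x_min, y_min, x_max, y_max)`, each detection is a `(box, score)` pair, and the names `non_max_suppression`, `score_threshold`, and `iou_threshold` as well as the sample detections are made up for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed: (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (box, probability score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - intersection)
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections: List[Detection],
                        score_threshold: float = 0.5,
                        iou_threshold: float = 0.5) -> List[Detection]:
    """Drop low-score boxes, then suppress boxes overlapping a better one."""
    # Step 1: discard boxes below the probability threshold,
    # and visit the rest from highest to lowest score.
    candidates = sorted((d for d in detections if d[1] >= score_threshold),
                        key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for box, score in candidates:
        # Step 2: keep a box only if it does not overlap too much
        # with any higher-scoring box that was already kept.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0.0, 0.0, 10.0, 10.0), 0.9),    # best box for the first object
    ((1.0, 1.0, 11.0, 11.0), 0.8),    # overlaps the first box heavily
    ((20.0, 20.0, 30.0, 30.0), 0.7),  # a separate object
    ((0.0, 0.0, 5.0, 5.0), 0.3),      # below the score threshold
]
kept = non_max_suppression(detections)
print(kept)
```

In this toy input the second box is suppressed because its IoU with the first is about 0.68, and the last box is dropped in the first step because of its low score.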


With autonomous driving, much work still remains. The technology has to perform object detection fast enough to detect quickly approaching objects and avoid them, so it needs very low latency as well as high accuracy. If these challenges can be addressed reliably, we will be one step closer to a much safer future.

By Tuan Nguyen
