
Our Research on Localization Accuracy of Vehicle Detection Models (Part 1)

Editor’s note: This is the first in a series of three posts outlining the findings of research our in-house computer vision team conducted on the accuracy of popular open-source object detection models for detecting vehicles, as measured by pixel-level accuracy.

Why we conducted this research

Autonomous driving technology requires accurate detection of traffic participants and objects in video images. In their most basic implementation, these detection systems take an image of a traffic scene as input and provide the locations and class labels of traffic participants in the form of bounding boxes and estimated class probabilities.

In recent years, computer vision research in object detection has been dominated by deep learning, which has led to the public release of numerous detection frameworks and network architectures. We decided to take a closer look at the accuracy and robustness of some of the more popular deep learning models when applied to a vehicle detection task.

What we set out to learn

In conducting this research, we wanted to learn about:

  1. The localization accuracy of some of the most popular off-the-shelf object detection systems.
  2. How image degradation, such as converting to grayscale and adding different types of image noise, affects the performance of the models.  

In this post, we explain the scope of our research, for which we generated a highly accurate ground truth dataset for vehicle detection that allowed us to compare the localization accuracy of five state-of-the-art deep learning models across a wide range of Intersection over Union (IoU) values. In part two of this series, we explore the findings of that comparison. In part three, we evaluate the models’ localization accuracy based on the pixel deviation between the detections and the ground truth boxes, and share what we discovered when we tested the models’ robustness against image noise and conversion to grayscale.

Getting started

The typical way to benchmark detection systems is on public datasets developed with automotive applications in mind. One of the earliest and most widely used datasets is KITTI, which contains 7,500 color images with a total of 80,000 annotated objects. More recent datasets include Cityscapes, Berkeley DeepDrive, and ApolloScape.

KITTI follows the Pascal VOC benchmarking protocol, in which a true positive (TP) detection must have the correct class label and an IoU with a corresponding ground truth box that exceeds a fixed threshold. Class-specific precision-recall (PR) curves and average precision (AP) values are then computed based on the detector’s real-valued class probabilities. Since the PR curves are computed for relatively low IoU thresholds (0.5 and 0.7), they contain little information about the detector’s ability to localize objects with high accuracy. And, as we all know, high accuracy is often critical in automotive applications.
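
For reference, IoU is the area of overlap between a detection box and a ground truth box divided by the area of their union. A minimal sketch of the IoU computation and the resulting TP test (our own illustration, not code from the benchmark; boxes are assumed to be in (x_min, y_min, x_max, y_max) pixel coordinates) looks like this:

    def iou(box_a, box_b):
        """Intersection over Union of two axis-aligned boxes."""
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        # Intersection rectangle, clamped at zero if the boxes are disjoint
        inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = inter_w * inter_h
        area_a = (ax2 - ax1) * (ay2 - ay1)
        area_b = (bx2 - bx1) * (by2 - by1)
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def is_true_positive(detection, ground_truth, iou_threshold=0.7):
        # Pascal VOC-style test: correct class label and IoU exceeding a
        # fixed threshold (0.5 or 0.7 in the KITTI protocol mentioned above)
        return (detection["label"] == ground_truth["label"]
                and iou(detection["box"], ground_truth["box"]) > iou_threshold)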

For our test set, we selected 219 color images of size 1920×1280 from a set of dash cam recordings of urban and highway scenarios in diverse weather and lighting conditions. The images contained a total of 847 vehicles, which we categorized into 730 passenger cars, 107 trucks, and 10 buses. Sample images from the test set follow.

An in-house team at Mighty AI generated the ground truth annotations. At least two experienced team members reviewed every annotation to ensure that each bounding box was within a single pixel of the object’s true boundary. We did not include images in the test set that contained vehicles smaller than 20 pixels in width and height, or images containing vehicles with ambiguous boundaries due to motion blur or defocus.
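
As an aside, the size criterion amounts to a simple per-annotation check. The sketch below is our own illustration of that rule, interpreting it as requiring at least 20 pixels in both width and height; the box format and names are hypothetical:

    MIN_SIZE_PX = 20  # minimum vehicle width and height, in pixels

    def meets_size_criterion(box):
        # box given as (x_min, y_min, x_max, y_max) in pixel coordinates
        x_min, y_min, x_max, y_max = box
        return (x_max - x_min) >= MIN_SIZE_PX and (y_max - y_min) >= MIN_SIZE_PX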

Choosing the object detection models to evaluate

Mighty AI evaluated the following five off-the-shelf models for object detection:

  1. Faster R-CNN NASNet (TensorFlow Model Zoo: faster_rcnn_nas_coco, 2018_01_28)
  2. Faster R-CNN ResNeXt 101, FPN (FAIR Detectron: X-101-64x4d-FPN, 2x)
  3. Faster R-CNN ResNet 101, FPN (FAIR Detectron: R-101-FPN, 1x)
  4. Mask R-CNN Inception ResNet V2 (TensorFlow Model Zoo: mask_rcnn_inception_resnet_v2_atrous_coco, 2018_01_28)
  5. SSD, Mobilenet V1 (TensorFlow Model Zoo: ssd_mobilenet_v1_coco, 2017_11_17)

The first three systems are Faster R-CNN detectors with different backbones: the NASNet architecture, and ResNeXt and ResNet backbones combined with a feature pyramid network (FPN). The fourth system is Mask R-CNN with an Inception-ResNet V2 backbone, of which we only used the regressed bounding box coordinates. The fifth model, SSD MobileNet V1, is designed to run in mobile applications.

The detectors were trained on the 80 object classes of the MS COCO dataset. The trained models can be found in TensorFlow’s Detection Model Zoo and in FAIR’s Detectron Model Zoo.
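
For readers who want to reproduce a single detection run, a rough sketch of inference with one of the TensorFlow Model Zoo models follows. This is not the evaluation pipeline used in this research; it assumes TensorFlow 1.x, the standard tensor names of the Object Detection API’s exported frozen graphs, and placeholder file paths.

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x
    from PIL import Image

    # Path to the exported frozen graph of ssd_mobilenet_v1_coco (placeholder)
    GRAPH_PATH = "ssd_mobilenet_v1_coco_2017_11_17/frozen_inference_graph.pb"

    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(GRAPH_PATH, "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")

    # Load one dash cam frame as an RGB uint8 array of shape (H, W, 3)
    image = np.array(Image.open("frame.jpg").convert("RGB"))

    with tf.Session(graph=graph) as sess:
        boxes, scores, classes = sess.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": image[np.newaxis, ...]})

    # Boxes come back as normalized [y_min, x_min, y_max, x_max]; in the
    # COCO label map, class ids 3, 6, and 8 are car, bus, and truck.

A score threshold (for example, 0.5) is typically applied to the returned detections before any IoU matching against ground truth boxes.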

Table: Input resolution and inference times of the evaluated detection systems

Next up: Check out part two of this series to see what we discovered when we compared the localization accuracy of these five state-of-the-art deep learning models across a wide range of IoU values.