
Our Findings on Localization Accuracy of Vehicle Detection Models (Part 2)

Editor’s note: This is the second in a series of three posts outlining the findings of research our in-house computer vision team conducted on the accuracy of popular open-source object detection models for detecting vehicles, as measured by pixel-level accuracy. Before diving in, be sure to check out part 1 to understand the scope of our experiment.

Model outputs

Let’s dive into what we learned. We tested the five detectors on our test dataset, limiting the class list to car, truck, and bus. We also compared the model outputs to crowd-sourced annotations that Mighty AI’s platform generated using a standard workflow for box annotations.

Following are examples of the model outputs (red) and the ground truth (green) for Faster R-CNN NAS, Mask R-CNN ResNet V2, and SSD Mobilenet.

Faster R-CNN NAS

Mask R-CNN ResNet V2

SSD Mobilenet V1

Measuring localization accuracy

We measured the detectors’ localization accuracy using two metrics: the Intersection over Union (IoU), also known as the Jaccard index, and the pixel deviation, or pixel error. The pixel deviation is defined as the maximum deviation in the x- and y-directions between the detected box and the ground truth box.

Figure: Pixel deviation between ground truth box (pink) and detected box (gray) is defined as the maximum of delta_x and delta_y.
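For concreteness, here is a minimal sketch of how both measures can be computed for axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates. Reading delta_x and delta_y as the largest per-edge offsets along each axis is our interpretation of the figure, not a definition taken from any benchmark:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def pixel_deviation(box_a, box_b):
    """Maximum deviation in the x- and y-directions between two boxes."""
    # delta_x / delta_y: largest offset between corresponding edges per axis.
    delta_x = max(abs(box_a[0] - box_b[0]), abs(box_a[2] - box_b[2]))
    delta_y = max(abs(box_a[1] - box_b[1]), abs(box_a[3] - box_b[3]))
    return max(delta_x, delta_y)
```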

For a given detection to be counted as a true positive (TP), its IoU with an identically labeled ground truth box has to exceed a given threshold. When using pixel deviation instead of IoU, a TP has to have a deviation from the ground truth box that falls below a given threshold. Any detection that does not fulfill these requirements is counted as a false positive (FP).
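The post does not spell out the matching procedure, so the sketch below assumes the common greedy scheme: detections are processed in order of descending confidence, and each ground truth box can be claimed at most once. It reuses iou() from the snippet above; to match on pixel deviation instead, swap in pixel_deviation() with a below-threshold test:

```python
def count_tp_fp(detections, ground_truth, iou_threshold=0.5):
    """Count true and false positives for one image.

    detections:   list of (box, label, score) tuples.
    ground_truth: list of (box, label) tuples.
    """
    matched = set()  # indices of ground truth boxes already claimed
    tp = fp = 0
    for det_box, det_label, _ in sorted(detections, key=lambda d: -d[2]):
        best_iou, best_idx = 0.0, None
        for i, (gt_box, gt_label) in enumerate(ground_truth):
            if i in matched or gt_label != det_label:
                continue  # labels must agree, and each GT box matches once
            overlap = iou(det_box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, i
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            tp += 1
        else:
            fp += 1
    return tp, fp
```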

Precision-recall curves

In our first experiment, we computed the micro-averaged precision-recall (PR) curves for the five models at IoU thresholds of 0.5 and 0.7, the same values used in the KITTI benchmark protocol.

For comparison, we provide the precision and recall for Mighty AI’s global community of human annotators for an IoU threshold of 0.5. Since the human annotations do not include a real-valued estimate of the class probability, they will generate only a single point on the PR chart.
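As a rough illustration of how such a micro-averaged curve can be traced, the sketch below pools every detection across the test set, sweeps the confidence threshold implicitly by sorting, and computes precision and recall at each cut. The function name and argument layout are ours for illustration, not the evaluation code used in the experiment:

```python
import numpy as np

def micro_averaged_pr(scores, is_tp, num_ground_truth):
    """Precision/recall pairs over all detections pooled across the test set.

    scores: confidence of every detection.
    is_tp:  1 if the detection was matched as a TP, 0 if it is an FP.
    """
    order = np.argsort(-np.asarray(scores))  # descending confidence
    tp = np.cumsum(np.asarray(is_tp)[order])
    fp = np.cumsum(1 - np.asarray(is_tp)[order])
    precision = tp / (tp + fp)
    recall = tp / num_ground_truth
    return precision, recall
```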

Figure: PR curves for IoU thresholds of 0.5 (solid) and 0.7 (dashed)

As you can see, Mask R-CNN performs slightly better than Faster R-CNN NAS, followed by Faster R-CNN ResNet and ResNeXt. SSD falls short of all the other models.

For an IoU threshold of 0.5 at 90% precision, Mask R-CNN and Faster R-CNN NAS reached 75% recall, Faster R-CNN ResNeXt reached 65% recall, and ResNet reached 70% recall. For reference, Mighty AI’s global community of human annotators achieved 95% precision at 92% recall.

When we increased the IoU threshold to 0.7, we noticed a significant decrease in the performance across all systems. At 90% precision, Mask R-CNN and Faster R-CNN NAS dropped to 63% and 60% recall, respectively. Faster R-CNN ResNeXt and ResNet dropped to 45% and 50% recall, respectively.

We then used pixel deviation instead of the IoU to compute the micro-averaged PR curves.

Figure: PR curves for max pixel deviation thresholds of 25 (solid) and 10 (dashed)

The ranking of the models was similar to the ranking based on IoU thresholds, but the gap between the top four models narrowed. At a threshold of 25 pixels and 90% precision, Mask R-CNN and Faster R-CNN NAS reached 70% recall. For a threshold of 10 pixels, there was a significant 10-15% drop in the precision of all models across large parts of the curves.

Localization accuracy

To get a better understanding of the detectors’ localization accuracy, we computed the cumulative histogram of the IoU values of the TPs at an IoU threshold of 0.5 and a minimum class probability of 0.5.
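In case it is useful, here is a small sketch of how such a cumulative histogram can be derived from the IoU values of the matched TPs; the helper name and thresholds are illustrative only:

```python
import numpy as np

def cumulative_fraction_above(values, thresholds):
    """Fraction of TPs whose IoU exceeds each threshold.

    For the pixel deviation measure, count values *below*
    each threshold instead.
    """
    values = np.asarray(values)
    return [(values > t).mean() for t in thresholds]

# e.g. fraction of TPs with IoU above 0.7, 0.8, and 0.9:
# cumulative_fraction_above(tp_ious, thresholds=[0.7, 0.8, 0.9])
```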

Mask R-CNN and Faster R-CNN NAS performed the best. For Mask R-CNN, 50% of the TPs had an IoU > 0.9, 80% had an IoU > 0.8, and 90% had an IoU > 0.7. The human annotations from Mighty AI’s global community of annotators resulted in 86% of TPs with an IoU > 0.9, 98% with an IoU > 0.8, and 99% with an IoU > 0.7.

Figure: Cumulative histogram of number of TPs over IoU

The following graph shows the cumulative distribution of the pixel deviation for the TPs. Faster R-CNN NAS performed best, with 25% of TPs having a deviation of < 3 pixels, 50% a deviation of < 5 pixels, and 90% a deviation of < 13 pixels. The human annotations had 24% of TPs within 1 pixel of deviation, 73% within 3 pixels, 86% within 5 pixels, and 94% within 10 pixels.

Figure: Cumulative histogram of number of TPs over pixel deviation

What we learned

Among the five models tested, Mask R-CNN ResNet V2 and Faster R-CNN NAS performed best across all experiments, but neither achieved the quality levels of Mighty AI’s global community of human annotators.

We did an in-depth evaluation of the models’ localization accuracy using cumulative histograms computed across two accuracy measures: the IoU and the pixel deviation. On the IoU measure, Mask R-CNN led with 50% of its TPs having an IoU above 0.9; on the pixel deviation measure, Faster R-CNN NAS performed best, hitting the 50% mark at 5 pixels.

Next up: Check out part three of this series to see what we found when we evaluated the models’ localization accuracy based on the pixel deviation between the detections and the ground truth boxes, and what we discovered when we tested the models’ robustness against image noise and conversion to grayscale.