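[Object Detection] Evaluation Metrics (Performance Measures) for Object Detection Algorithms: AP and mAP in Detail

阿新 • Published: 2020-10-10

Reference paper: "A Survey on Performance Metrics for Object-Detection Algorithms"

Corresponding GitHub repository: https://github.com/rafaelpadilla/Object-Detection-Metrics

How do we evaluate the performance of an object detection algorithm?

Evaluating an object detection algorithm differs from evaluating a classification algorithm: in object detection we need both to identify the correct object class and to locate the object accurately.

The most commonly used metrics for evaluating object detection algorithms are AP (average precision, for a single class) and mAP (mean AP, across multiple classes).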

AP

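1. Key basic concepts (prerequisites)

IOU (Intersection Over Union)

IOU measures how close two bounding boxes are, taking both size and position into account. It equals the area of their intersection divided by the area of their union, and ranges over [0, 1].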

The larger the IOU, the closer the two bounding boxes are: 1 means they coincide exactly and 0 means they do not overlap.
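
To make the definition concrete, here is a minimal sketch of an IOU computation. It assumes axis-aligned boxes given as `(x1, y1, x2, y2)` in continuous coordinates; the function name `iou` and the box format are illustrative choices, not taken from the referenced repository, and some toolkits add one pixel to widths and heights instead.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not intersect).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of both areas minus the part counted twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap: 1 / 7
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

The three calls cover the two extremes described above (complete overlap and no overlap) plus one case in between.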


True Positive, False Positive, False Negative and True Negative

True positive (TP): A correct detection of a ground-truth bounding box, i.e. a detection with IOU ≥ threshold.
False positive (FP): A wrong detection: either an incorrect detection of a nonexistent object or a misplaced detection of an existing object, i.e. a detection with IOU < threshold.
False negative (FN): A ground-truth bounding box that was not detected.

In object detection we do not use true negatives (TN), because there are infinitely many bounding boxes that should not be detected within any given image.

The threshold depends on the metric used for the specific task; it is usually set to 50%, 75%, or 95%.

Precision, Recall

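Because TN is not used in object detection, we cannot use any metric that depends on TN, such as FPR or the ROC curve (which plots TPR against FPR). Instead, the evaluation of object detection algorithms is mainly based on Precision and Recall.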

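$$\text{Precision} = \frac{TP}{TP + FP} = \frac{TP}{\text{all detections}}$$

$$\text{Recall} = \frac{TP}{TP + FN} = \frac{TP}{\text{all ground truths}}$$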

Precision: the fraction of positive predictions (all detections) that are correct positive predictions (correct detections). It reflects the model's ability to identify only relevant objects.
Recall: the fraction of all given ground truths that are matched by correct positive predictions (correct detections). It reflects the model's ability to find all relevant cases (all ground-truth bounding boxes).

A perfect, ideal object detector should find all ground-truth objects (FN = 0, i.e. high recall) while identifying only relevant objects (FP = 0, i.e. high precision).
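
As a small illustration of the two definitions, the sketch below computes Precision and Recall from TP, FP, and FN counts. The helper name `precision_recall` and the counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than something the source specifies.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    # Precision: correct detections over all detections.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    # Recall: correct detections over all ground truths.
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Hypothetical counts: 10 detections (8 correct), 12 ground truths (4 missed).
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.3f}, recall={r:.3f}")  # precision=0.800, recall=0.667
```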

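2. P-R curve (Precision × Recall curve)

Precision and Recall pull against each other. Specifically:

When the detector's confidence threshold is raised, the number of detections, both correct (TP) and incorrect (FP), goes down. Precision tends to rise, though not monotonically, while the number of undetected ground truths (FN) goes up and Recall falls.
Conversely, lowering the confidence threshold yields more detections, so Recall rises while Precision tends to fall.

We can use the Precision x Recall curve to assess the trade-off between Precision and Recall at different confidence thresholds.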

An object detector of a particular class is considered good if its precision stays high as recall increases, which means that if you vary the confidence threshold, the precision and recall will still be high.
A poor object detector needs to increase the number of detected objects (increasing False Positives = lower precision) in order to retrieve all ground truth objects (high recall). That’s why the Precision x Recall curve usually starts with high precision values, decreasing as recall increases.

3. AP (Average Precision)

A high AP (the area under the curve (AUC) of the Precision x Recall curve) indicates that the detector has both high precision and high recall.

However, the P-R curve is often a zigzag that moves up and down, which makes it hard to measure its AUC accurately.

Before estimating the AUC, we therefore process the P-R curve to remove the zigzag behavior.

There are two common approaches: 11-point interpolation and all-point interpolation.

11-point interpolation

11-point interpolation approximates the P-R curve by averaging the maximum precision values at a set of 11 equally spaced recall levels [0, 0.1, 0.2, … , 1]. As a formula:

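$$AP_{11} = \frac{1}{11} \sum_{R \in \{0, 0.1, \ldots, 0.9, 1\}} P_{interp}(R)$$

where

$$P_{interp}(R) = \max_{\tilde{R} : \tilde{R} \geq R} P(\tilde{R})$$

Instead of using the precision observed at each point, the AP is obtained by interpolating the precision only at the 11 levels $R$, taking the maximum measured precision whose recall value is greater than or equal to $R$.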
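
The 11-point rule can be sketched in a few lines. This is an illustrative version, not the referenced repository's code: the function name `ap_11_point` and the toy curve are assumptions, and a recall level the detector never reaches is taken to contribute a precision of 0.

```python
def ap_11_point(recalls, precisions):
    """11-point interpolated AP from paired recall/precision values."""
    levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    total = 0.0
    for level in levels:
        # Interpolated precision: best precision at any recall >= this level,
        # or 0 if the detector never reaches this recall.
        candidates = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(candidates, default=0.0)
    return total / 11


# Hypothetical toy curve: 5 ground truths, detections ranked TP, TP, FP, TP.
recalls = [1 / 5, 2 / 5, 2 / 5, 3 / 5]
precisions = [1 / 1, 2 / 2, 2 / 3, 3 / 4]
print(round(ap_11_point(recalls, precisions), 4))  # 0.5909
```

In the toy curve, the levels 0 to 0.4 each see a best precision of 1.0, the levels 0.5 and 0.6 see 0.75, and the remaining four levels contribute 0, so the average is 6.5 / 11.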

All-point interpolation

All-point interpolation interpolates through all points, as follows.

$$AP_{all} = \sum_{n} (R_{n+1} - R_n) \, P_{interp}(R_{n+1})$$

where

$$P_{interp}(R_{n+1}) = \max_{\tilde{R} : \tilde{R} \geq R_{n+1}} P(\tilde{R})$$

In this case, instead of using the precision observed at only a few points, the AP is obtained by interpolating the precision at each recall level, taking the maximum measured precision whose recall value is greater than or equal to $R_{n+1}$.
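
A common way to implement the all-point rule is to sweep the curve from right to left so that precision never increases as recall decreases, then sum rectangle areas. The sketch below follows that idea; the function name `ap_all_points`, the sentinel points at recall 0 and 1, and the toy curve are assumptions made for illustration.

```python
def ap_all_points(recalls, precisions):
    """All-point interpolated AP. `recalls` must be sorted in ascending order."""
    # Sentinel points so the curve spans recall 0 to 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Sweep right to left: p[i] becomes the maximum precision among
    # the points at index i or to its right.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangles: recall step width times the interpolated precision
    # at the right end of the step.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Hypothetical toy curve: 5 ground truths, detections ranked TP, TP, FP, TP.
recalls = [1 / 5, 2 / 5, 2 / 5, 3 / 5]
precisions = [1 / 1, 2 / 2, 2 / 3, 3 / 4]
print(round(ap_all_points(recalls, precisions), 4))  # 0.55
```

Here the area is 0.2 × 1.0 + 0.2 × 1.0 + 0.2 × 0.75, and the stretch from recall 0.6 to 1.0 adds nothing because the detector never reaches it.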

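Worked example

An example helps make this concrete.

As shown in the figure below, there are 15 ground-truth boxes (green) across 7 images, and the model produces 24 detections (red, labeled with the letters A, B, …, Y), each with a confidence score.

[Figure: the 7 example images with ground-truth boxes in green and detections in red]

In this example we set the IOU threshold to 30%: a detection whose IOU with some ground truth is at least 30% is judged correct (TP); otherwise it is judged wrong (FP).

In addition, the detector may predict several duplicate detections for a single ground truth (such as D and E in image 2, or G, H, and I in image 3). In that case, the detection with the highest confidence is judged a TP and the rest are judged FP.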
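
The labeling rule just described can be sketched as a greedy matching over the detections of one image and one class. This is an assumed, simplified procedure for illustration: the names `label_detections` and `iou`, the `(x1, y1, x2, y2)` box format, and the example boxes are hypothetical, and evaluation toolkits differ in their details.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_detections(detections, ground_truths, iou_threshold=0.3):
    """Label each (confidence, box) detection as 'TP' or 'FP', in input order."""
    matched = set()  # indices of ground truths already claimed by a detection
    labels = {}
    # Visit detections from most to least confident.
    order = sorted(range(len(detections)), key=lambda i: detections[i][0], reverse=True)
    for i in order:
        _, box = detections[i]
        # Find the ground truth that overlaps this detection the most.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truths):
            value = iou(box, gt)
            if value > best_iou:
                best_j, best_iou = j, value
        # TP only if the overlap is large enough and the ground truth is still free.
        if best_j >= 0 and best_iou >= iou_threshold and best_j not in matched:
            matched.add(best_j)
            labels[i] = "TP"
        else:
            labels[i] = "FP"
    return [labels[i] for i in range(len(detections))]


ground_truths = [(0, 0, 10, 10)]
detections = [
    (0.6, (0, 0, 9, 9)),      # duplicate of the same object, lower confidence
    (0.9, (1, 1, 10, 10)),    # most confident detection of the object
    (0.8, (20, 20, 30, 30)),  # no ground truth nearby
]
print(label_detections(detections, ground_truths))  # ['FP', 'TP', 'FP']
```

Both overlapping detections have an IOU of 0.81 with the ground truth, but only the more confident one is counted as a TP, which is the duplicate-handling behavior described above.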

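The judgment for each detection is shown in the table below.

[Table: TP/FP label of each detection]

To plot the Precision x Recall curve, we first sort the detections by confidence in descending order, then compute Precision and Recall from the accumulated TP and FP detections, as shown in the table below.

[Table: detections sorted by confidence, with Acc TP, Acc FP, Precision, and Recall]

Here the Acc TP and Acc FP columns accumulate all TP and FP detections whose confidence is at or above that of the current row.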
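
The accumulation step can be sketched as follows. The function name `precision_recall_points` and the four toy detections are hypothetical and are not the values from the table in the original post.

```python
from itertools import accumulate


def precision_recall_points(detections, num_ground_truths):
    """detections: list of (confidence, is_tp). Returns (recalls, precisions)."""
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    # Running totals of TP and FP down the ranked list (Acc TP and Acc FP).
    acc_tp = list(accumulate(1 if is_tp else 0 for _, is_tp in ranked))
    acc_fp = list(accumulate(0 if is_tp else 1 for _, is_tp in ranked))
    precisions = [tp / (tp + fp) for tp, fp in zip(acc_tp, acc_fp)]
    recalls = [tp / num_ground_truths for tp in acc_tp]
    return recalls, precisions


# Hypothetical detections (unsorted on purpose) for a class with 4 ground truths.
detections = [(0.7, True), (0.9, True), (0.3, False), (0.8, False)]
recalls, precisions = precision_recall_points(detections, num_ground_truths=4)
print(recalls)     # [0.25, 0.25, 0.5, 0.5]
print(precisions)  # [1.0, 0.5, 0.6666666666666666, 0.5]
```

Each row of the result corresponds to one confidence cut-off: the n-th pair is the Precision and Recall obtained when only the n most confident detections are kept.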

[Work in progress]

Plotting the precision and recall values we have the following Precision x Recall curve:

[Figure: Precision x Recall curve of the example]

As mentioned before, there are two different ways to measure the interpolated average precision: 11-point interpolation and interpolating all points.

Computing AP with 11-point interpolation

The idea of the 11-point interpolated average precision is to average the precisions at a set of 11 recall levels (0, 0.1, …, 1). The interpolated precision at each level is the maximum precision whose recall value is greater than or equal to that level, as follows:

[Figure: Precision x Recall curve with the 11 interpolated precision points]

Averaging these 11 values gives AP = 26.84%:

[Equation image: 11-point AP calculation]

Computing AP with all-point interpolation

By interpolating all points, the Average Precision (AP) can be interpreted as an approximated AUC of the Precision x Recall curve. The intention is to reduce the impact of the wiggles in the curve. Applying the equations presented before, we obtain the areas as demonstrated here. We can also read off the interpolated precision points visually: starting from the highest recall (0.4666) and moving toward 0 (looking at the plot from right to left), we collect, as recall decreases, the highest precision values seen so far, as shown in the image below:

[Figure: Precision x Recall curve with the all-point interpolated precision]

Calculating the total area, we have the AP:

[Equation image: sum of the areas under the interpolated curve]

The two interpolation methods give slightly different results: 24.56% for all-point interpolation and 26.84% for 11-point interpolation.
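
The two figures can be checked with a short script. One loud caveat: the per-detection table is an image that did not survive on this page, so the TP/FP sequence below is an assumption, recalled from the example in the referenced repository and chosen to agree with the numbers quoted in the text (15 ground truths, 24 detections, maximum recall 0.4666). Treat it as a sketch, not as the author's data.

```python
# ASSUMED TP (1) / FP (0) flags of the 24 detections, sorted by descending
# confidence. Not taken from this page; see the caveat above.
flags = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
NUM_GT = 15  # number of ground-truth boxes stated in the text

# Accumulate TP down the ranked list to get one (recall, precision) pair per row.
tp = 0
recalls, precisions = [], []
for rank, flag in enumerate(flags, start=1):
    tp += flag
    precisions.append(tp / rank)
    recalls.append(tp / NUM_GT)


def interp(level):
    """Maximum precision among points whose recall is >= level (0 if none)."""
    return max((p for r, p in zip(recalls, precisions) if r >= level), default=0.0)


# 11-point interpolation: average over recall levels 0, 0.1, ..., 1.
ap_11 = sum(interp(i / 10) for i in range(11)) / 11

# All-point interpolation: rectangles between consecutive distinct recall values.
ap_all, prev_r = 0.0, 0.0
for r in sorted(set(recalls)):
    ap_all += (r - prev_r) * interp(r)
    prev_r = r

print(f"max recall = {max(recalls):.4f}")
print(f"11-point AP  = {ap_11:.4f}")
print(f"all-point AP = {ap_all:.4f}")
```

Under this assumed sequence the 11-point AP comes out at about 0.2684 and the all-point AP at about 0.2457. The latter differs from the quoted 24.56% only in the last digit, presumably because the quoted calculation works with intermediate values cut to four decimals.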

mAP (mean AP)

mAP measures a detector's accuracy across all classes.

In fact, mAP is simply the average of the AP over all classes.

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

where $AP_i$ is the AP of the $i$-th class and $N$ is the number of classes.
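
In code this is just an unweighted mean over the per-class APs. The function name `mean_average_precision`, the class names, and the AP values below are hypothetical.

```python
def mean_average_precision(ap_per_class):
    """ap_per_class: dict mapping class name -> AP. Returns the unweighted mean."""
    if not ap_per_class:
        raise ValueError("need at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical per-class AP values.
aps = {"cat": 0.5, "dog": 0.75, "bird": 0.25}
print(mean_average_precision(aps))  # 0.5
```

Every class carries the same weight in this average, regardless of how many ground-truth boxes it has.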

[Further reading]:

A detailed explanation of mAP in object detection
