Model Performance Metrics

阿新 • • Published: 2020-12-29

Author: elfin  Source: mocro wen

1. Introduction: the confusion matrix
2. Precision
3. Recall
4. AP (Average Precision)
- 4.1 AP in a classification setting
- 4.2 AP in Mask-RCNN
5. mAP

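1. Introduction: the confusion matrix

A confusion matrix describes how predictions and actual labels are confused with one another; a range of metrics is then derived from it to measure how good the results are.

1.1 Binary confusion matrix

Table 1: Binary confusion matrix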

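Points to note:

The horizontal axis of the confusion matrix is the predicted value and the vertical axis is the true value;
Compared with a Cartesian coordinate system, the vertical axis runs from 0 to 1 while the horizontal axis runs from 1 to 0;
In a confusion matrix we want the elements on the main diagonal to be as large as possible and the off-diagonal elements to be as close to 0 as possible, i.e. we want the confusion matrix to be a diagonal matrix;
The elements TP, TN, FP and FN are read from left to right: the first letter says whether the prediction is correct (True/False), and the second gives the predicted class (Positive/Negative);
See the list below for the exact meanings.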

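Meaning of the confusion-matrix symbols:

TP (True Positive): a positive example predicted as positive, i.e. the truth is 1 and the prediction is 1;
FN (False Negative): a positive example predicted as negative, i.e. the truth is 1 and the prediction is 0;
FP (False Positive): a negative example predicted as positive, i.e. the truth is 0 and the prediction is 1;
TN (True Negative): a negative example predicted as negative, i.e. the truth is 0 and the prediction is 0.

Positive: literally "affirmative"; here it denotes the positive class.

Negative: literally "pessimistic"; here it denotes the negative class.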
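
The four definitions above can be checked by counting them directly. The following is a minimal sketch; the function name `confusion_counts` and the toy labels are illustrative assumptions, not part of the original post.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FN, FP, TN) for binary labels; `positive` marks the positive class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1  # truth 1, prediction 1
        elif t == positive:
            fn += 1  # truth 1, prediction 0
        elif p == positive:
            fp += 1  # truth 0, prediction 1
        else:
            tn += 1  # truth 0, prediction 0
    return tp, fn, fp, tn


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```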

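1.2 Multi-class confusion matrix

The multi-class confusion matrix:

Table 2: Multi-class confusion matrix

The conventions are the same as in the binary case. For any pair of classes, or for class x versus all non-x classes, a binary confusion matrix can be drawn in the same way.

Binary confusion matrix for class x versus non-x:

Table 3: Binary confusion matrix for one class of a multi-class label set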
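
As a rough illustration of the "class x versus non-x" reduction, the sketch below collapses a multi-class confusion matrix into a 2x2 one. It assumes rows are true labels and columns are predicted labels (the layout sklearn's `confusion_matrix` returns); the helper name `one_vs_rest` and the 3-class matrix are made up for the example.

```python
import numpy as np


def one_vs_rest(cm, k):
    """Collapse a multi-class confusion matrix into [[TP, FN], [FP, TN]] for class k.

    Assumes cm[i, j] counts samples whose true class is i and predicted class is j.
    """
    cm = np.asarray(cm)
    tp = cm[k, k]
    fn = cm[k, :].sum() - tp   # true class k, predicted as something else
    fp = cm[:, k].sum() - tp   # other true class, predicted as k
    tn = cm.sum() - tp - fn - fp
    return np.array([[tp, fn], [fp, tn]])


# Hypothetical 3-class confusion matrix, for illustration only.
cm = np.array([[5, 1, 0],
               [2, 6, 1],
               [0, 2, 7]])
print(one_vs_rest(cm, 1))
# [[ 6  3]
#  [ 3 12]]
```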

1.3 Plotting a confusion matrix with Python

Generating a confusion matrix with Python

import random
from sklearn.metrics import confusion_matrix

labels = ["dog", "cat"]
y_true = [labels[round(random.random())] for _ in range(10)]
y_pred = [labels[round(random.random())] for _ in range(10)]
confusion_matrix1 = confusion_matrix(y_true=y_true,
                                     y_pred=y_pred,
                                     labels=labels)
print(f"y_true: {y_true}")
print(f"y_pred: {y_pred}")
print(confusion_matrix1)

Output:

y_true: ['cat', 'cat', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'dog']
y_pred: ['dog', 'cat', 'dog', 'cat', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat']
[[1 3]
 [3 3]]

Plotting the confusion matrix

Here a custom class wraps confusion_matrix and a seaborn heatmap. The result looks like this:

Figure 1: Confusion matrix

Project link

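2. Precision

Precision (also called positive predictive value) is the fraction of predicted positives that are actually positive.

\(P=\frac{TP}{TP+FP}\)

3. Recall

Recall (also called sensitivity) is the fraction of actual positives that are correctly retrieved.

\(R=\frac{TP}{TP+FN}\)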
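
To make the two formulas concrete, here is a small sketch that applies them to the dog/cat output of section 1.3, with "dog" treated as the positive class. The helper `precision_recall` is a hypothetical name, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall); 0.0 is returned when a denominator is zero (assumed convention)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Confusion matrix from section 1.3 with labels ["dog", "cat"]: [[1, 3], [3, 3]].
# With "dog" as the positive class: TP = 1, FN = 3, FP = 3.
tp, fn, fp = 1, 3, 3
print(precision_recall(tp, fp, fn))  # (0.25, 0.25)
```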

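4. AP (Average Precision)

Some reference material conflates AP and mAP. To keep them apart, we describe each from its definition, independent of the dataset (COCO, for example) on which the evaluation is run.

Start with average precision. Read literally, we obtain many precision values and then average them. There are two cases:

Precision is a continuous function of some quantity (loosely speaking, the precision values are not discrete); AP is then the integral of precision;
Precision is a discrete function of some quantity, i.e. the precision values are discrete; AP is then an "expectation" (note that this is not actually the expectation of a probability distribution).

To make this concrete, we use bounding-box (BBox) regression as the example!

Object detection usually requires regressing the bounding box of each object. Two quantities are needed here: recall, described above, and IoU, a standard metric in object detection. The IoU of two boxes is the area of their intersection divided by the area of their union. In an object-recognition setting we can distinguish two questions:

Whether the BBox correctly localizes the instance;
Whether the confidence is high enough, i.e. whether the score exceeds the threshold.

This section only covers how to compute AP, first for binary classification and then for Mask-RCNN.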
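
The IoU of two boxes mentioned above can be sketched in a few lines. This assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; that format and the name `box_iou` are assumptions made for the example.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2) (assumed format)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height are clamped at 0 so disjoint boxes give zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # intersection 1, union 7
```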

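4.1 AP in a classification setting

For further reading, see the resource "AP和mAP的詳解" (A Detailed Explanation of AP and mAP).

4.1.1 Compute the prediction scores and show them with their true labels

4.1.2 Sort by score in descending order

4.1.3 Precision list and Recall list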

Precisions = [1, 1 , 0.66666667, 0.5  , 0.6, 0.66666667 , 0.71428571,
              0.75 , 0.77777778 , 0.8 , 0.72727273, 0.66666666 , 
              0.61538461 , 0.57142857, 0.53333333]
Recalls = [0.06666667, 0.13333334, 0.13333334, 0.13333334, 0.2,
           0.26666668, 0.33333334, 0.4       , 0.46666667, 0.53333336,
           0.53333336, 0.53333336, 0.53333336, 0.53333336, 0.53333336]
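
The two lists can be reproduced with cumulative sums. In the sketch below, the per-rank hit flags and the total of 15 positives are not given in the text; they are inferred from the Precisions and Recalls values above (for example, Recalls[0] = 1/15), so treat them as an assumption.

```python
import numpy as np

# 1 = the prediction at this rank (sorted by score, descending) is a true positive.
# Inferred from the Precisions list above, not taken from the original score table.
hits = np.array([1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
num_positives = 15  # assumed: implied by Recalls[0] = 1/15

cum_tp = np.cumsum(hits)                           # true positives among the top-k predictions
precisions = cum_tp / (np.arange(len(hits)) + 1)   # divide by k
recalls = cum_tp / num_positives                   # divide by the number of positives

print(np.round(precisions, 4))
print(np.round(recalls, 4))
```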

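4.1.4 Compute AP

step1: find the indices at which recall steps up

indices=[1, 4, 5, 6, 7, 8, 9]

Note that these indices are taken on the unpadded lists, so the very first step (from recall 0 to Recalls[0]) is not included; padding Recalls with a leading 0, as the Mask-RCNN code in 4.2.1 does, adds it.

step2_0: compute the average precision

\[AP = \sum_{i\in indices}\left (Recalls[i]-Recalls[i-1] \right )\cdot Precisions[i] \]

If Recalls is taken as the horizontal axis and Precisions as the vertical axis, so that precision is a function of recall, then \(AP\) is approximately the area under that curve, which is the "expectation" mentioned earlier!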
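
The sum in step2_0 can be evaluated directly on the two lists from 4.1.3. The sketch below computes it twice: once on the lists as given, which reproduces the indices listed in step1, and once with a leading 0 padded onto both lists so that the first recall step is counted too. The padded variant is shown only for comparison and is an assumption about what one might want, not something the text prescribes.

```python
import numpy as np

precisions = np.array([1, 1, 0.66666667, 0.5, 0.6, 0.66666667, 0.71428571,
                       0.75, 0.77777778, 0.8, 0.72727273, 0.66666666,
                       0.61538461, 0.57142857, 0.53333333])
recalls = np.array([0.06666667, 0.13333334, 0.13333334, 0.13333334, 0.2,
                    0.26666668, 0.33333334, 0.4, 0.46666667, 0.53333336,
                    0.53333336, 0.53333336, 0.53333336, 0.53333336, 0.53333336])

# Unpadded: positions where recall differs from the previous entry.
indices = np.where(recalls[:-1] != recalls[1:])[0] + 1
ap_unpadded = np.sum((recalls[indices] - recalls[indices - 1]) * precisions[indices])

# Padded with a leading 0 so the step from recall 0 to recalls[0] is included.
r = np.concatenate([[0.0], recalls])
p = np.concatenate([[0.0], precisions])
idx = np.where(r[:-1] != r[1:])[0] + 1
ap_padded = np.sum((r[idx] - r[idx - 1]) * p[idx])

print(indices.tolist())        # [1, 4, 5, 6, 7, 8, 9]
print(round(ap_unpadded, 4))   # 0.3539
print(round(ap_padded, 4))     # 0.4206
```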

step2_1: another common method:

For each distinct recall value, take the maximum precision, then average the selected values

                    ![](https://img2020.cnblogs.com/blog/1319275/202012/1319275-20201229180809997-1749707640.png)

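The green cells in the table above are the values selected for averaging. There are eight distinct recall values, and the maximum precision at each is used, so:

AP=(1+1+0.6+0.66666667+0.71428571+0.75+0.77777778+0.8)/8=0.78859127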
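
The rule "take the maximum precision at each recall value, then average" can be checked numerically on the lists from 4.1.3. This is a minimal sketch of that rule as stated; grouping by exact float equality works here only because repeated recall values are written identically in the lists.

```python
import numpy as np

precisions = np.array([1, 1, 0.66666667, 0.5, 0.6, 0.66666667, 0.71428571,
                       0.75, 0.77777778, 0.8, 0.72727273, 0.66666666,
                       0.61538461, 0.57142857, 0.53333333])
recalls = np.array([0.06666667, 0.13333334, 0.13333334, 0.13333334, 0.2,
                    0.26666668, 0.33333334, 0.4, 0.46666667, 0.53333336,
                    0.53333336, 0.53333336, 0.53333336, 0.53333336, 0.53333336])

# One entry per distinct recall value: the best precision reached at that recall.
unique_recalls = np.unique(recalls)
max_prec = [precisions[recalls == r].max() for r in unique_recalls]

ap = float(np.mean(max_prec))
print(len(max_prec))   # 8 distinct recall values
print(round(ap, 6))    # 0.788591
```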

4.2 AP in Mask-RCNN

When Mask-RCNN computes this AP value, the variable that holds it is named mAP!

4.2.1 The AP computation in Mask-RCNN

    # Get matches and overlaps
    gt_match, pred_match, overlaps = compute_matches(
        gt_boxes, gt_class_ids, gt_masks,
        pred_boxes, pred_class_ids, pred_scores, pred_masks,
        iou_threshold)

    # Compute precision and recall at each prediction box step
    precisions = np.cumsum(pred_match > -1) / (np.arange(len(pred_match)) + 1)
    recalls = np.cumsum(pred_match > -1).astype(np.float32) / len(gt_match)

    # Pad with start and end values to simplify the math
    precisions = np.concatenate([[0], precisions, [0]])
    recalls = np.concatenate([[0], recalls, [1]])

    # Ensure precision values decrease but don't increase. This way, the
    # precision value at each recall threshold is the maximum it can be
    # for all following recall thresholds, as specified by the VOC paper.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = np.maximum(precisions[i], precisions[i + 1])

    # Compute mean AP over recall range
    indices = np.where(recalls[:-1] != recalls[1:])[0] + 1
    mAP = np.sum((recalls[indices] - recalls[indices - 1]) *
                 precisions[indices])

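compute_matches first sorts the predictions by score and reorders pred_boxes, pred_class_ids and pred_masks to match (highest score first). It then calls compute_overlaps_masks to compute the IoU between every pair of masks; its arguments are masks1, masks2: [Height, Width, instances]. Next, for each prediction, the corresponding row of the IoU matrix overlaps (one value per ground-truth instance) is sorted in descending order, the indices whose value falls below the threshold are dropped, and the prediction is matched against the ground truths that remain!

4.2.2 Sort by score in descending order

pred_boxes, pred_class_ids and pred_masks are re-indexed with the order obtained by sorting on score, so they are rearranged in the same order as the scores;

As a result, the mask array is now ordered along its channel (last) dimension!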
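
The reordering step can be sketched with `np.argsort`. The array names follow the text, but the toy values and shapes are hypothetical; masks are assumed to be laid out as [Height, Width, instances], as stated above.

```python
import numpy as np

# Hypothetical toy predictions: 3 instances.
pred_scores = np.array([0.3, 0.9, 0.6])
pred_class_ids = np.array([2, 1, 1])
pred_boxes = np.array([[0, 0, 1, 1], [1, 1, 2, 2], [2, 2, 3, 3]])
# Each 2x2 "mask" is filled with its instance number so the reordering is easy to see.
pred_masks = np.stack([np.full((2, 2), k) for k in range(3)], axis=-1)

# Indices that sort the scores from high to low.
order = np.argsort(pred_scores)[::-1]

pred_scores = pred_scores[order]
pred_class_ids = pred_class_ids[order]
pred_boxes = pred_boxes[order]
pred_masks = pred_masks[..., order]  # instances live on the last axis

print(order.tolist())             # [1, 2, 0]
print(pred_masks[0, 0].tolist())  # [1, 2, 0]
```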

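4.2.3 Compute the IoU between Pred_Masks and True_Masks

Computing the IoU between every predicted mask and every ground-truth mask gives a matrix:

As the figure shows, comparing n predicted masks with m ground-truth masks yields a matrix of IoU values, which we call overlaps, with shape=(n,m).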

# Flatten the predicted and ground-truth mask arrays to shape (h*w, num_mask).
# After flattening, each column holds all pixels of one mask;
# masks1 > .5 produces the True/False array.
masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)
masks2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32)
area1 = np.sum(masks1, axis=0)
area2 = np.sum(masks2, axis=0)

# intersections and union
# masks1.T has shape (n, h*w) and masks2 has shape (h*w, m)
intersections = np.dot(masks1.T, masks2)
union = area1[:, None] + area2[None, :] - intersections
overlaps = intersections / union

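Note: in each instance mask the background is black and the instance is white. After normalization the instance pixels are greater than 0.5, so summing the binarized mask gives the area of the mask region. The matrix product of masks1.T and masks2 counts the pixels set in both masks, i.e. the intersection area of each pair of masks; the union is then area1 + area2 - intersection. overlaps is therefore a matrix of IoU values with shape (n,m).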
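
A tiny worked case makes the shapes and values easier to follow. The sketch below wraps the same operations in a function named after compute_overlaps_masks and runs it on hypothetical 2x2 masks; it assumes no mask is empty, since an empty pair would divide by zero.

```python
import numpy as np


def compute_overlaps_masks(masks1, masks2):
    """IoU between every mask in masks1 and every mask in masks2.

    masks1, masks2: [Height, Width, instances]. Assumes no pair has an empty union.
    """
    masks1 = np.reshape(masks1 > .5, (-1, masks1.shape[-1])).astype(np.float32)
    masks2 = np.reshape(masks2 > .5, (-1, masks2.shape[-1])).astype(np.float32)
    area1 = np.sum(masks1, axis=0)
    area2 = np.sum(masks2, axis=0)
    intersections = np.dot(masks1.T, masks2)  # pixels set in both masks
    union = area1[:, None] + area2[None, :] - intersections
    return intersections / union


# Hypothetical masks on a 2x2 grid: two predictions, one ground truth.
pred = np.zeros((2, 2, 2))
pred[0, :, 0] = 1      # prediction 0: top row (area 2)
pred[:, :, 1] = 1      # prediction 1: whole image (area 4)
gt = np.zeros((2, 2, 1))
gt[:, 0, 0] = 1        # ground truth: left column (area 2)

overlaps = compute_overlaps_masks(pred, gt)
print(overlaps.shape)  # (2, 1)
print(overlaps)        # IoU 1/3 for prediction 0, 1/2 for prediction 1
```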

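4.2.4 Build pred_match and gt_match

Two one-dimensional arrays filled with \(-1\) are created: pred_match, with one entry per predicted bbox, and gt_match, with one entry per ground-truth bbox!

Loop over the prediction dimension n. At each step take overlaps[i], the array of IoU values between one predicted mask and all ground-truth masks, and sort it in descending order to get sorted_ixs. Compare the IoU values, in that order, against the threshold; the positions in sorted_ixs that fail the threshold are low_score_idx!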

>>> a = np.array([[0.12, 0.53, 0.64, 0.31, 0.89, 0.45, 0.99, 0.06],[0.12, 0.53, 0.64, 0.31, 0.89, 0.45, 0.99, 0.06]])
>>> a
Out[19]: 
array([[0.12, 0.53, 0.64, 0.31, 0.89, 0.45, 0.99, 0.06],
       [0.12, 0.53, 0.64, 0.31, 0.89, 0.45, 0.99, 0.06]])
>>> sorted_ixs = np.argsort(a[1])[::-1]
>>> sorted_ixs
Out[21]: array([6, 4, 2, 1, 5, 3, 0, 7], dtype=int64)
>>> low_score_idx = np.where(a[1, sorted_ixs] < 0.5)[0]
>>> low_score_idx
Out[23]: array([4, 5, 6, 7], dtype=int64)
# Drop the entries of sorted_ixs that fail the threshold
>>> if low_score_idx.size > 0:
...     sorted_ixs = sorted_ixs[:low_score_idx[0]]
>>> sorted_ixs
Out[25]: array([6, 4, 2, 1], dtype=int64)

# 3. Find the match
for j in sorted_ixs:
    # j in sorted_ixs identifies the j-th ground-truth mask; if that mask is already matched, skip it.
    if gt_match[j] > -1:
        continue
    # If we reach IoU smaller than the threshold, end the loop
    iou = overlaps[i, j]
    if iou < iou_threshold:
        break
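    # If pred_bbox[i] and gt_bbox[j] are both unmatched, the IoU is at least the threshold,
    # and the predicted class equals the true class, the match succeeds and match_count is incremented.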
    if pred_class_ids[i] == gt_class_ids[j]:
        match_count += 1
        gt_match[j] = i
        pred_match[i] = j
        break

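Figure 2: GT_MATCH, PRED_MATCH

From the code and logic above we can summarize the following:

gt_match: each element is the index of the predicted box matched to that ground-truth bbox (a value greater than -1), or \(-1\) if nothing matched.

pred_match: each element is the index of the ground-truth box matched to that predicted bbox (a value greater than -1), or \(-1\) if nothing matched.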
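
Putting the loop together, the sketch below runs the matching on a small hypothetical overlaps matrix. It is a simplified reading of the logic described above, not the library's code: the function name `match_predictions`, the default threshold of 0.5, and the toy inputs are assumptions, and the rows of overlaps are assumed to be already sorted by prediction score.

```python
import numpy as np


def match_predictions(overlaps, pred_class_ids, gt_class_ids, iou_threshold=0.5):
    """Greedy matching of predictions (rows, sorted by score) to ground truths (columns)."""
    n_pred, n_gt = overlaps.shape
    pred_match = -1 * np.ones(n_pred, dtype=int)
    gt_match = -1 * np.ones(n_gt, dtype=int)
    for i in range(n_pred):
        # Ground-truth candidates for prediction i, best IoU first.
        sorted_ixs = np.argsort(overlaps[i])[::-1]
        for j in sorted_ixs:
            if gt_match[j] > -1:
                continue  # this ground truth is already taken
            if overlaps[i, j] < iou_threshold:
                break     # all remaining candidates have lower IoU
            if pred_class_ids[i] == gt_class_ids[j]:
                gt_match[j] = i
                pred_match[i] = j
                break
    return gt_match, pred_match


# Hypothetical inputs: 3 predictions, 2 ground truths.
overlaps = np.array([[0.8, 0.1],
                     [0.7, 0.2],
                     [0.1, 0.6]])
pred_class_ids = np.array([1, 1, 2])
gt_class_ids = np.array([1, 2])

gt_match, pred_match = match_predictions(overlaps, pred_class_ids, gt_class_ids)
print(gt_match.tolist())    # [0, 2]
print(pred_match.tolist())  # [0, -1, 1]
```

Prediction 1 overlaps ground truth 0 well, but that ground truth was already claimed by the higher-scoring prediction 0, so prediction 1 stays at -1 and counts as a false positive.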

4.2.5 Precision and Recall

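# Cumulative precision: the fraction of predictions so far that are matched; np.cumsum is the cumulative sum;
# np.cumsum(pred_match > -1) is the cumulative count of correct predictions.
precisions = np.cumsum(pred_match > -1) / (np.arange(len(pred_match)) + 1)
# Cumulative recall: the fraction of ground-truth boxes that have been correctly predicted so far
recalls = np.cumsum(pred_match > -1).astype(np.float32) / len(gt_match)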

# Pad with start and end values to simplify the math
precisions = np.concatenate([[0], precisions, [0]])
recalls = np.concatenate([[0], recalls, [1]])

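# Make precisions monotonically non-increasing. Why does only precision need this fix?
# Recall is divided by the constant len(gt_match), not by np.arange, so it never decreases.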
for i in range(len(precisions) - 2, -1, -1):
    precisions[i] = np.maximum(precisions[i], precisions[i + 1])

4.2.6 AP

# Indices at which recall steps to a new value
indices = np.where(recalls[:-1] != recalls[1:])[0] + 1
# At each step index, take (change in recall) * precision; note that this has the same form as an integral!
mAP = np.sum((recalls[indices] - recalls[indices - 1]) *
             precisions[indices])
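
For a quick check of 4.2.5 and 4.2.6 together, the steps can be collected into one function and run on a small pred_match array. The wrapper name `average_precision`, its `num_gt` parameter, and the toy inputs are assumptions made for this sketch.

```python
import numpy as np


def average_precision(pred_match, num_gt):
    """AP from a pred_match array (sorted by score; -1 means unmatched) and the ground-truth count."""
    hits = pred_match > -1
    precisions = np.cumsum(hits) / (np.arange(len(pred_match)) + 1)
    recalls = np.cumsum(hits).astype(np.float32) / num_gt

    # Pad with start and end values.
    precisions = np.concatenate([[0], precisions, [0]])
    recalls = np.concatenate([[0], recalls, [1]])

    # Make precision non-increasing from left to right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = np.maximum(precisions[i], precisions[i + 1])

    # Sum (recall step) * (precision at that step).
    indices = np.where(recalls[:-1] != recalls[1:])[0] + 1
    return float(np.sum((recalls[indices] - recalls[indices - 1]) * precisions[indices]))


# Hypothetical case: 3 predictions, 2 ground truths, the second prediction unmatched.
pred_match = np.array([0, -1, 1])
ap = average_precision(pred_match, num_gt=2)
print(round(ap, 4))  # 0.8333
```

Here the recall steps are 0 to 0.5 (precision 1) and 0.5 to 1 (precision 2/3 after the monotone fix), giving 0.5 * 1 + 0.5 * 2/3 = 5/6.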

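5. mAP

Building on Section 4: if the counting of predictions against ground truth is restricted to one particular class, the resulting AP is the average precision of that class. Summing the APs of all classes and dividing by the number of classes gives mAP!

\[mAP = \frac{\sum_{c\in category}AP_{c}}{\left | category\right |} \]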
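
The formula is a plain mean over classes. A minimal sketch follows; the per-class AP values are hypothetical and the function name `mean_average_precision` is chosen for illustration.

```python
def mean_average_precision(ap_per_class):
    """Mean of per-class AP values; `ap_per_class` maps class name -> AP."""
    if not ap_per_class:
        raise ValueError("at least one class is required")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical per-class AP values, for illustration only.
ap_per_class = {"dog": 0.75, "cat": 0.5, "bird": 0.25}
print(mean_average_precision(ap_per_class))  # 0.5
```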
