
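A Summary of Evaluation Metrics for NLP Multi-class Classification Tasks

For details, see the official sklearn documentation.

The sklearn.metrics module implements several loss, score, and utility functions for measuring classification performance.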

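1. Four basic concepts

TP (True Positive): predicted positive, actually positive

FP (False Positive): predicted positive, actually negative

FN (False Negative): predicted negative, actually positive

TN (True Negative): predicted negative, actually negative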
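The four counts can be tallied directly from a pair of label lists. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the toy labels are assumptions made for illustration and do not come from sklearn.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive and t != positive:
            fp += 1  # predicted positive, actually negative
        elif p != positive and t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical toy labels, chosen only for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # prints: 3 1 1 3
```

In a multi-class setting the same function can be called once per class, passing that class as `positive`, which gives the one-vs-rest counts that the averaged metrics are built from.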

2. Binary classification metrics

3. Multi-class classification metrics

....

F1-score:

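In statistics, the F1-score is a measure of the accuracy of a binary classification model and is often used when the data are imbalanced. It takes both the precision and the recall of the model into account: it can be viewed as a kind of weighted average of the two (specifically, their harmonic mean). Its maximum value is 1 and its minimum value is 0.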
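As a small illustration of the definition, precision, recall, and F1 can be computed from the TP, FP, and FN counts. This is a sketch in plain Python: the helper name `precision_recall_f1` is an assumption, and returning 0.0 when a denominator is zero is one possible convention, not the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) computed from raw counts."""
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 2 true positives, 6 false positives, 1 false negative.
p, r, f1 = precision_recall_f1(tp=2, fp=6, fn=1)
print(round(p, 4), round(r, 4), round(f1, 4))  # prints: 0.25 0.6667 0.3636
```

Because the harmonic mean is pulled toward the smaller value, the F1 here (about 0.36) sits well below the arithmetic mean of precision and recall (about 0.46).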

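In multi-class problems there are two common ways to compute a model's F1-score: micro-F1 and macro-F1. Both are built from the same F1 formula used in binary classification. Note, however, that they coincide with the ordinary binary F1-score only when the positive class alone is counted. If the averaging runs over both classes of a binary problem (as sklearn's average='micro' and average='macro' do), micro-F1 equals accuracy and macro-F1 is the mean of the two per-class F1 values, so in general micro-F1, macro-F1, and the binary F1-score differ.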
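One caveat about the binary case is worth checking numerically: when the averaging runs over both classes (which is what sklearn's average='micro' and average='macro' do), the results need not match the F1 of the positive class alone. The sketch below uses plain Python with hypothetical toy labels; the helper `class_counts` is an assumption introduced for illustration.

```python
def class_counts(y_true, y_pred, cls):
    """One-vs-rest TP, FP, FN for a single class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    # Equivalent to the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced binary labels (3 positives, 7 negatives).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

counts = {c: class_counts(y_true, y_pred, c) for c in (0, 1)}

binary_f1 = f1_from_counts(*counts[1])                        # positive class only
macro_f1 = sum(f1_from_counts(*counts[c]) for c in (0, 1)) / 2
pooled = [sum(col) for col in zip(*counts.values())]          # total TP, FP, FN
micro_f1 = f1_from_counts(*pooled)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(round(binary_f1, 4), round(macro_f1, 4), round(micro_f1, 4), accuracy)
# prints: 0.6667 0.7619 0.8 0.8
```

With these labels the three F1 values all differ, and the micro-averaged value equals the accuracy.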

micro-F1:

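  • Computation: first compute the overall Precision and Recall across all classes (pooling the TP, FP, and FN counts of every class), then compute F1 from them; the result is micro-F1;
  • Range: [0, 1];
  • Applicability: the formula takes the number of samples in each class into account, so it suits imbalanced multi-class problems; if the data are extremely imbalanced, however, the result is dominated by the largest classes;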
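The micro-F1 computation described above can be sketched in plain Python by pooling the per-class counts. The labels below are the same `y_test` and `y_predict` lists used in the test code at the end of this article; the variable names are otherwise assumptions.

```python
y_test = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 0, 0, 0, 0]
y_predict = [1, 1, 1, 3, 3, 2, 2, 3, 3, 3, 4, 3, 4, 3, 5, 1, 3, 6, 6, 1, 1, 0, 6]

labels = sorted(set(y_test) | set(y_predict))

# Pool one-vs-rest counts over every class.
tp = fp = fn = 0
for c in labels:
    for t, p in zip(y_test, y_predict):
        if p == c and t == c:
            tp += 1
        elif p == c and t != c:
            fp += 1
        elif p != c and t == c:
            fn += 1

micro_precision = tp / (tp + fp)
micro_recall = tp / (tp + fn)
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)
accuracy = sum(t == p for t, p in zip(y_test, y_predict)) / len(y_test)

print(tp, fp, fn)                              # prints: 12 11 11
print(round(micro_f1, 4), round(accuracy, 4))  # prints: 0.5217 0.5217
```

In single-label multi-class classification every misclassified sample is one false positive for the predicted class and one false negative for the true class, so the pooled FP and FN are equal and micro-precision, micro-recall, micro-F1, and accuracy all coincide. This matches the identical micro values in the output listed at the end of the article.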

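macro-F1:

  • Computation:
    compute Precision and Recall for each class separately and average them over the classes to get the macro Precision and macro Recall. Note that sklearn's average='macro' F1 is the mean of the per-class F1 values, not the F1 computed from the averaged Precision and Recall, and the two procedures generally give different results (in the test output at the end of this article, macro_f1 is about 0.544, whereas the harmonic mean of macro_precision and macro_recall would be about 0.602);
  • Range: [0, 1];
  • Applicability: multi-class problems; it ignores the number of samples per class and therefore treats every class equally, so it is not dominated by large classes, but it is easily swayed by classes that are easy to recognize (high recall, high precision);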
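To make the macro-F1 computation concrete, the sketch below evaluates two variants in plain Python on the `y_test` and `y_predict` lists from the test code at the end of this article: the mean of the per-class F1 values (which is what sklearn's average='macro' reports) and the F1 of the macro-averaged precision and recall. The helper `per_class_prf` is an assumption introduced for illustration.

```python
y_test = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 0, 0, 0, 0]
y_predict = [1, 1, 1, 3, 3, 2, 2, 3, 3, 3, 4, 3, 4, 3, 5, 1, 3, 6, 6, 1, 1, 0, 6]


def per_class_prf(y_true, y_pred, cls):
    """One-vs-rest precision, recall, and F1 for a single class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    # Assumption: undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


labels = sorted(set(y_test))
scores = [per_class_prf(y_test, y_predict, c) for c in labels]

macro_precision = sum(s[0] for s in scores) / len(labels)
macro_recall = sum(s[1] for s in scores) / len(labels)

# Variant 1: average the per-class F1 values.
macro_f1_mean_of_f1 = sum(s[2] for s in scores) / len(labels)
# Variant 2: F1 of the averaged precision and recall.
macro_f1_of_means = (2 * macro_precision * macro_recall
                     / (macro_precision + macro_recall))

print(round(macro_f1_mean_of_f1, 4))  # prints: 0.5442
print(round(macro_f1_of_means, 4))    # prints: 0.6017
```

The first value agrees with the macro_f1 in the output at the end of the article; the second is noticeably higher, so it matters which definition a reported macro-F1 uses.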
# Metric test
from sklearn import metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.metrics import f1_score


def Evaluate1(y_test, y_predict):
    print('accuracy:', metrics.accuracy_score(y_test, y_predict))  # accuracy
    print('macro_precision:', metrics.precision_score(y_test, y_predict, average='macro'))  # macro-averaged precision
    print('micro_precision:', metrics.precision_score(y_test, y_predict, average='micro'))  # micro-averaged precision
    # print('weighted_precision:', metrics.precision_score(y_test, y_predict, average='weighted'))  # weighted-average precision
    print('macro_recall:', metrics.recall_score(y_test, y_predict, average='macro'))  # macro-averaged recall
    print('micro_recall:', metrics.recall_score(y_test, y_predict, average='micro'))  # micro-averaged recall
    # print('weighted_recall:', metrics.recall_score(y_test, y_predict, average='weighted'))  # weighted-average recall
    print('macro_f1:', metrics.f1_score(y_test, y_predict, labels=[0, 1, 2, 3, 4, 5, 6], average='macro'))  # macro-averaged F1-score
    print('micro_f1:', metrics.f1_score(y_test, y_predict, labels=[0, 1, 2, 3, 4, 5, 6], average='micro'))  # micro-averaged F1-score
    # print('weighted_f1:', metrics.f1_score(y_test, y_predict, labels=[0, 1, 2, 3, 4, 5, 6], average='weighted'))  # weighted-average F1-score
    # target_names = ['class 1', 'class 2', 'class 3', 'class 4', 'class 5', 'class 6', 'class 7']
    # Confusion matrix; for example, entry [1, 3] being 2 means that two samples of class 1 were predicted as class 3.
    # print('Confusion matrix:\n', metrics.confusion_matrix(y_test, y_predict, labels=[0, 1, 2, 3, 4, 5, 6]))
    # Classification report; target_names=target_names can be passed as well.
    # print('Classification report:\n', metrics.classification_report(y_test, y_predict, labels=[0, 1, 2, 3, 4, 5, 6]))


def Evaluate2(y_true, y_pred):
    print("accuracy:", accuracy_score(y_true, y_pred))  # fraction of correctly classified samples
    print("macro_precision", precision_score(y_true, y_pred, average='macro'))
    print("micro_precision", precision_score(y_true, y_pred, average='micro'))
    # Calculate recall score
    print("macro_recall", recall_score(y_true, y_pred, average='macro'))
    print("micro_recall", recall_score(y_true, y_pred, average='micro'))
    # Calculate f1 score
    print("macro_f", f1_score(y_true, y_pred, average='macro'))
    print("micro_f", f1_score(y_true, y_pred, average='micro'))


y_test = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 0, 0, 0, 0]
y_predict = [1, 1, 1, 3, 3, 2, 2, 3, 3, 3, 4, 3, 4, 3, 5, 1, 3, 6, 6, 1, 1, 0, 6]
Evaluate1(y_test, y_predict)
Evaluate2(y_test, y_predict)

## In the classification report, the left-hand column lists the class labels and the right-hand support column gives the number of occurrences of each label. The avg / total row is the mean of each column (for the support column, the sum).
## The precision, recall, and f1-score columns give the precision, recall, and F1 value of each class.
'''
accuracy: 0.5217391304347826
macro_precision: 0.7023809523809524
micro_precision: 0.5217391304347826
macro_recall: 0.5261904761904762
micro_recall: 0.5217391304347826
macro_f1: 0.5441558441558441
micro_f1: 0.5217391304347826
accuracy: 0.5217391304347826
macro_precision 0.7023809523809524
micro_precision 0.5217391304347826
macro_recall 0.5261904761904762
micro_recall 0.5217391304347826
macro_f 0.5441558441558441
micro_f 0.5217391304347826
'''
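The listing above leaves its confusion_matrix call commented out. As a rough stand-in that needs no sklearn, the sketch below builds the same kind of matrix in plain Python, with rows indexed by the true label and columns by the predicted label (the convention sklearn's confusion_matrix also uses). The function name `build_confusion_matrix` is an assumption.

```python
def build_confusion_matrix(y_true, y_pred, labels):
    """Return a nested list cm where cm[i][j] counts samples whose true
    label is labels[i] and whose predicted label is labels[j]."""
    index = {label: k for k, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


y_test = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 0, 0, 0, 0]
y_predict = [1, 1, 1, 3, 3, 2, 2, 3, 3, 3, 4, 3, 4, 3, 5, 1, 3, 6, 6, 1, 1, 0, 6]

cm = build_confusion_matrix(y_test, y_predict, labels=[0, 1, 2, 3, 4, 5, 6])
for row in cm:
    print(row)

print(cm[1][3])  # prints: 2 (two samples of class 1 were predicted as class 3)
```

The diagonal holds the correctly classified samples of each class, so its sum divided by the number of samples gives the accuracy (12 / 23 here), and each row sum equals that class's support.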

