Methods for evaluating models in sklearn
阿新 • Published: 2018-12-07
I. Accuracy, recall, F1, confusion matrix, and classification report
1. Accuracy
Method 1: accuracy_score
# Accuracy
import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3,9,9,8,5,8]
y_true = [0, 1, 2, 3,2,6,3,5,9]
accuracy_score(y_true, y_pred)
Out[127]: 0.33333333333333331
accuracy_score(y_true, y_pred, normalize=False)  # normalize=False returns the number of correctly classified samples instead of the fraction
Out[128]: 3
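To make the two return values concrete, here is a minimal plain-Python sketch of what the calls above compute. It is an illustration only, not sklearn's implementation, and the variable names n_correct and accuracy are my own.

```python
# Plain-Python sketch of accuracy (illustrative, not sklearn's code).
y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]

# Count the positions where the prediction equals the true label.
n_correct = sum(t == p for t, p in zip(y_true, y_pred))

# The default (normalize=True) corresponds to the fraction,
# normalize=False to the raw count.
accuracy = n_correct / len(y_true)

print(n_correct)  # 3
print(accuracy)   # 0.3333333333333333
```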
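Method 2: the metrics module
Macro-averaging gives every class equal weight, so it is often the more reasonable choice when minority classes matter. Micro-averaging is not useless, though: it gives every sample equal weight. Which one to use depends on the class distribution of the dataset.
Macro-averaging first computes the metric for each class and then takes the arithmetic mean over all classes.
Micro-averaging pools every instance in the dataset, regardless of class, into one global confusion matrix and then computes the metric from it. (Source: 談談評價指標中的巨集平均和微平均, "On macro- and micro-averaging in evaluation metrics")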
from sklearn import metrics
metrics.precision_score(y_true, y_pred, average='micro')  # micro-averaged precision
Out[130]: 0.33333333333333331
metrics.precision_score(y_true, y_pred, average='macro')  # macro-averaged precision
Out[131]: 0.375
metrics.precision_score(y_true, y_pred, labels=[0, 1, 2, 3], average='macro')  # macro-averaged precision restricted to the given labels
Out[133]: 0.5
Besides the default 'binary', the average parameter accepts five values: (None, 'micro', 'macro', 'weighted', 'samples')
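The difference between the two averages is easier to see when computed by hand. The sketch below is plain Python, not sklearn's implementation. The helper names tp_fp and precision are hypothetical, and giving a never-predicted class a precision of 0 is an assumption chosen to mirror the library's default behaviour.

```python
# Macro vs. micro precision by hand (illustrative sketch).
y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]
labels = sorted(set(y_true) | set(y_pred))

def tp_fp(label):
    """True positives and false positives for one class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    return tp, fp

def precision(tp, fp):
    # Assumed convention: a class that is never predicted gets precision 0.
    return tp / (tp + fp) if tp + fp else 0.0

counts = [tp_fp(label) for label in labels]

# Macro: average the per-class precisions, so every class weighs the same.
macro = sum(precision(tp, fp) for tp, fp in counts) / len(labels)

# Micro: pool the counts over all classes, then compute a single precision.
micro = precision(sum(tp for tp, _ in counts), sum(fp for _, fp in counts))

print(macro)  # 0.375
print(micro)  # 0.3333333333333333
```

For single-label multiclass data like this, the micro-averaged precision equals the accuracy, because every wrong prediction is a false positive for exactly one class.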
.
2. Recall
metrics.recall_score(y_true, y_pred, average='micro')
Out[134]: 0.33333333333333331
metrics.recall_score(y_true, y_pred, average='macro')
Out[135]: 0.3125
.
3. F1
metrics.f1_score(y_true, y_pred, average='weighted')
Out[136]: 0.37037037037037035
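As a rough check of the weighted F1 value, the sketch below recomputes it in plain Python: per-class precision and recall are combined into a per-class F1, and the classes are then weighted by their support (the number of true samples). The helper f1_for is hypothetical, and treating undefined ratios as 0 is an assumption.

```python
from collections import Counter

# Support-weighted F1 by hand (illustrative sketch, not sklearn's code).
y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]
labels = sorted(set(y_true) | set(y_pred))
support = Counter(y_true)  # number of true samples per class (0 if absent)

def f1_for(label):
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    n_pred = sum(p == label for p in y_pred)
    n_true = support[label]
    precision = tp / n_pred if n_pred else 0.0  # assumed: undefined -> 0
    recall = tp / n_true if n_true else 0.0     # assumed: undefined -> 0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Weight each class's F1 by its support, then divide by the sample count.
weighted_f1 = sum(f1_for(label) * support[label] for label in labels) / len(y_true)

print(weighted_f1)  # approximately 0.37037
```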
.
4. Confusion matrix
# Confusion matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_pred)
Out[137]:
array([[1, 0, 0, ..., 0, 0, 0],
[0, 0, 1, ..., 0, 0, 0],
[0, 1, 0, ..., 0, 0, 1],
...,
[0, 0, 0, ..., 0, 0, 1],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 1, 0]])
Rows correspond to the true labels and columns to the predicted labels.
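The row/column convention can be sketched in a few lines of plain Python. This is only an illustration of the layout, with my own variable names, and is not how sklearn builds the matrix internally.

```python
# Building a confusion matrix by hand (illustrative sketch).
y_pred = [0, 2, 1, 3, 9, 9, 8, 5, 8]
y_true = [0, 1, 2, 3, 2, 6, 3, 5, 9]

# Classes in sorted order; each class gets one row and one column.
labels = sorted(set(y_true) | set(y_pred))
index = {label: i for i, label in enumerate(labels)}

matrix = [[0] * len(labels) for _ in labels]
for t, p in zip(y_true, y_pred):
    matrix[index[t]][index[p]] += 1  # row = true label, column = prediction

for label, row in zip(labels, matrix):
    print(label, row)
```

The diagonal holds the correctly classified samples, so its sum (3 here) is the count that accuracy is based on.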
.
5. Classification report
# Classification report: precision / recall / f1-score / averages / support
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 2, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
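The output:
             precision    recall  f1-score   support

    class 0       0.67      1.00      0.80         2
    class 1       0.00      0.00      0.00         1
    class 2       1.00      1.00      1.00         2

avg / total       0.67      0.80      0.72         5
The report contains precision, recall, f1-score, their support-weighted averages, and the support (number of true samples per class).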
.
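6. Kappa score
The kappa score is a number between -1 and 1. A score above 0.8 is generally taken to mean good agreement; a score of 0 or lower means no agreement (effectively random labels).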
from sklearn.metrics import cohen_kappa_score
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
cohen_kappa_score(y_true, y_pred)
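Cohen's kappa compares the observed agreement with the agreement expected by chance. Below is a minimal plain-Python sketch of that formula for the data above. The names p_observed and p_expected are my own, and this is not sklearn's implementation.

```python
from collections import Counter

# Cohen's kappa by hand (illustrative sketch).
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
n = len(y_true)

# Observed agreement: fraction of samples where both labelings coincide.
p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

# Chance agreement: for each class, the product of its two marginal frequencies.
true_counts, pred_counts = Counter(y_true), Counter(y_pred)
p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(kappa)  # approximately 0.4286
```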
.
II. ROC
1. Computing the ROC AUC
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_scores)
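One way to read the AUC is as the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. The sketch below computes it that way in plain Python. It is an illustration under that interpretation, not the algorithm sklearn uses.

```python
# ROC AUC as a pairwise ranking statistic (illustrative sketch).
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

pos = [s for t, s in zip(y_true, y_scores) if t == 1]
neg = [s for t, s in zip(y_true, y_scores) if t == 0]

# Each (positive, negative) pair counts 1 if ranked correctly, 0.5 on a tie.
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auc_value = wins / (len(pos) * len(neg))

print(auc_value)  # 0.75
```

Three of the four positive/negative pairs are ordered correctly here; only the pair (0.35, 0.4) is not.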
2. ROC curve
from sklearn.metrics import roc_curve
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
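To show where the curve's points come from, the following plain-Python sketch sweeps a threshold over the distinct scores and records the false and true positive rates at each step. It is a simplified illustration: sklearn's roc_curve may drop intermediate points and returns its thresholds in its own format.

```python
# ROC curve points by threshold sweep (illustrative sketch).
y = [1, 1, 2, 2]
scores = [0.1, 0.4, 0.35, 0.8]
pos_label = 2

n_pos = sum(label == pos_label for label in y)
n_neg = len(y) - n_pos

# Start with a threshold above every score: nothing is predicted positive.
points = [(0.0, 0.0)]
for threshold in sorted(set(scores), reverse=True):
    # A sample is predicted positive when its score reaches the threshold.
    tp = sum(s >= threshold and label == pos_label for s, label in zip(scores, y))
    fp = sum(s >= threshold and label != pos_label for s, label in zip(scores, y))
    points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)

print(points)
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```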
Here is an example from the official documentation. Only part of the code is shown; for the full code see: Receiver Operating Characteristic (ROC)
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from numpy import interp  # scipy.interp was a deprecated alias of numpy.interp
# Import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
# Plotting. fpr, tpr, roc_auc, n_classes and lw are defined in the part of the example that is omitted here.
# First aggregate all false positive rates
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += interp(all_fpr, fpr[i], tpr[i])
# Finally average it and compute AUC
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])
# Plot all ROC curves
plt.figure()
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average ROC curve (area = {0:0.2f})'.format(roc_auc["micro"]),
         color='deeppink', linestyle=':', linewidth=4)
plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)
colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'.format(i, roc_auc[i]))
plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()
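The macro-average step in the example (interpolate every per-class curve on a shared FPR grid, then average the TPR values) can be tried out on toy data. The sketch below uses two made-up curves and a hand-written linear interpolation. The curves, the helper names interp_at and trapezoid_area, and the strictly increasing FPR values are all assumptions for illustration; none of it comes from the iris example itself.

```python
# Macro-averaging two toy ROC curves (illustrative sketch, made-up data).
curves = {
    "a": ([0.0, 0.5, 1.0], [0.0, 1.0, 1.0]),   # (fpr, tpr) of hypothetical class a
    "b": ([0.0, 0.25, 1.0], [0.0, 0.5, 1.0]),  # (fpr, tpr) of hypothetical class b
}

def interp_at(x, xs, ys):
    """Linear interpolation at x; assumes xs is strictly increasing."""
    for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the curve")

def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, y0, x1, y1 in zip(xs, ys, xs[1:], ys[1:]))

# Shared grid: every FPR value that occurs in any curve.
all_fpr = sorted({x for fpr, _ in curves.values() for x in fpr})

# Average the interpolated TPR values at each grid point.
mean_tpr = [sum(interp_at(x, fpr, tpr) for fpr, tpr in curves.values()) / len(curves)
            for x in all_fpr]

macro_auc = trapezoid_area(all_fpr, mean_tpr)
print(all_fpr)    # [0.0, 0.25, 0.5, 1.0]
print(macro_auc)  # 0.6875
```

With these toy curves the macro AUC equals the mean of the two individual AUCs (0.75 and 0.625), since both curves are linear between the grid points.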
.
III. Distances
.
1. Hamming distance
from sklearn.metrics import hamming_loss
y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]
hamming_loss(y_true, y_pred)
0.25
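For single-label data like this, the Hamming loss is simply the fraction of positions where the two label lists differ, that is, 1 minus the accuracy. A minimal plain-Python sketch (not sklearn's implementation):

```python
# Hamming loss by hand (illustrative sketch).
y_pred = [1, 2, 3, 4]
y_true = [2, 2, 3, 4]

# Count the positions where prediction and truth disagree.
mismatches = sum(t != p for t, p in zip(y_true, y_pred))
loss = mismatches / len(y_true)

print(loss)  # 0.25
```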
.
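2. Jaccard distance
import numpy as np
# Note: jaccard_similarity_score was deprecated in scikit-learn 0.21 and later removed; newer releases provide jaccard_score instead.
# For single-label (multiclass) input it is equivalent to accuracy.
from sklearn.metrics import jaccard_similarity_score
y_pred = [0, 2, 1, 3,4]
y_true = [0, 1, 2, 3,4]
jaccard_similarity_score(y_true, y_pred)
0.6
jaccard_similarity_score(y_true, y_pred, normalize=False)
3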
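The underlying Jaccard index is the size of the intersection of two label sets divided by the size of their union. The sketch below is plain Python with a hypothetical helper named jaccard; returning 1.0 for two empty sets is an assumed convention. It shows the index on a multilabel pair and then on the single-label lists above, where each sample's set holds exactly one label.

```python
# Jaccard index on label sets (illustrative sketch).
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; two empty sets count as identical (assumption)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Multilabel sample: true labels {0, 1}, predicted labels {1, 2}.
print(jaccard([0, 1], [1, 2]))  # 0.3333333333333333

# Single-label data: each per-sample score is either 1 (match) or 0 (mismatch),
# so the average over samples is the same as the accuracy.
y_pred = [0, 2, 1, 3, 4]
y_true = [0, 1, 2, 3, 4]
per_sample = [jaccard([t], [p]) for t, p in zip(y_true, y_pred)]
score = sum(per_sample) / len(per_sample)
print(score)  # 0.6
```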
.
IV. Regression
1. Explained variance score
from sklearn.metrics import explained_variance_score
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
explained_variance_score(y_true, y_pred)
.
2. Mean absolute error
from sklearn.metrics import mean_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)
.
3. Mean squared error
from sklearn.metrics import mean_squared_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_squared_error(y_true, y_pred)
.
4. Median absolute error
from sklearn.metrics import median_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
median_absolute_error(y_true, y_pred)
.
5. R² score (coefficient of determination)
from sklearn.metrics import r2_score
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
r2_score(y_true, y_pred)
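Since the snippets in this section do not show their results, here is a plain-Python sketch that recomputes all five regression metrics for the same data from their textbook definitions. The helper variance and the variable names are my own, and this is not sklearn's implementation.

```python
from statistics import mean, median

# Regression metrics by hand (illustrative sketch).
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
errors = [t - p for t, p in zip(y_true, y_pred)]

def variance(values):
    """Population variance (divides by n)."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])

mae = mean([abs(e) for e in errors])       # mean absolute error
mse = mean([e ** 2 for e in errors])       # mean squared error
medae = median([abs(e) for e in errors])   # median absolute error

# Explained variance ignores a constant bias in the errors ...
explained_variance = 1 - variance(errors) / variance(y_true)

# ... whereas R² penalises it, because it uses the raw squared errors.
y_mean = mean(y_true)
r2 = 1 - sum(e ** 2 for e in errors) / sum((t - y_mean) ** 2 for t in y_true)

print(mae, mse, medae)      # 0.5 0.375 0.5
print(explained_variance)   # approximately 0.9572
print(r2)                   # approximately 0.9486
```

The two scores differ here because the errors have a non-zero mean (-0.25); if the mean error were exactly zero, they would coincide.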
.