sklearn: auc, roc_curve, roc_auc_score
sklearn.metrics.auc
Purpose: compute the AUC (Area Under the Curve) from points on a curve, using the trapezoidal rule.
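metrics.auc is a general utility: given the x and y coordinates of a curve (x must be monotonic), it integrates the curve with the trapezoidal rule, so it can be fed the fpr/tpr arrays returned by roc_curve. A minimal sketch:
from sklearn import metrics
# Area under the piecewise-linear curve through (0, 0), (0.5, 0.75), (1, 1).
metrics.auc([0, 0.5, 1], [0, 0.75, 1])  # 0.625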
metrics.roc_curve
Purpose: compute the ROC (Receiver Operating Characteristic) curve.
Note: this implementation is restricted to the binary classification task.
sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
Parameters:
- y_true : array, shape = [n_samples]. True binary labels. If labels are not either {-1, 1} or {0, 1}, then pos_label should be explicitly given.
- y_score : array, shape = [n_samples]. Target scores (e.g. probability estimates of the positive class, confidence values, or decision-function output).
- pos_label : int or str, default=None. Label considered as positive; all others are considered negative.

Returns:
- fpr : array. False positive rates.
- tpr : array. True positive rates.
- thresholds : array, shape = [n_thresholds]. Decreasing score thresholds used to compute fpr and tpr.
Example:
pos_label=1 means that samples labeled 1 are the positives and all other samples are negatives, since this function only handles binary classification.
import numpy as np
from sklearn import metrics

y = np.array([1, 1, 2, 2, 3, 3])
pred = np.array([0.1, 0.4, 0.35, 0.8, 0.1, 0.8])
fpr, tpr, thresholds = metrics.roc_curve(y, pred, pos_label=1)
metrics.auc(fpr, tpr)  # 0.3125
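Why 0.3125: AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as 1/2. Here the positives (label 1) have scores {0.1, 0.4} and the negatives {0.35, 0.8, 0.1, 0.8}; of the 2 × 4 = 8 positive-negative pairs, 2 are ordered correctly and 1 is tied, so AUC = (2 + 0.5) / 8 = 0.3125.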
sklearn.metrics.roc_auc_score
Purpose: compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
Note: this implementation is restricted to the binary classification task or multilabel classification task in label indicator format.
sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None, max_fpr=None)
Parameters:
- y_true : array, shape = [n_samples] or [n_samples, n_classes]. True binary labels or binary label indicators.
- y_score : array, shape = [n_samples] or [n_samples, n_classes]. Target scores.
- average : string, [None, 'micro', 'macro' (default), 'samples', 'weighted']. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data.

Returns:
- auc : float
### roc_auc_score
import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_scores)  # 0.75
roc_auc_score is the AUC under the prediction-score ROC curve; internally it calls roc_curve and then auc:
def _binary_roc_auc_score(y_true, y_score, sample_weight=None):
    if len(np.unique(y_true)) != 2:
        raise ValueError("Only one class present in y_true. ROC AUC score "
                         "is not defined in that case.")
    fpr, tpr, tresholds = roc_curve(y_true, y_score,
                                    sample_weight=sample_weight)
    return auc(fpr, tpr, reorder=True)
Because of the two-class check above, it cannot be applied directly to multi-class problems.
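A quick sanity check that roc_auc_score agrees with composing roc_curve and auc by hand, reusing the binary arrays from above (a minimal sketch):
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
fpr, tpr, _ = roc_curve(y_true, y_scores)
print(auc(fpr, tpr))                    # 0.75
print(roc_auc_score(y_true, y_scores))  # 0.75, the same value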
Example of computing AUC for a multi-class problem:
import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
Load the data:
iris = datasets.load_iris()
X = iris.data
y = iris.target
Binarize the training labels (one-hot encoding):
# Binarize the output
y = label_binarize(y, classes=[0, 1, 2])
n_classes = y.shape[1]
n_classes  # 3
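label_binarize turns each integer label into a 0/1 indicator row with one column per class, which is what lets each column be treated as an independent binary problem below:
from sklearn.preprocessing import label_binarize
label_binarize([0, 1, 2, 1], classes=[0, 1, 2])
# array([[1, 0, 0],
#        [0, 1, 0],
#        [0, 0, 1],
#        [0, 1, 0]])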
Append noisy features to each sample to make the problem harder:
# Add noisy features to make the problem harder
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
X = np.c_[X, random_state.randn(n_samples, 200 * n_features)]
Note: np.c_ concatenates arrays along the second (column) axis:
np.c_[random_state.randn(2, 2), [[0, 0], [1, 1]]]
array([[ 0.73381936,  0.26909417,  0.        ,  0.        ],
       [ 1.07274021, -0.9826661 ,  1.        ,  1.        ]])
Split the dataset:
# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)
Fit a one-vs-rest classifier:
# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                         random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)
Note: decision_function(X) returns the signed distance of each sample to the decision boundary of each class.
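Given the split above (150 iris samples halved) and three one-vs-rest classifiers, y_score is a (75, 3) array, one column of decision values per class:
y_score.shape  # (75, 3)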
Compute the ROC curve and AUC for each class:
# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    # column i holds the true labels and predicted scores for class i
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
# Compute micro-average ROC curve and ROC area:
# pool every class's labels and scores, then compute a single ROC
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
Plot the ROC curve of a single class (here class 2):
plt.rcParams['savefig.dpi'] = 300  # DPI of saved figures
plt.rcParams['figure.dpi'] = 300   # display resolution
plt.figure()
# linewidth
lw = 2
plt.plot(fpr[2], tpr[2], color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()
# Compute macro-average ROC curve and ROC area
# First aggregate all false positive rates
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))
# Then interpolate all ROC curves at these points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += np.interp(all_fpr, fpr[i], tpr[i])
# Finally average it and compute AUC
mean_tpr /= n_classes
fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])
# Plot all ROC curves
plt.figure()
plt.plot(fpr["micro"], tpr["micro"],
label='micro-average ROC curve (area = {0:0.2f})'
''.format(roc_auc["micro"]),
color='deeppink', linestyle=':', linewidth=4)
plt.plot(fpr["macro"], tpr["macro"],
label='macro-average ROC curve (area = {0:0.2f})'
''.format(roc_auc["macro"]),
color='navy', linestyle=':', linewidth=4)
colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
                   ''.format(i, roc_auc[i]))
plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()