Model selection and evaluation

阿新 • Published: 2018-12-09

Precision, Recall, and F1

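For a binary classification problem, each example falls into one of four cases according to the combination of its true class and the class the learner predicts:

True positive (TP): predicted positive, and actually positive
False positive (FP): predicted positive, but actually negative
True negative (TN): predicted negative, and actually negative
False negative (FN): predicted negative, but actually positive

Clearly TP + FP + TN + FN = the total number of examples. The four cases are named from the point of view of the prediction: the second word is what the learner predicted, and the first word says whether that prediction was right.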

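Precision P (of the examples predicted positive, how many really are positive): $P = \frac{TP}{TP+FP}$, i.e. correctly classified positives / all examples predicted positive
Recall R (of all the actual positives, how many are correctly classified): $R=\frac{TP}{TP+FN}$, i.e. correctly classified positives / all positives in the dataset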
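
The definitions above can be checked by hand on a tiny example. The following is a minimal sketch in plain Python; the function names `confusion_counts` and `precision_recall` and the toy labels are made up for illustration, and returning 0.0 for an empty denominator is an assumption, not something the text specifies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if truth == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, fp, tn, fn


def precision_recall(y_true, y_pred, positive=1):
    """Return (P, R) with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    # Assumption: report 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]

print(confusion_counts(y_true, y_pred))   # (TP, FP, TN, FN)
print(precision_recall(y_true, y_pred))   # (P, R)
```

The four counts always add up to `len(y_true)`, which is a quick sanity check on any implementation.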

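Precision and recall pull against each other. In general, when precision is high, recall tends to be low, and when recall is high, precision tends to be low.

For example, if we want to pick out as many good watermelons as possible, we can simply select more melons; if we select every melon, every good melon is certainly selected, but precision will be low. If instead we want the proportion of good melons among those selected to be as high as possible, we can pick only the melons we are most sure about, but then we will inevitably miss many good ones, so recall will be low. Usually only on some simple tasks can both recall and precision be high.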

P-R Curve

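In many cases we can sort the examples by the learner's prediction: those at the front are the ones the learner considers most likely to be positive, and those at the end are the ones it considers least likely to be positive. Having sorted all samples by confidence, we take each sample in turn as the threshold: everything ranked before it is predicted positive, and everything after it is predicted negative. Each choice of threshold gives one pair of P and R, and plotting precision against recall over all thresholds gives the P-R curve.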
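
The threshold sweep described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the author's code: the name `pr_curve` and the toy scores are made up, the sample at the cut is counted as predicted positive, tied scores are not treated specially, and at least one positive label is assumed.

```python
def pr_curve(scores, labels, positive=1):
    """Return one (recall, precision) point per cut position.

    Samples are ranked by descending score; at cut k the top-k samples
    are predicted positive and the rest negative.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(1 for y in labels if y == positive)  # assumed > 0
    points = []
    tp = fp = 0
    for i in order:
        if labels[i] == positive:
            tp += 1
        else:
            fp += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        points.append((recall, precision))
    return points


# Toy scores and labels, invented for this example.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

for recall, precision in pr_curve(scores, labels):
    print(f"R={recall:.2f}  P={precision:.2f}")
```

Moving the cut down the ranking can only keep recall the same or raise it, while precision may move either way, which is why the curve is usually jagged.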

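If one learner's P-R curve is completely enclosed by another learner's curve, the latter can be said to outperform the former; for example, if A's curve encloses C's, learner A outperforms learner C.
When the P-R curves of two learners cross, it is hard to say in general which one is better, so a single summary measure is used instead:

Break-even point (BEP): the value at which precision = recall; the larger it is, the better the learner, so if A's break-even point is higher than B's, learner A is judged better than B
F1 measure: $F1 = \frac{2\times P \times R}{P+R}=\frac{2\times TP}{N +TP-TN}$, where N is the total number of examples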

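In some applications precision and recall are not equally important. In a product recommendation system, for example, we want to disturb users as little as possible and would rather the recommended items really are of interest, so precision matters more; in a system for retrieving information on fugitives, we want to miss as few fugitives as possible, so recall matters more. The general form of the F1 measure, $F_{\beta }$, lets us express different preferences between precision and recall: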

$F_{\beta } = \frac{(1+\beta^{2})\times P \times R}{(\beta^{2}\times P)+R}$

$\beta = 1$ reduces to the standard F1; with $\beta > 1$ recall has more influence; with $\beta < 1$ precision has more influence.
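
To see how $\beta$ shifts the balance, here is a small sketch that evaluates $F_{\beta}$ for a few values and also checks the count-based form of F1. The helper name `f_beta` and the confusion counts are made up for illustration; returning 0.0 when the denominator is zero is an assumption.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    # Assumption: define the score as 0.0 when P and R are both zero.
    return (1 + b2) * precision * recall / denom if denom else 0.0


# Hypothetical confusion counts, chosen so that P and R differ.
tp, fp, tn, fn = 30, 10, 30, 30
n = tp + fp + tn + fn

p = tp / (tp + fp)   # precision, 0.75 here
r = tp / (tp + fn)   # recall, 0.5 here

f1 = f_beta(p, r, beta=1.0)
f1_from_counts = 2 * tp / (n + tp - tn)   # same value, written with counts

print("F1      :", f1)
print("F1 (cnt):", f1_from_counts)
print("F2      :", f_beta(p, r, beta=2.0))   # pulled toward recall
print("F0.5    :", f_beta(p, r, beta=0.5))   # pulled toward precision
```

With these counts recall is lower than precision, so weighting recall more ($\beta = 2$) lowers the score and weighting precision more ($\beta = 0.5$) raises it.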

ROC and AUC

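As the previous section showed, different tasks call for different cut points in the ranking: if we care more about precision, we cut at a position near the front of the ranking; if we care more about recall, we cut further back. The quality of the ranking itself therefore reflects the learner's "expected generalization performance" across different tasks, or in other words its generalization performance "in general". The ROC curve is a powerful tool for studying generalization performance from this angle.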

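The vertical axis is the true positive rate (TPR): $TPR = \frac{TP}{TP+FN}$, the fraction of all positive samples that are correctly classified
The horizontal axis is the false positive rate (FPR): $FPR = \frac{FP}{TN+FP}$, the fraction of all negative samples that are wrongly classified as positive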

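If one learner's ROC curve is completely enclosed by another learner's curve, the latter can be said to outperform the former.
When the ROC curves of two learners cross, compare the area under the ROC curve (AUC). For a curve through the points $(x_1, y_1), \dots, (x_m, y_m)$ ordered by increasing $x$: $AUC = \frac{1}{2}\sum_{i=1}^{m-1}(x_{i+1}-x_{i})\cdot (y_{i}+y_{i+1})$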
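
The AUC sum is just the trapezoid rule applied to consecutive ROC points. The sketch below builds the points by sweeping the cut from the top of the ranking and then applies the formula term by term. The names `roc_points` and `auc_trapezoid` and the toy data are made up for illustration; the sketch assumes both classes are present and does not treat tied scores specially.

```python
def roc_points(scores, labels, positive=1):
    """Return ROC points (FPR, TPR) from (0, 0) to (1, 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(1 for y in labels if y == positive)   # assumed > 0
    n_neg = len(labels) - n_pos                       # assumed > 0
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in order:
        if labels[i] == positive:
            tp += 1   # step up: one more true positive
        else:
            fp += 1   # step right: one more false positive
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """AUC = 1/2 * sum over i of (x_{i+1} - x_i) * (y_i + y_{i+1})."""
    return 0.5 * sum(
        (x1 - x0) * (y0 + y1)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy scores and labels, invented for this example.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points)
print("AUC:", auc_trapezoid(points))
```

On this toy data 8 of the 9 positive-negative pairs are ranked correctly, and the trapezoid sum gives the same fraction, 8/9.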

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import AdaBoostClassifier

def plotROC(predStrengths, classLabels):
    """
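    Draw the ROC curve backwards, from (1, 1) to (0, 0), lowest score first.
    For each +1 label, move one step down the y-axis (lowers the true positive rate);
    for any other label, move one step left along the x-axis (lowers the false positive rate).
    :param predStrengths: NumPy array of predicted scores; higher means more likely positive
    :param classLabels: true labels, with +1.0 marking the positive class
    :return: None; the curve is plotted and the AUC is printed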
    """
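    cursor = (1.0, 1.0)                                 # cursor position, starts at the top-right corner
    ySum = 0.0                                          # accumulates rectangle heights for the AUC
    numPositiveClass = sum(np.array(classLabels) == 1.0)
    yStep = 1 / float(numPositiveClass)                 # step size along the y-axis
    xStep = 1 / float(len(classLabels) - numPositiveClass)# step size along the x-axis
    # indices that sort the scores in ascending order
    sortedIndicies = predStrengths.argsort()             # ascending, so the curve is drawn from (1.0, 1.0) down to (0, 0)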
    fig = plt.figure()
    fig.clf()
    ax = plt.subplot(111)
    # loop through all the values, drawing a line segment at each point
    for index in sortedIndicies.tolist():
        if classLabels[index] == 1.0:
            delX = 0
            delY = yStep
        else:
            delX = xStep
            delY = 0
            ySum += cursor[1]
        # draw line from cursor to (cursor[0]-delX,cursor[1]-delY)
        ax.plot([cursor[0], cursor[0] - delX], [cursor[1], cursor[1] - delY], c='b')
        cursor = (cursor[0] - delX, cursor[1] - delY)
    ax.plot([0, 1], [0, 1], 'b--')
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('ROC curve for AdaBoost on make_hastie_10_2')
    ax.axis([0, 1, 0, 1])
    plt.show()
    # the area is a sum of small rectangles, each of width xStep, so add up the heights in ySum and multiply once
    print("the Area Under the Curve is: ", ySum * xStep)

if __name__ == "__main__":
    X, y = make_hastie_10_2(n_samples=4000, random_state=1)
    X_test, y_test = X[2000:], y[2000:]
    X_train, y_train = X[:2000], y[:2000]
    clf = AdaBoostClassifier(n_estimators=100)
    clf.fit(X_train,y_train)
    preds = clf.predict_proba(X_test)
    plotROC(preds[:,1],y_test)
