sklearn Study Notes (3): The AdaBoost Ensemble Method

阿新 • Published: 2019-01-21

The sklearn.ensemble module provides many ensemble methods, including AdaBoost, Bagging, and random forests. This post uses AdaBoostClassifier.

The AdaBoostClassifier constructor takes five parameters, described below:

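base_estimator: optional, defaults to DecisionTreeClassifier. In principle any classification or regression learner can be used, as long as it supports sample weights. Common choices are a CART decision tree or a neural network (MLP), provided the implementation accepts sample weights. The default is a decision tree: AdaBoostClassifier uses the CART classification tree DecisionTreeClassifier, while AdaBoostRegressor uses the CART regression tree DecisionTreeRegressor. Also note that if the algorithm chosen for AdaBoostClassifier is SAMME.R, the weak learner must support probability prediction as well, that is, in scikit-learn it must provide predict_proba in addition to predict.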
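The two requirements above (sample-weight support, and predict_proba for SAMME.R) can be checked before a learner is handed to AdaBoost. The following is a minimal sketch: describe_weak_learner is a hypothetical helper written for this note, and KNeighborsClassifier is used only as an example of a learner whose fit method takes no sample_weight.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import has_fit_parameter


def describe_weak_learner(estimator):
    """Report whether an estimator meets the two requirements (hypothetical helper)."""
    return {
        # AdaBoost reweights the samples every round, so fit() must accept sample_weight.
        "supports_sample_weight": has_fit_parameter(estimator, "sample_weight"),
        # SAMME.R additionally needs class-probability estimates.
        "has_predict_proba": hasattr(estimator, "predict_proba"),
    }


tree_info = describe_weak_learner(DecisionTreeClassifier(max_depth=1))
knn_info = describe_weak_learner(KNeighborsClassifier())
print("decision stump:", tree_info)
print("k-nearest neighbours:", knn_info)
```

A learner that fails the first check cannot be boosted with AdaBoost, whichever algorithm is selected.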

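algorithm: optional, defaults to SAMME.R. scikit-learn implements two AdaBoost classification algorithms, SAMME and SAMME.R. The main difference is how the weight of each weak learner is measured: SAMME derives it from the learner's classification performance (its error rate) on the sample set, whereas SAMME.R derives it from the predicted class probabilities on the sample set. Because SAMME.R works with continuous probability estimates, it generally converges in fewer iterations than SAMME, which is why it is the default value of algorithm in AdaBoostClassifier. The default SAMME.R is usually sufficient, but with SAMME.R the weak learner passed as base_estimator must be a classifier that supports probability prediction. SAMME has no such restriction.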
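To make "classification performance as the weak learner's weight" concrete, here is a small sketch of the weight formula from the SAMME algorithm, alpha = learning_rate * (ln((1 - err) / err) + ln(K - 1)), where err is the weighted error of the weak learner and K is the number of classes. The function name samme_estimator_weight is made up for this note, and the code illustrates the formula only; it is not scikit-learn's implementation.

```python
import math


def samme_estimator_weight(error, n_classes, learning_rate=1.0):
    """Weight of a weak learner with weighted error `error` under SAMME (sketch)."""
    if not 0.0 < error < 1.0:
        raise ValueError("error must lie strictly between 0 and 1")
    return learning_rate * (
        math.log((1.0 - error) / error) + math.log(n_classes - 1.0)
    )


# The lower the error, the larger the say the weak learner gets in the final vote.
for err in (0.1, 0.3, 0.5):
    print(err, round(samme_estimator_weight(err, n_classes=2), 4))
```

For two classes the ln(K - 1) term vanishes, which gives the classic AdaBoost weight. A learner that is no better than random guessing (err = 1 - 1/K) receives weight zero.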

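n_estimators: integer, optional, defaults to 50. The maximum number of boosting iterations, in other words the maximum number of weak learners. Generally, too small a value tends to underfit and too large a value tends to overfit, so a moderate value is usually chosen. In practice n_estimators is tuned together with the learning_rate parameter described next.

learning_rate: float, optional, defaults to 1.0. The shrinkage coefficient applied to the weight of each weak learner. It must be positive, and values between 0 and 1 are typical. For the same quality of fit on the training set, a smaller learning_rate means more boosting iterations are needed. The step size and the maximum number of iterations together determine how well the algorithm fits, so n_estimators and learning_rate should be tuned jointly. A reasonable approach is to start tuning from a value somewhat smaller than the default of 1.0.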
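The interaction between n_estimators and learning_rate can be observed with staged_score, which reports the accuracy after each boosting round. This is a rough sketch on synthetic data: the dataset, the two learning rates and the seeds are arbitrary choices for illustration. The weak learner and the algorithm are left at their defaults, because the names of those arguments differ between scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for rate in (1.0, 0.1):
    clf = AdaBoostClassifier(n_estimators=100, learning_rate=rate, random_state=0)
    clf.fit(X_train, y_train)
    # staged_score yields the test accuracy after each boosting iteration.
    staged = list(clf.staged_score(X_test, y_test))
    results[rate] = staged
    print(
        f"learning_rate={rate}: first round {staged[0]:.3f}, "
        f"after {len(staged)} rounds {staged[-1]:.3f}"
    )
```

Plotting each list in results against the iteration number shows how quickly each setting levels off, which is one way to choose the pair of values.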

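random_state: integer, RandomState instance or None, optional, defaults to None. If an integer, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator itself; if None, the random number generator is the RandomState instance used by np.random.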
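A quick way to see what random_state buys is to fit the same model twice with the same seed and compare the outputs. The sketch below assumes a synthetic dataset, and fit_and_predict is a hypothetical helper defined only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=42)


def fit_and_predict(seed):
    """Fit a fresh AdaBoostClassifier with the given seed and predict on X."""
    clf = AdaBoostClassifier(n_estimators=30, random_state=seed)
    return clf.fit(X, y).predict(X)


pred_a = fit_and_predict(seed=7)
pred_b = fit_and_predict(seed=7)
print("identical predictions:", np.array_equal(pred_a, pred_b))
```

Fixing the seed makes runs repeatable, which is useful when comparing different n_estimators and learning_rate settings.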

Methods

`decision_function`(X)	Compute the decision function of `X`.
`fit`(X, y[, sample_weight])	Build a boosted classifier from the training set (X, y).
`get_params`([deep])	Get parameters for this estimator.
`predict`(X)	Predict classes for X.
`predict_log_proba`(X)	Predict class log-probabilities for X.
`predict_proba`(X)	Predict class probabilities for X.
`score`(X, y[, sample_weight])	Returns the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.
`staged_decision_function`(X)	Compute decision function of `X` for each boosting iteration.
`staged_predict`(X)	Return staged predictions for X.
`staged_predict_proba`(X)	Predict class probabilities for X.
`staged_score`(X, y[, sample_weight])	Return staged scores for X, y.
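Several of the methods listed above can be tried in a few lines. This is an illustrative sketch on synthetic data rather than a complete tour; the dataset and the parameter values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

clf = AdaBoostClassifier(random_state=0)
print("default n_estimators:", clf.get_params()["n_estimators"])

# set_params returns the estimator itself, so the call can be chained with fit.
clf.set_params(n_estimators=20).fit(X, y)

proba = clf.predict_proba(X)          # one column per entry in clf.classes_
log_proba = clf.predict_log_proba(X)  # element-wise log of predict_proba
accuracy = clf.score(X, y)            # mean accuracy on (X, y)

print("proba shape:", proba.shape)
print("training accuracy:", accuracy)
```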

(1) decision_function(X)

Compute the decision function of X.

Parameters:

X : {array-like, sparse matrix} of shape = [n_samples, n_features]

The input samples. Sparse matrix can be CSC, CSR, COO, DOK, or LIL. DOK and LIL are converted to CSR.

Returns:

score : array, shape = [n_samples, k]

The decision function of the input samples. The order of outputs is the same as that of the classes_ attribute. Binary classification is a special case with k == 1, otherwise k == n_classes. For binary classification, values closer to -1 or 1 mean more like the first or second class in classes_, respectively.

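The relationship between decision_function and the predicted class can be checked directly. The sketch below uses two synthetic datasets (a binary one and a three-class one) chosen for illustration. np.ravel is applied in the binary case so that the code does not depend on whether the scores come back as a flat array or as a single column.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Binary problem: one score per sample.
X2, y2 = make_classification(n_samples=200, n_features=6, random_state=0)
clf2 = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X2, y2)
scores2 = np.ravel(clf2.decision_function(X2))
# Positive scores point to classes_[1], the remaining ones to classes_[0].
from_sign = clf2.classes_[(scores2 > 0).astype(int)]

# Three-class problem: one column of scores per class, ordered as in classes_.
X3, y3 = make_classification(
    n_samples=300, n_features=8, n_informative=4, n_classes=3, random_state=0
)
clf3 = AdaBoostClassifier(n_estimators=25, random_state=0).fit(X3, y3)
scores3 = clf3.decision_function(X3)
from_argmax = clf3.classes_[np.argmax(scores3, axis=1)]

print("binary: sign rule matches predict:", np.array_equal(from_sign, clf2.predict(X2)))
print("3-class: argmax rule matches predict:", np.array_equal(from_argmax, clf3.predict(X3)))
```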

(2) fit(X, y, sample_weight=None)

Build a boosted classifier from the training set (X, y).

Parameters:

X : {array-like, sparse matrix} of shape = [n_samples, n_features]

The training input samples. Sparse matrix can be CSC, CSR, COO, DOK, or LIL. DOK and LIL are converted to CSR.

y : array-like of shape = [n_samples]

The target values (class labels).

sample_weight : array-like of shape = [n_samples], optional

Sample weights. If None, the sample weights are initialized to 1 / n_samples.

Returns:

self : object

Returns self.

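As a small illustration of the sample_weight argument and of fit returning the estimator itself, the sketch below trains on an imbalanced synthetic dataset. The weighting scheme (counting each minority-class sample three times as much) is a made-up example, not a recommendation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Roughly 80% of the samples belong to class 0 and 20% to class 1.
X, y = make_classification(
    n_samples=200, n_features=6, weights=[0.8, 0.2], random_state=0
)

# Hypothetical scheme: each minority-class sample counts three times as much.
sample_weight = np.where(y == 1, 3.0, 1.0)

clf = AdaBoostClassifier(n_estimators=30, random_state=0)
returned = clf.fit(X, y, sample_weight=sample_weight)

print("fit returned the estimator itself:", returned is clf)
print("weak learners fitted:", len(clf.estimators_))
```

Because fit returns self, training and prediction can be chained, for example clf.fit(X, y).predict(X).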

(3) predict(X)

Predict classes for X.

The predicted class of an input sample is computed as the weighted mean prediction of the classifiers in the ensemble.

Parameters:

X : {array-like, sparse matrix} of shape = [n_samples, n_features]

The input samples. Sparse matrix can be CSC, CSR, COO, DOK, or LIL. DOK and LIL are converted to CSR.

Returns:

y : array of shape = [n_samples]

The predicted classes.

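Putting predict to work on held-out data might look like the following. The synthetic dataset, the split ratio and the seeds are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

clf = AdaBoostClassifier(n_estimators=50, random_state=1).fit(X_train, y_train)

# predict returns one class label per row of X_test.
y_pred = clf.predict(X_test)

print("first predictions:", y_pred[:10])
print("held-out accuracy:", accuracy_score(y_test, y_pred))
```

The accuracy printed here is the same quantity that clf.score(X_test, y_test) returns.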
