
Multiclass and multioutput overview of sklearn

Multiclass and multioutput algorithms

https://scikit-learn.org/stable/modules/multiclass.html#

sklearn supports the following typical types of learning:

multiclass (multiple classes)

multilabel (multiple labels)

multioutput (multiple outputs)

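Meta-estimators are introduced to implement these types of learning: a complex learning task is split into a set of simpler tasks, and a concrete model is applied to each simple task.

The counterpart of a meta-estimator is the base estimator, that is, the concrete learning model.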

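The chart in the user guide shows the structure of the multiclass and multioutput modules and the meta-estimators each one contains.

Multilabel is just a special case of multioutput.

Note on multiclass-multioutput, which is discussed further down:

multiclass-multioutput is in turn a generalization of multilabel, that is, the more general case.

A multilabel target only flags whether each label applies, so its values are binary: true or false.

A multiclass-multioutput target takes multiclass values, with any number of classes per output.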

This section of the user guide covers functionality related to multi-learning problems, including multiclass, multilabel, and multioutput classification and regression.

The modules in this section implement meta-estimators, which require a base estimator to be provided in their constructor. Meta-estimators extend the functionality of the base estimator to support multi-learning problems, which is accomplished by transforming the multi-learning problem into a set of simpler problems, then fitting one estimator per problem.
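As a minimal sketch of the idea above, the snippet below wraps a base estimator in a meta-estimator and checks how many simpler problems were fitted. The choice of OneVsRestClassifier, LogisticRegression, and the iris dataset is an assumption made for illustration; the text above does not prescribe any of them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Iris is a single-target problem with three classes.
X, y = load_iris(return_X_y=True)

# The base estimator is handed to the meta-estimator's constructor.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

# One fitted binary model per class is stored in estimators_.
n_binary_models = len(ovr.estimators_)
print(n_binary_models)
print(ovr.predict(X[:3]))
```

Since iris has three classes, the one-vs-rest strategy is expected to produce three binary models, one per class.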

This section covers two modules: sklearn.multiclass and sklearn.multioutput. The chart below demonstrates the problem types that each module is responsible for, and the corresponding meta-estimators that each module provides.

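Comparison table of problem types:

The first three rows deal with discrete targets (classification); the last one deals with continuous, numeric targets (regression).

The last three rows have two or more targets and belong to the multioutput type.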

The table below provides a quick reference on the differences between problem types. More detailed explanations can be found in subsequent sections of this guide.

Problem type                            Number of targets   Target cardinality   Valid type_of_target
Multiclass classification               1                   >2                   'multiclass'
Multilabel classification               >1                  2 (0 or 1)           'multilabel-indicator'
Multiclass-multioutput classification   >1                  >2                   'multiclass-multioutput'
Multioutput regression                  >1                  Continuous           'continuous-multioutput'
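The "Valid type_of_target" column can be checked directly with sklearn.utils.multiclass.type_of_target. The sketch below builds one small target array per row of the table; the array values are made up for illustration only.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# One illustrative target per problem type (values are arbitrary).
targets = {
    "multiclass": np.array([0, 2, 1, 2]),
    "multilabel": np.array([[0, 1], [1, 1], [1, 0]]),
    "multiclass-multioutput": np.array([[1, 2], [3, 1], [2, 3]]),
    "multioutput regression": np.array([[31.4, 94.2], [40.5, 109.1], [25.3, 30.7]]),
}

# Ask scikit-learn how it classifies each target array.
detected = {name: type_of_target(y) for name, y in targets.items()}
for name, kind in detected.items():
    print(f"{name}: {kind}")
```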

Concrete models (base estimators)

Below is a summary of scikit-learn estimators that have multi-learning support built-in, grouped by strategy. You don’t need the meta-estimators provided by this section if you’re using one of these estimators. However, meta-estimators can provide additional strategies beyond what is built-in.

Multiclass-multioutput classification

https://scikit-learn.org/stable/modules/multiclass.html#multiclass-multioutput-classification

First, it is a multioutput model;

second, each output target is itself multiclass.

Multiclass-multioutput classification (also known as multitask classification) is a classification task which labels each sample with a set of non-binary properties. The number of properties is greater than 1 and the number of classes per property is greater than 2. A single estimator thus handles several joint classification tasks. This is a generalization of both the multilabel classification task, which only considers binary attributes, and the multiclass classification task, where only one property is considered.

For example, classification of the properties “type of fruit” and “colour” for a set of images of fruit. The property “type of fruit” has the possible classes: “apple”, “pear” and “orange”. The property “colour” has the possible classes: “green”, “red”, “yellow” and “orange”. Each sample is an image of a fruit, a label is output for both properties and each label is one of the possible classes of the corresponding property.

Note that all classifiers that handle multiclass-multioutput (also known as multitask classification) tasks support the multilabel classification task as a special case. Multitask classification is similar to the multioutput classification task with different model formulations. For more information, see the relevant estimator documentation.

The target values are not limited to true and false.

Target format

A valid representation of multioutput y is a dense matrix of shape (n_samples, n_outputs) of class labels, that is, a column-wise concatenation of 1d multiclass variables. An example of y for 3 samples:

>>> import numpy as np
>>> y = np.array([['apple', 'green'], ['orange', 'orange'], ['pear', 'green']])
>>> print(y)
[['apple' 'green']
 ['orange' 'orange']
 ['pear' 'green']]
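To show a target of this shape being consumed, here is a minimal sketch that fits a single classifier on a two-column y. DecisionTreeClassifier is used on the assumption that it accepts 2d targets natively; the feature values and the integer coding of the labels are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features, one row per fruit image.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [3.0, 1.0]])

# Column 0: fruit type (0=apple, 1=orange, 2=pear).
# Column 1: colour (0=green, 1=orange, 2=red).
Y = np.array([[0, 0], [1, 1], [2, 0], [0, 2]])

clf = DecisionTreeClassifier(random_state=0).fit(X, Y)

# One predicted label per output column for every sample.
pred = clf.predict(X)
print(pred.shape)
print(clf.n_outputs_)
```

Integer codes are used here rather than the string labels from the example above to keep the predicted array's dtype unambiguous; this is a design choice of the sketch, not a requirement stated in the source.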

Multioutput regression

https://scikit-learn.org/stable/modules/multiclass.html#multioutput-regression

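Multioutput regression is, first, a multioutput problem;

second, it is regression, so every target is restricted to numeric values.

For example, predicting the wind direction and the wind speed at some location: both are numeric, but they are two separate targets.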

Multioutput regression predicts multiple numerical properties for each sample. Each property is a numerical variable and the number of properties to be predicted for each sample is greater than or equal to 2. Some estimators that support multioutput regression are faster than just running n_output estimators.

For example, prediction of both wind speed and wind direction, in degrees, using data obtained at a certain location. Each sample would be data obtained at one location and both wind speed and direction would be output for each sample.

Target format

A valid representation of multioutput y is a dense matrix of shape (n_samples, n_outputs) of floats, that is, a column-wise concatenation of continuous variables. An example of y for 3 samples:

>>> import numpy as np
>>> y = np.array([[31.4, 94], [40.5, 109], [25.0, 30]])
>>> print(y)
[[ 31.4  94. ]
 [ 40.5 109. ]
 [ 25.   30. ]]
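Some regressors accept a y of this shape without any wrapper. The sketch below assumes LinearRegression as such an estimator and reuses the y above; the single feature column in X is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical single feature per sample.
X = np.array([[1.0], [2.0], [3.0]])

# Two targets per sample, as in the example above.
y = np.array([[31.4, 94], [40.5, 109], [25.0, 30]])

reg = LinearRegression().fit(X, y)
pred = reg.predict(X)

# Predictions keep the (n_samples, n_outputs) layout of y.
print(pred.shape)
# Coefficients are stored per target: (n_targets, n_features).
print(reg.coef_.shape)
```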

MultiOutputRegressor

Like MultiOutputClassifier, it treats the targets as independent and unrelated.

A separate model is trained for each target.

Multioutput regression support can be added to any regressor with MultiOutputRegressor. This strategy consists of fitting one regressor per target. Since each target is represented by exactly one regressor, it is possible to gain knowledge about the target by inspecting its corresponding regressor. Because MultiOutputRegressor fits one regressor per target, it cannot take advantage of correlations between targets.

>>> from sklearn.datasets import make_regression
>>> from sklearn.multioutput import MultiOutputRegressor
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> X, y = make_regression(n_samples=10, n_targets=3, random_state=1)
>>> MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, y).predict(X)
array([[-154.75474165, -147.03498585,  -50.03812219],
       [   7.12165031,    5.12914884,  -81.46081961],
       [-187.8948621 , -100.44373091,   13.88978285],
       [-141.62745778,   95.02891072, -191.48204257],
       [  97.03260883,  165.34867495,  139.52003279],
       [ 123.92529176,   21.25719016,   -7.84253   ],
       [-122.25193977,  -85.16443186, -107.12274212],
       [ -30.170388  ,  -94.80956739,   12.16979946],
       [ 140.72667194,  176.50941682,  -17.50447799],
       [ 149.37967282,  -81.15699552,   -5.72850319]])
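The remark about inspecting the regressor that belongs to a target can be sketched as follows, using the same data and base estimator as the example above. Looking at feature_importances_ is just one possible inspection and is an assumption of this sketch.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=10, n_targets=3, random_state=1)
model = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, y)

# estimators_ holds one fitted regressor per target column.
n_models = len(model.estimators_)
print(n_models)

# Inspect each per-target regressor on its own.
for target_idx, est in enumerate(model.estimators_):
    top_feature = int(est.feature_importances_.argmax())
    print(f"target {target_idx}: most important feature index {top_feature}")
```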

RegressorChain

Like ClassifierChain, it treats the targets as correlated and links the models into a chain, feeding earlier targets forward to later models.

Regressor chains (see RegressorChain) are analogous to ClassifierChain as a way of combining a number of regressions into a single multi-target model that is capable of exploiting correlations among targets.
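A minimal sketch of a regressor chain follows. The base estimator (LinearRegression), the synthetic data, and the explicit order are assumptions chosen for illustration. The base estimator is passed positionally so that the sketch does not depend on the keyword name used by a particular scikit-learn version.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import RegressorChain

# Synthetic data: 4 features, 3 targets.
X, y = make_regression(n_samples=50, n_features=4, n_targets=3, random_state=0)

# order fixes the sequence in which the targets are modelled.
chain = RegressorChain(LinearRegression(), order=[0, 1, 2])
chain.fit(X, y)
pred = chain.predict(X)

# Each later model is fitted on the original features plus the earlier targets.
inputs_per_model = [est.n_features_in_ for est in chain.estimators_]
print(pred.shape)
print(inputs_per_model)
```

The growing input width of the chained models is what lets a later model use information about earlier targets, which independent per-target regressors cannot do.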