Multiclass and multioutput overview of sklearn

阿新 • Published: 2020-12-30

Multiclass and multioutput algorithms

https://scikit-learn.org/stable/modules/multiclass.html#

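sklearn supports the following typical kinds of learning:

multiclass: multiple classes

multilabel: multiple labels

multioutput: multiple outputs

Meta-estimators are introduced to implement these kinds of learning. They split a complex learning task into a set of simpler tasks and apply a concrete model to each simple task.

A meta-estimator wraps a base estimator, which is the concrete learning model.

The chart below shows the structure of the multiclass and multioutput submodules and the meta-estimators each one contains.

Multilabel is just a special case of multioutput.

Note: the multiclass-multioutput type mentioned below is in turn a generalization of multilabel, that is, the more general case.

Multilabel targets only mark whether a label applies, so their values are binary (true or false).

Multiclass-multioutput targets take multiclass values, with any number of classes.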

This section of the user guide covers functionality related to multi-learning problems, including multiclass, multilabel, and multioutput classification and regression.

The modules in this section implement meta-estimators, which require a base estimator to be provided in their constructor. Meta-estimators extend the functionality of the base estimator to support multi-learning problems, which is accomplished by transforming the multi-learning problem into a set of simpler problems, then fitting one estimator per problem.
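
The constructor pattern described above can be sketched as follows. This is only an illustration: the choice of OneVsRestClassifier, LogisticRegression, and the iris dataset is an assumption made for the example, not something the original text prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Iris has 3 classes, so the multiclass task splits into 3 binary tasks.
X, y = load_iris(return_X_y=True)

# The base estimator is passed to the meta-estimator's constructor.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)

# One fitted copy of the base estimator exists per simpler problem.
n_fitted = len(ovr.estimators_)
pred = ovr.predict(X)
print(n_fitted, pred.shape)
```

After fitting, `estimators_` holds one binary classifier per class, which is the "one estimator per problem" idea in concrete form.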

This section covers two modules: sklearn.multiclass and sklearn.multioutput. The chart below demonstrates the problem types that each module is responsible for, and the corresponding meta-estimators that each module provides.

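Table of problem-type characteristics:

The first three problem types have discrete targets (classification); the last one has continuous, numeric targets (regression).

The last three have two or more targets and therefore belong to the multioutput type.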

The table below provides a quick reference on the differences between problem types. More detailed explanations can be found in subsequent sections of this guide.

Problem type	Number of targets	Target cardinality	Valid type_of_target
Multiclass classification	1	>2	'multiclass'
Multilabel classification	>1	2 (0 or 1)	'multilabel-indicator'
Multiclass-multioutput classification	>1	>2	'multiclass-multioutput'
Multioutput regression	>1	Continuous	'continuous-multioutput'
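
The last column of the table can be checked with sklearn.utils.multiclass.type_of_target. The small arrays below are made-up illustrations, one per row of the table.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# One illustrative target array per problem type (values are made up).
targets = {
    "multiclass": np.array([0, 1, 2, 1]),
    "multilabel": np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]]),
    "multiclass-multioutput": np.array([[0, 2], [1, 0], [2, 1]]),
    "multioutput regression": np.array([[31.4, 94.0], [40.5, 109.0], [25.0, 30.0]]),
}

# Map each problem type to the string reported by type_of_target.
detected = {name: type_of_target(y) for name, y in targets.items()}
for name, kind in detected.items():
    print(f"{name}: {kind}")
```

Note that a 2d integer array is reported as multilabel-indicator only when it contains at most two distinct values; with three or more it becomes multiclass-multioutput.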

Concrete models (base estimators)

Below is a summary of scikit-learn estimators that have multi-learning support built-in, grouped by strategy. You don’t need the meta-estimators provided by this section if you’re using one of these estimators. However, meta-estimators can provide additional strategies beyond what is built-in:

Inherently multiclass:

naive_bayes.BernoulliNB

tree.DecisionTreeClassifier

tree.ExtraTreeClassifier

ensemble.ExtraTreesClassifier

naive_bayes.GaussianNB

neighbors.KNeighborsClassifier

semi_supervised.LabelPropagation

semi_supervised.LabelSpreading

discriminant_analysis.LinearDiscriminantAnalysis

svm.LinearSVC (setting multi_class="crammer_singer")

linear_model.LogisticRegression (setting multi_class="multinomial")

linear_model.LogisticRegressionCV (setting multi_class="multinomial")

neural_network.MLPClassifier

neighbors.NearestCentroid

discriminant_analysis.QuadraticDiscriminantAnalysis

neighbors.RadiusNeighborsClassifier

ensemble.RandomForestClassifier

linear_model.RidgeClassifier

linear_model.RidgeClassifierCV

Multiclass as One-Vs-One:

svm.NuSVC

svm.SVC

gaussian_process.GaussianProcessClassifier (setting multi_class="one_vs_one")

Multiclass as One-Vs-The-Rest:

ensemble.GradientBoostingClassifier

gaussian_process.GaussianProcessClassifier (setting multi_class="one_vs_rest")

svm.LinearSVC (setting multi_class="ovr")

linear_model.LogisticRegression (setting multi_class="ovr")

linear_model.LogisticRegressionCV (setting multi_class="ovr")

linear_model.SGDClassifier

linear_model.Perceptron

linear_model.PassiveAggressiveClassifier

Support multilabel:

tree.DecisionTreeClassifier

tree.ExtraTreeClassifier

ensemble.ExtraTreesClassifier

neighbors.KNeighborsClassifier

neural_network.MLPClassifier

neighbors.RadiusNeighborsClassifier

ensemble.RandomForestClassifier

linear_model.RidgeClassifierCV

Support multiclass-multioutput:

tree.DecisionTreeClassifier

tree.ExtraTreeClassifier

ensemble.ExtraTreesClassifier

neighbors.KNeighborsClassifier

neighbors.RadiusNeighborsClassifier

ensemble.RandomForestClassifier
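
As a rough sketch of what the lists above mean in practice, the example below fits RandomForestClassifier (listed under "Support multilabel") directly on a multilabel target, and wraps LogisticRegression (not listed there) in MultiOutputClassifier. The synthetic data and the hyperparameters are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic data: 3 binary labels per sample, each tied to one feature's sign.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
Y = (X[:, :3] > 0).astype(int)

# Built-in multilabel support: no meta-estimator needed.
native = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, Y)
native_pred = native.predict(X)

# No built-in multilabel support: fit one classifier per label via a meta-estimator.
wrapped = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
wrapped_pred = wrapped.predict(X)

print(native_pred.shape, wrapped_pred.shape, len(wrapped.estimators_))
```

Both models return one column of predictions per label; only the wrapped model exposes a separate fitted estimator for each label.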

Multiclass-multioutput classification

https://scikit-learn.org/stable/modules/multiclass.html#multiclass-multioutput-classification

First, this is a multioutput model.

Second, every output target is itself multiclass.

Multiclass-multioutput classification (also known as multitask classification) is a classification task which labels each sample with a set of non-binary properties. The number of properties is greater than 1 and the number of classes per property is greater than 2. A single estimator thus handles several joint classification tasks. This is both a generalization of the multilabel classification task, which only considers binary attributes, and a generalization of the multiclass classification task, where only one property is considered.

For example, classification of the properties “type of fruit” and “colour” for a set of images of fruit. The property “type of fruit” has the possible classes: “apple”, “pear” and “orange”. The property “colour” has the possible classes: “green”, “red”, “yellow” and “orange”. Each sample is an image of a fruit, a label is output for both properties and each label is one of the possible classes of the corresponding property.

Note that all classifiers handling multiclass-multioutput (also known as multitask classification) tasks support the multilabel classification task as a special case. Multitask classification is similar to the multioutput classification task with different model formulations. For more information, see the relevant estimator documentation.

The target values are not limited to true and false.

Target format

A valid representation of multioutput y is a dense matrix of shape (n_samples, n_outputs) of class labels, that is, a column-wise concatenation of 1d multiclass variables. An example of y for 3 samples:
>>> import numpy as np
>>> y = np.array([['apple', 'green'], ['orange', 'orange'], ['pear', 'green']])
>>> print(y)
[['apple' 'green']
 ['orange' 'orange']
 ['pear' 'green']]
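
A target in this format can be passed directly to a classifier from the "Support multiclass-multioutput" list. The sketch below uses DecisionTreeClassifier; the two feature columns are hypothetical measurements invented for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features per fruit image: [weight in grams, redness score].
X = np.array([[150.0, 0.2], [140.0, 0.6], [170.0, 0.3]])

# Column 0 is "type of fruit", column 1 is "colour".
y = np.array([['apple', 'green'], ['orange', 'orange'], ['pear', 'green']])

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict(X)

# One label per property is predicted for every sample.
print(clf.n_outputs_, pred.shape)
print(pred)
```

The fitted tree keeps a separate list of classes for each output column in `classes_`.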

Multioutput regression

https://scikit-learn.org/stable/modules/multiclass.html#multioutput-regression

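Multioutput regression is first of all a multioutput problem.

Second, it is regression, so every target is restricted to be numeric.

For example, predicting the wind direction and wind speed at some location: both values are numeric, but they are two separate targets.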

Multioutput regression predicts multiple numerical properties for each sample. Each property is a numerical variable and the number of properties to be predicted for each sample is greater than or equal to 2. Some estimators that support multioutput regression are faster than just running n_output estimators.

For example, prediction of both wind speed and wind direction, in degrees, using data obtained at a certain location. Each sample would be data obtained at one location and both wind speed and direction would be output for each sample.

Target format

A valid representation of multioutput y is a dense matrix of shape (n_samples, n_outputs) of floats, that is, a column-wise concatenation of continuous variables. An example of y for 3 samples:
>>> import numpy as np
>>> y = np.array([[31.4, 94], [40.5, 109], [25.0, 30]])
>>> print(y)
[[ 31.4  94. ]
 [ 40.5 109. ]
 [ 25.   30. ]]
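
Some regressors accept a target of this shape without any wrapper. The sketch below uses LinearRegression on synthetic data from make_regression; both choices are assumptions made for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic problem with 2 numeric targets per sample.
X, y = make_regression(n_samples=50, n_features=5, n_targets=2,
                       noise=1.0, random_state=0)

# A single estimator is fitted on the whole (n_samples, n_outputs) target.
reg = LinearRegression().fit(X, y)
pred = reg.predict(X[:3])

# coef_ has one row of coefficients per target.
print(y.shape, reg.coef_.shape, pred.shape)
```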

MultiOutputRegressor

Like MultiOutputClassifier, it treats the targets as independent and unrelated.

A separate model is trained for each target.

Multioutput regression support can be added to any regressor with MultiOutputRegressor. This strategy consists of fitting one regressor per target. Since each target is represented by exactly one regressor, it is possible to gain knowledge about the target by inspecting its corresponding regressor. As MultiOutputRegressor fits one regressor per target, it cannot take advantage of correlations between targets.

>>> from sklearn.datasets import make_regression
>>> from sklearn.multioutput import MultiOutputRegressor
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> X, y = make_regression(n_samples=10, n_targets=3, random_state=1)
>>> MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(X, y).predict(X)
array([[-154.75474165, -147.03498585,  -50.03812219],
       [   7.12165031,    5.12914884,  -81.46081961],
       [-187.8948621 , -100.44373091,   13.88978285],
       [-141.62745778,   95.02891072, -191.48204257],
       [  97.03260883,  165.34867495,  139.52003279],
       [ 123.92529176,   21.25719016,   -7.84253   ],
       [-122.25193977,  -85.16443186, -107.12274212],
       [ -30.170388  ,  -94.80956739,   12.16979946],
       [ 140.72667194,  176.50941682,  -17.50447799],
       [ 149.37967282,  -81.15699552,   -5.72850319]])
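
To illustrate the point about inspecting the regressor that belongs to one target, the sketch below reads the fitted `estimators_` attribute. Ridge is used as the base regressor here only to keep the example fast; that choice is an assumption, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

X, y = make_regression(n_samples=30, n_features=4, n_targets=3, random_state=1)
model = MultiOutputRegressor(Ridge(alpha=1.0)).fit(X, y)

# estimators_[k] is the regressor fitted on target column k only.
n_regressors = len(model.estimators_)
for k, est in enumerate(model.estimators_):
    print(k, est.coef_.shape)

# Fitting the same regressor by hand on one column gives the same predictions,
# which shows that the targets are modelled independently.
manual = Ridge(alpha=1.0).fit(X, y[:, 0])
same = np.allclose(manual.predict(X), model.estimators_[0].predict(X))
print(n_regressors, same)
```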

RegressorChain

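Like ClassifierChain, it treats the targets as correlated and links the models into a chain, feeding each model's output forward to the next one.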

Regressor chains (see RegressorChain) are analogous to ClassifierChain as a way of combining a number of regressions into a single multi-target model that is capable of exploiting correlations among targets.
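
A minimal sketch of a regressor chain follows, again assuming Ridge as the base regressor and synthetic data. Each regressor in the chain receives the original features plus the targets that come earlier in the chain order, which is visible in the number of input features each one was fitted on.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain

X, y = make_regression(n_samples=30, n_features=4, n_targets=3, random_state=1)

# order fixes which target each link of the chain predicts.
chain = RegressorChain(Ridge(alpha=1.0), order=[0, 1, 2]).fit(X, y)
pred = chain.predict(X)

# Link k is fitted on the 4 original features plus the k earlier targets.
n_inputs = [est.n_features_in_ for est in chain.estimators_]
print(pred.shape, n_inputs)
```

Because later links depend on earlier ones, the chosen order can affect the predictions, unlike MultiOutputRegressor where the targets are fitted independently.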
