
Confusion Matrix of sklearn

Confusion Matrix

https://machinelearningmastery.com/confusion-matrix-machine-learning/

A confusion matrix is a way of summarizing how well a classification algorithm performs.

Looking at accuracy alone can easily hide how the classes with little data are handled. In other words, when the data is imbalanced, some classes have many samples and others have few, and overall accuracy can be misleading.

At the same time, this tool makes it obvious which classes are predicted well and which kinds of errors are made most often.

A confusion matrix is a technique for summarizing the performance of a classification algorithm.

Classification accuracy alone can be misleading if you have an unequal number of observations in each class or if you have more than two classes in your dataset.

Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making.
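
As a quick toy illustration (not from the linked article) of why accuracy alone can mislead on imbalanced data: a classifier that always predicts the majority class still scores 90% accuracy below, while the confusion matrix immediately shows that the minority class is never detected.

from sklearn.metrics import accuracy_score, confusion_matrix

# Toy imbalanced problem: 9 samples of class 0, 1 sample of class 1.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

# A degenerate "classifier" that always predicts the majority class 0.
y_pred = [0] * 10

print(accuracy_score(y_true, y_pred))    # 0.9 -- looks good on its own
print(confusion_matrix(y_true, y_pred))  # [[9 0]
                                         #  [1 0]] -- class 1 is never predicted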

confusion_matrix API

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

Each row of the confusion matrix corresponds to a true class.

Each column corresponds to a predicted class.

Each element counts how many samples of a given true class were assigned to a given predicted class (raw counts by default, not probabilities).

>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
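
By default confusion_matrix returns raw counts. If you want the per-class proportions described above, the normalize parameter (available in scikit-learn 0.22+) can be used; a small sketch continuing the same example:

>>> confusion_matrix(y_true, y_pred, normalize='true')
array([[1.        , 0.        , 0.        ],
       [0.        , 0.        , 1.        ],
       [0.33333333, 0.        , 0.66666667]])

With normalize='true' each row is divided by its total, so the last row says one third of the class-2 samples were predicted as class 0 and two thirds were predicted correctly.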

plot_confusion_matrix API

https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix

A confusion matrix plot presents the same information in a more visual form.

With the normalization option, the classes can be compared row by row on a percentage basis.

In count mode, it is easy to see which classes have few samples and which have many.

plot_confusion_matrix can be used to visually represent a confusion matrix, as shown in the Confusion matrix example linked below:


https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

Example of confusion matrix usage to evaluate the quality of the output of a classifier on the iris data set. The diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier. The higher the diagonal values of the confusion matrix the better, indicating many correct predictions.

The figures show the confusion matrix with and without normalization by class support size (number of elements in each class). This kind of normalization can be interesting in case of class imbalance to have a more visual interpretation of which class is being misclassified.

Here the results are not as good as they could be, because our choice of the regularization parameter C was not the best. In real-life applications this parameter is usually chosen via hyper-parameter tuning (see "Tuning the hyper-parameters of an estimator" in the scikit-learn user guide); a GridSearchCV sketch follows the example code below.

Out:

Confusion matrix, without normalization
[[13  0  0]
 [ 0 10  6]
 [ 0  0  9]]
Normalized confusion matrix
[[1.   0.   0.  ]
 [0.   0.62 0.38]
 [0.   0.   1.  ]]

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix

# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01).fit(X_train, y_train)

np.set_printoptions(precision=2)

# Plot both the non-normalized and the normalized confusion matrix
titles_options = [("Confusion matrix, without normalization", None),
                  ("Normalized confusion matrix", 'true')]
for title, normalize in titles_options:
    disp = plot_confusion_matrix(classifier, X_test, y_test,
                                 display_labels=class_names,
                                 cmap=plt.cm.Blues,
                                 normalize=normalize)
    disp.ax_.set_title(title)

    print(title)
    print(disp.confusion_matrix)

plt.show()
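
As a rough sketch of the hyper-parameter tuning mentioned above (reusing the variables from the script, with an arbitrary candidate grid chosen here for illustration), C could be selected with GridSearchCV before plotting the confusion matrix:

from sklearn.model_selection import GridSearchCV

# Candidate values for C; this grid is an arbitrary illustration.
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(svm.SVC(kernel='linear'), param_grid, cv=5)
search.fit(X_train, y_train)
print("Best C:", search.best_params_['C'])

# Plot the normalized confusion matrix for the tuned classifier.
disp = plot_confusion_matrix(search.best_estimator_, X_test, y_test,
                             display_labels=class_names,
                             cmap=plt.cm.Blues,
                             normalize='true')
disp.ax_.set_title("Normalized confusion matrix (tuned C)")
plt.show()

With a better-chosen C, the off-diagonal entries of the normalized matrix should shrink compared to the over-regularized model above.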