Confusion Matrix in scikit-learn
Confusion Matrix
https://machinelearningmastery.com/confusion-matrix-machine-learning/
A confusion matrix is a technique for summarizing the performance of a classification algorithm.
Looking at accuracy alone can easily hide how a model behaves on classes with few samples. In other words, when the data is imbalanced, some classes have many samples and others have few.
The matrix also makes it obvious which classes are predicted well and which kinds of errors the classifier tends to make.
A confusion matrix is a technique for summarizing the performance of a classification algorithm.
Classification accuracy alone can be misleading if you have an unequal number of observations in each class or if you have more than two classes in your dataset.
Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making.
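A minimal sketch of the point above, using made-up imbalanced labels: a classifier that always predicts the majority class gets high accuracy, but the confusion matrix exposes that the minority class is never recovered.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * 100

# Accuracy looks strong even though no positive is ever found.
print(accuracy_score(y_true, y_pred))   # 0.95
# The confusion matrix exposes the problem: all 5 positives misclassified.
print(confusion_matrix(y_true, y_pred))
# [[95  0]
#  [ 5  0]]
```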
confusion_matrix API
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix
Each row of the confusion matrix corresponds to a true class.
Each column corresponds to a predicted class.
Each entry counts the samples of one true class that were predicted as another class (a count, not a probability, unless the matrix is normalized).
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
plot_confusion_matrix API
https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix
A confusion matrix plot is more intuitive than the raw array.
With the normalize option, the classes can be compared against each other in percentage terms.
In count mode, we can see which classes have few samples and which have many.
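The same normalization is available directly on `confusion_matrix` (scikit-learn 0.22 and later): `normalize='true'` divides each row by that class's sample count, so rows sum to 1 and classes of different sizes become comparable. A short sketch with the toy labels from above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]

# normalize='true' divides each row by the number of true samples in
# that class, so each row sums to 1.
cm_norm = confusion_matrix(y_true, y_pred, normalize='true')
print(np.round(cm_norm, 2))
# [[1.   0.   0.  ]
#  [0.   0.   1.  ]
#  [0.33 0.   0.67]]
```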
plot_confusion_matrix can be used to visually represent a confusion matrix, as shown in the Confusion matrix example:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
Example of confusion matrix usage to evaluate the quality of the output of a classifier on the iris data set. The diagonal elements represent the number of points for which the predicted label is equal to the true label, while off-diagonal elements are those that are mislabeled by the classifier. The higher the diagonal values of the confusion matrix the better, indicating many correct predictions.
The figures show the confusion matrix with and without normalization by class support size (number of elements in each class). This kind of normalization can be interesting in case of class imbalance to have a more visual interpretation of which class is being misclassified.
Here the results are not as good as they could be, as our choice for the regularization parameter C was not the best. In real-life applications this parameter is usually chosen via hyper-parameter search (see "Tuning the hyper-parameters of an estimator" in the scikit-learn docs).
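That tuning step can be sketched with GridSearchCV: instead of hard-coding C=0.01 as the example below does, search a few candidate values by cross-validation (the grid values here are illustrative, not from the original example).

```python
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

# Candidate values of C; the best one is picked by 5-fold cross-validation.
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(svm.SVC(kernel='linear'), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)           # C with the best cross-val accuracy
print(search.score(X_test, y_test))  # accuracy on held-out data
```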
Out:
Confusion matrix, without normalization
[[13  0  0]
 [ 0 10  6]
 [ 0  0  9]]
Normalized confusion matrix
[[1.   0.   0.  ]
 [0.   0.62 0.38]
 [0.   0.   1.  ]]
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt

from sklearn import svm, datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix

# import some data to play with
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names

# Split the data into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Run classifier, using a model that is too regularized (C too low) to see
# the impact on the results
classifier = svm.SVC(kernel='linear', C=0.01).fit(X_train, y_train)

np.set_printoptions(precision=2)

# Plot non-normalized and normalized confusion matrices
titles_options = [("Confusion matrix, without normalization", None),
                  ("Normalized confusion matrix", 'true')]
for title, normalize in titles_options:
    disp = plot_confusion_matrix(classifier, X_test, y_test,
                                 display_labels=class_names,
                                 cmap=plt.cm.Blues,
                                 normalize=normalize)
    disp.ax_.set_title(title)

    print(title)
    print(disp.confusion_matrix)

plt.show()