sklearn.svm.SVC: Support Vector Machine Parameters Explained

阿新 • Published: 2020-08-25

Usage:

class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)

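Optional parameters

C: Regularization parameter (default 1.0). The strength of the regularization is inversely proportional to C, which must be strictly positive. The penalty is a squared l2 penalty. The smaller C is, the more tolerant the model is of misclassified training samples.
kernel: Kernel type; one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' (default 'rbf').
degree: Degree of the polynomial when kernel='poly' (default 3); ignored by all other kernels.
gamma: Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. Accepts 'scale', 'auto' or a float. With the default 'scale', gamma is 1 / (n_features * X.var()); with 'auto', gamma is 1 / n_features.
coef0: Independent term in the kernel function (default 0.0); only significant for 'poly' and 'sigmoid'.
shrinking: Whether to use the shrinking heuristic (default True).
probability: Whether to enable probability estimates (default False). This must be enabled before calling fit. It slows fit down because it internally uses 5-fold cross-validation, and predict_proba may be inconsistent with predict.
tol: Tolerance for the stopping criterion (default 0.001).
cache_size: Size of the kernel cache in MB (default 200).
class_weight: Per-class weights, given either as a dict or as 'balanced'. With 'balanced', the weights are n_samples / (n_classes * np.bincount(y)). The default None gives every class a weight of one.
verbose: Whether to enable verbose output (default False).
max_iter: Hard limit on the number of solver iterations; the default -1 means no limit.
decision_function_shape: Form of the multiclass decision function: one-vs-rest ('ovr') or one-vs-one ('ovo'); default 'ovr'.
break_ties: If True, decision_function_shape='ovr', and the number of classes is > 2, predict breaks ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Note that breaking ties is relatively expensive compared to a simple predict. Default False.
random_state: Seed of the pseudo-random number generator used to shuffle the data for probability estimates; ignored when probability is False.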
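
A few of the formulas in the list above can be checked directly. The following is only a minimal sketch and not part of the original post: the four-class make_blobs data and the small imbalanced label array are assumptions chosen for illustration. It compares gamma='scale' with the manually computed value, reproduces the 'balanced' class weights, and shows how decision_function_shape changes the shape of the decision_function output.

```python
import numpy as np
from sklearn.datasets import load_iris, make_blobs
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

X, y = load_iris(return_X_y=True)

# gamma='scale' is described as 1 / (n_features * X.var()); passing that
# value explicitly should produce the same decision values.
manual_gamma = 1.0 / (X.shape[1] * X.var())
clf_scale = SVC(kernel='rbf', gamma='scale').fit(X, y)
clf_manual = SVC(kernel='rbf', gamma=manual_gamma).fit(X, y)
same_scores = np.allclose(clf_scale.decision_function(X),
                          clf_manual.decision_function(X))
print("gamma='scale' matches the manual value:", same_scores)

# class_weight='balanced' uses n_samples / (n_classes * np.bincount(y)).
y_imb = np.array([0] * 8 + [1] * 2)  # hypothetical imbalanced labels
n_classes = len(np.unique(y_imb))
manual_w = len(y_imb) / (n_classes * np.bincount(y_imb))
balanced_w = compute_class_weight('balanced', classes=np.unique(y_imb), y=y_imb)
print("balanced weights:", balanced_w, "manual:", manual_w)

# decision_function_shape: 'ovr' returns one column per class,
# 'ovo' one column per pair of classes.
Xb, yb = make_blobs(n_samples=200, centers=4, random_state=0)  # assumed toy data
ovr_shape = SVC(decision_function_shape='ovr').fit(Xb, yb).decision_function(Xb).shape
ovo_shape = SVC(decision_function_shape='ovo').fit(Xb, yb).decision_function(Xb).shape
print("ovr:", ovr_shape, "ovo:", ovo_shape)
```

With four classes there are 4 * 3 / 2 = 6 class pairs, which is why the two shapes differ; with three classes (as in iris) both settings happen to return three columns.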

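Attributes

support_: Indices of the support vectors.
support_vectors_: The support vectors.
n_support_: Number of support vectors for each class.
dual_coef_: Dual coefficients of the support vectors in the decision function.
coef_: Coefficients of the primal problem (weights assigned to the features); only available with a linear kernel.
intercept_: Constants in the decision function.
fit_status_: 0 if correctly fitted, 1 otherwise (a warning is raised).
classes_: The class labels.
class_weight_: The weight of each class.
shape_fit_: Array dimensions of the training vector X.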
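
To see what these attributes look like after fitting, here is a small sketch (my own illustration, assuming the two iris petal features that the post uses later). It prints the attributes of a linear SVC and checks that coef_ is not exposed for a non-linear kernel.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]  # petal length and petal width

linear_clf = SVC(kernel='linear', C=1.0).fit(X, y)

print("support_ (first 5 indices):", linear_clf.support_[:5])
print("support_vectors_ shape:", linear_clf.support_vectors_.shape)
print("n_support_ per class:", linear_clf.n_support_)
print("dual_coef_ shape:", linear_clf.dual_coef_.shape)  # (n_classes - 1, n_SV)
print("coef_ shape:", linear_clf.coef_.shape)            # (n_class_pairs, n_features)
print("intercept_ shape:", linear_clf.intercept_.shape)
print("fit_status_:", linear_clf.fit_status_)
print("classes_:", linear_clf.classes_)
print("class_weight_:", linear_clf.class_weight_)
print("shape_fit_:", linear_clf.shape_fit_)

# coef_ is only defined for the linear kernel; accessing it on an RBF model
# raises AttributeError, so hasattr reports False.
rbf_clf = SVC(kernel='rbf').fit(X, y)
rbf_has_coef = hasattr(rbf_clf, 'coef_')
print("RBF model exposes coef_:", rbf_has_coef)
```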

Data preparation:

# Load the data
from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
X = iris.data[:,[2,3]]
y = iris.target
print("Class labels:",np.unique(y))  # print the distinct class labels


# Split into training and test data
from sklearn.model_selection import train_test_split
## 30% test data, 70% training data; stratify=y keeps the class proportions the same in both sets
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,random_state=1,stratify=y)



from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
## Estimate mu and sigma from the training data
sc.fit(X_train)
## Standardize both sets using the mu and sigma from the training data
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)



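## Plot the decision regions (only possible with exactly 2 features)
import matplotlib.pyplot as plt
# %matplotlib inline  # IPython/Jupyter magic; uncomment only inside a notebook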
from matplotlib.colors import ListedColormap

def plot_decision_region(X,y,classifier,resolution=0.02):
    markers = ('s','x','o','^','v')
    colors = ('red','blue','lightgreen','gray','cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min,x1_max = X[:,0].min()-1,X[:,0].max()+1
    x2_min,x2_max = X[:,1].min()-1,X[:,1].max()+1
    xx1,xx2 = np.meshgrid(np.arange(x1_min,x1_max,resolution),
                         np.arange(x2_min,x2_max,resolution))
    Z = classifier.predict(np.array([xx1.ravel(),xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1,xx2,Z,alpha=0.3,cmap=cmap)
    plt.xlim(xx1.min(),xx1.max())
    plt.ylim(xx2.min(),xx2.max())

    # plot class samples
    for idx,cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y==cl,0],
                   y = X[y==cl,1],
                   alpha=0.8,
                   c=colors[idx],
                   marker = markers[idx],
                   label=cl,
                   edgecolors='black')
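
The preparation steps above can be sanity-checked without plotting. The sketch below is my own addition and repeats the same split and scaling (same random_state=1) so that it runs on its own; it prints the class counts of the stratified split and the mean and standard deviation of the standardized features.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# stratify=y should keep the class proportions equal in both subsets
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train class counts:", train_counts)
print("test class counts:", test_counts)

# The scaler is fitted on the training data only, then applied to both sets
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Training features end up with mean 0 and std 1; the test features are
# scaled with the training statistics, so they are only approximately so.
print("train mean/std:", X_train_std.mean(axis=0), X_train_std.std(axis=0))
print("test mean/std:", X_test_std.mean(axis=0), X_test_std.std(axis=0))
```

Fitting the scaler on the training data alone keeps information from the test set out of the preprocessing step.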

Linear support vector machine:

## Linear support vector machine
from sklearn.svm import SVC
svm = SVC(kernel='linear',C=1.0,random_state=1)
svm.fit(X_train_std,y_train)
plot_decision_region(X_train_std,y_train,classifier=svm,resolution=0.02)
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.show()
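
The post computes X_test_std but only plots the training data. As a hedged complement, the sketch below scores the linear SVM on the held-out test set. It swaps in a scikit-learn Pipeline (make_pipeline) for the manual fit/transform calls, which is my substitution and not the author's code; the data, split and SVC settings are the same as above.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# The pipeline fits the scaler on the training data and reuses its
# statistics when scoring on the test data.
model = make_pipeline(StandardScaler(),
                      SVC(kernel='linear', C=1.0, random_state=1))
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("train accuracy: %.3f" % train_accuracy)
print("test accuracy:  %.3f" % test_accuracy)
```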

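Modeling a non-linear classification problem with a kernel function (gamma=0.20)

## Modeling a non-linear classification problem with a kernel function (gamma=0.20)
svm = SVC(kernel='rbf',random_state=1,gamma=0.20,C=1.0)    ## a smaller gamma gives a looser decision boundary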
svm.fit(X_train_std,y_train)
plot_decision_region(X_train_std,y_train,classifier=svm,resolution=0.02)
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.show()

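Modeling a non-linear classification problem with a kernel function (gamma=100)

## Modeling a non-linear classification problem with a kernel function (gamma=100)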
svm = SVC(kernel='rbf',random_state=1,gamma=100.0,C=1.0,verbose=1)   
svm.fit(X_train_std,y_train)
plot_decision_region(X_train_std,y_train,classifier=svm,resolution=0.02)
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.show()

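Comparing the plots for the two gamma values: for the Gaussian (RBF) kernel, increasing gamma shrinks the range of influence of each training sample, so the decision boundary tightens around the samples and becomes bumpy; a smaller gamma gives a looser, smoother decision boundary. A large gamma can achieve a very small training error, but the model is then likely to generalize poorly and overfit.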
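
The effect described above can also be inspected numerically rather than only visually. The following sketch is my own addition and reuses the same split and the two gamma values from the post; it reports the number of support vectors and the training and test accuracy for each model. How large the gap between training and test accuracy turns out to be depends on the split, so treat the printed numbers as indicative only.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

results = {}
for gamma in (0.2, 100.0):
    clf = SVC(kernel='rbf', gamma=gamma, C=1.0, random_state=1)
    clf.fit(X_train_std, y_train)
    results[gamma] = {
        # a very local kernel tends to turn most samples into support vectors
        'n_support': int(clf.n_support_.sum()),
        'train_acc': clf.score(X_train_std, y_train),
        'test_acc': clf.score(X_test_std, y_test),
    }

for gamma, r in results.items():
    print("gamma=%6.1f  support vectors=%3d  train acc=%.3f  test acc=%.3f"
          % (gamma, r['n_support'], r['train_acc'], r['test_acc']))
```

A model that needs most of the training samples as support vectors is effectively memorizing them, which is one practical sign of the overfitting mentioned above.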

