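Machine Learning: Regression and Classification 4-4 (Support Vector Machine Algorithm)
阿新 • Published: 2022-03-15
Using a support vector machine to predict Black Friday spending
Main steps:
Dataset link: https://www.cnblogs.com/ojbtospark/p/16005660.html
1. Import packages
In [ ]:# Import packages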
import numpy as np
import pandas as pd
2. Import the dataset
In [ ]:# Import the dataset
data = pd.read_csv('BlackFriday.csv')
data.head()
3. Data preprocessing
3.1 Detect and handle missing values
In [ ]:# Detect missing values
null_df = data.isnull().sum()
null_df
In [ ]:
# Drop the columns that contain missing values
data = data.drop(['Product_Category_2', 'Product_Category_3'], axis = 1)
data.head()
In [ ]:
# Check for missing values again
null_df = data.isnull().sum()
null_df
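The detect-then-drop step above can be sketched on a toy frame. The data and the 0.5 threshold below are hypothetical and chosen only for illustration; the original notebook drops the two columns by name rather than by a threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the real BlackFriday.csv
toy = pd.DataFrame({
    "Product_Category_1": [3, 1, 12, 8],
    "Product_Category_2": [np.nan, 6.0, np.nan, np.nan],
    "Purchase": [8370, 15200, 1422, 1057],
})

# Fraction of missing values per column (isnull().sum() gives the raw counts)
missing_ratio = toy.isnull().mean()

# Assumed rule for this sketch: drop columns where more than half the rows are missing
cols_to_drop = missing_ratio[missing_ratio > 0.5].index.tolist()
cleaned = toy.drop(columns=cols_to_drop)

print(missing_ratio)
print(cleaned.isnull().sum())
```

Looking at the ratio rather than the raw count makes it easier to judge whether dropping a whole column is reasonable or whether too much information would be lost.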
3.2 Drop unused columns
In [ ]:# Drop unused columns
data = data.drop(['User_ID', 'Product_ID'], axis = 1)
3.3 Check categorical variables
In [ ]:# Check categorical variables
print(data.dtypes)
In [ ]:
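# Convert variable types
# '4+' is mapped to '4' so the column can be cast to an integer type
data['Stay_In_Current_City_Years'] = data['Stay_In_Current_City_Years'].replace('4+', '4')
data['Stay_In_Current_City_Years'] = data['Stay_In_Current_City_Years'].astype('int64')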
data['Product_Category_1'] = data['Product_Category_1'].astype('object')
data['Occupation'] = data['Occupation'].astype('object')
data['Marital_Status'] = data['Marital_Status'].astype('object')
In [ ]:
# Check categorical variables again
print(data.dtypes)
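The type conversion can be checked on a small example. The toy values below are hypothetical. The sketch shows that `'4+'` collapses to `4` (so "four or more years" is treated as exactly four) and that casting a numeric code to `object` marks it as categorical for the later encoding step.

```python
import pandas as pd

# Hypothetical toy rows with the same column names as the dataset
toy = pd.DataFrame({
    "Stay_In_Current_City_Years": ["2", "4+", "1"],
    "Occupation": [10, 16, 15],
})

# Text column -> integer column: replace the non-numeric label first, then cast
toy["Stay_In_Current_City_Years"] = (
    toy["Stay_In_Current_City_Years"].replace("4+", "4").astype("int64")
)

# Numeric code -> categorical: the values are labels, not quantities
toy["Occupation"] = toy["Occupation"].astype("object")

print(toy.dtypes)
```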
3.4 Label encoding & one-hot encoding
In [ ]:# Label encoding & one-hot encoding
data = pd.get_dummies(data, drop_first = True)
data.head()
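To see what `pd.get_dummies(..., drop_first=True)` does, here is a minimal sketch on a hypothetical three-row frame. Each categorical column with k levels becomes k-1 indicator columns, and the dropped first level acts as the baseline. Numeric columns pass through unchanged.

```python
import pandas as pd

# Hypothetical toy frame, not the real dataset
toy = pd.DataFrame({
    "Gender": ["F", "M", "M"],
    "City_Category": ["A", "B", "C"],
    "Purchase": [8370, 15200, 1422],
})

encoded = pd.get_dummies(toy, drop_first=True)

# 'Gender_F' and 'City_Category_A' are dropped and serve as the reference levels
print(encoded.columns.tolist())
```

Dropping the first level avoids perfectly collinear indicator columns. Because only `object` (or categorical) columns are encoded by default, the earlier cast of `Occupation`, `Marital_Status` and `Product_Category_1` to `object` is what makes them eligible here.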
3.5 Get the independent and dependent variables
In [ ]:# Get the independent and dependent variables
y = data['Purchase'].values
data = data.drop(['Purchase'], axis = 1)
x = data.values
3.6 Split into training and test sets
In [ ]:# Split into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 1)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)
3.7 Feature scaling
In [ ]:# Feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
sc_y = StandardScaler()
y_train = np.ravel(sc_y.fit_transform(y_train.reshape(-1, 1)))
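The scaling cell fits the scaler on the training data only and reuses those statistics for the test data. A minimal sketch with hypothetical one-feature arrays shows the effect: the test value is standardized with the training mean and standard deviation, not its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy arrays with a single feature
x_train_toy = np.array([[1.0], [2.0], [3.0]])
x_test_toy = np.array([[4.0]])

sc = StandardScaler()
x_train_scaled = sc.fit_transform(x_train_toy)  # learns mean and std from the training set
x_test_scaled = sc.transform(x_test_toy)        # reuses the training statistics

print(sc.mean_, sc.scale_)
print(x_test_scaled)
```

Fitting on the training set alone keeps information about the test set out of the preprocessing step.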
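4. Build support vector machine models with different parameters
4.1 Model 1: build a support vector machine model
4.1.1 Build the model
This cell takes about 2 minutes to run
In [ ]:# Build support vector machine models with different parameters
# Model 1: build a support vector machine model (kernel=rbf)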
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf', gamma='scale', C=1.0, epsilon=0.1, verbose=True)
regressor.fit(x_train, y_train)
4.1.2 Predict on the test set
In [ ]:# Predict on the test set
y_pred = regressor.predict(x_test)
y_pred[:5]
In [ ]:
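# Transform y_pred back to the original (pre-scaling) units
# inverse_transform expects a 2D array, so reshape first and flatten afterwards
y_pred = sc_y.inverse_transform(y_pred.reshape(-1, 1)).ravel()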
y_pred[:5]
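Because the model is trained on a standardized target, its predictions have to be mapped back to purchase amounts before scoring. The round trip can be sketched with hypothetical values. `StandardScaler` works on 2D arrays, hence the `reshape(-1, 1)` before each call and `ravel()` afterwards.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical target values
y_toy = np.array([100.0, 200.0, 300.0, 400.0])

sc_y_toy = StandardScaler()
y_scaled = sc_y_toy.fit_transform(y_toy.reshape(-1, 1)).ravel()           # zero mean, unit variance
y_restored = sc_y_toy.inverse_transform(y_scaled.reshape(-1, 1)).ravel()  # back to original units

print(y_scaled)
print(y_restored)
```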
4.1.3 Evaluate model performance
In [ ]:# Evaluate model performance
from sklearn.metrics import r2_score
r2 = r2_score(y_test, y_pred)
print("R2 Score:", r2)
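For reference, the R2 score is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below uses hypothetical numbers, computes it by hand and compares the result with `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares: 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 20.0
r2_manual = 1 - ss_res / ss_tot                 # 0.925

print(r2_manual, r2_score(y_true, y_hat))
```

A score of 1 means perfect predictions, 0 matches always predicting the mean, and negative values are worse than that baseline.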
4.2 Model 2: build a support vector machine model
This cell takes about 2 minutes to run
In [ ]:# Model 2: build a support vector machine model (kernel=poly, degree=2)
regressor = SVR(kernel = 'poly', degree=2, gamma='scale', C=1.0, epsilon=0.1, verbose=True)
regressor.fit(x_train, y_train)
In [ ]:
# Predict on the test set
y_pred = regressor.predict(x_test)
In [ ]:
# Transform y_pred back to the original (pre-scaling) units
y_pred = sc_y.inverse_transform(y_pred.reshape(-1, 1)).ravel()
In [ ]:
# Evaluate model performance
r2 = r2_score(y_test, y_pred)
print("R2 Score:", r2)
4.3 Model 3: build a support vector machine model
This cell takes about 2 minutes to run
In [ ]:# Model 3: build a support vector machine model (kernel=poly, degree=3)
regressor = SVR(kernel = 'poly', degree=3, gamma='scale', C=1.0, epsilon=0.1, verbose=True)
regressor.fit(x_train, y_train)
In [ ]:
# Predict on the test set
y_pred = regressor.predict(x_test)
In [ ]:
# Transform y_pred back to the original (pre-scaling) units
y_pred = sc_y.inverse_transform(y_pred.reshape(-1, 1)).ravel()
In [ ]:
# Evaluate model performance
r2 = r2_score(y_test, y_pred)
print("R2 Score:", r2)
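Conclusion:
- As the three models above show, the choice of hyperparameters (here the kernel and the polynomial degree) affects model performance as measured by the R2 score.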
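The three runs above repeat the same fit, predict, inverse-transform and score steps with only the kernel settings changed. One way to make that comparison more compact is a loop over configurations. The sketch below runs on a small synthetic dataset from `make_regression`, which is an assumption made so the example runs in seconds. Its scores say nothing about the Black Friday data, and the `configs` and `scores` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 300 rows, 5 features
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Same preprocessing as in the article: scale features and target on the training set
sc_x = StandardScaler()
X_train = sc_x.fit_transform(X_train)
X_test = sc_x.transform(X_test)
sc_y = StandardScaler()
y_train_scaled = sc_y.fit_transform(y_train.reshape(-1, 1)).ravel()

# The three kernel settings compared in the article
configs = {
    "rbf": dict(kernel="rbf"),
    "poly, degree=2": dict(kernel="poly", degree=2),
    "poly, degree=3": dict(kernel="poly", degree=3),
}

scores = {}
for name, params in configs.items():
    model = SVR(gamma="scale", C=1.0, epsilon=0.1, **params)
    model.fit(X_train, y_train_scaled)
    # Map predictions back to the original units before scoring
    pred = sc_y.inverse_transform(model.predict(X_test).reshape(-1, 1)).ravel()
    scores[name] = r2_score(y_test, pred)

for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```

Collecting the scores in one dictionary keeps the runs side by side, which makes the effect of each hyperparameter easier to read than three separate cells.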