Implementing Linear Regression with scikit-learn
By 阿新 · Published 2018-11-22
The linear regression algorithm is mainly used to solve regression problems and underlies many powerful nonlinear models. Whether it is simple or multiple linear regression, the idea is the same: assume we have found the best-fit equation y = ax + b (that is the simple case; in multiple linear regression each sample has several features forming a vector). Then for each sample point xᵢ, the prediction from our line is ŷᵢ = a·xᵢ + b, while the true value is yᵢ, and we want the gap between yᵢ and ŷᵢ to be as small as possible.
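The least-squares idea above can be sketched directly with NumPy. This is a toy illustration on made-up data (the points lie exactly on y = 2x + 1), not part of the original example:

```python
import numpy as np

# Toy data lying exactly on y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Closed-form least-squares estimates for simple linear regression:
#   a = sum((x_i - x_mean)(y_i - y_mean)) / sum((x_i - x_mean)^2)
#   b = y_mean - a * x_mean
x_mean, y_mean = x.mean(), y.mean()
a = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
b = y_mean - a * x_mean
print(a, b)  # 2.0 1.0
```

Since the data is noiseless, the fit recovers the slope and intercept exactly; with real data these would only minimize the squared error.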
Next, let's implement linear regression with scikit-learn. As usual, we start by importing the common libraries:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
Here we use the Boston housing dataset and drop the samples whose target value is 50. In practice such an extreme value often means the true value could not be measured, e.g. because of environmental factors or instrument limits, so we remove the points where the target equals 50. (Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2; the snippets below assume an older version.)
boston = datasets.load_boston()
X = boston.data
y = boston.target
X = X[y < 50.0]
y = y[y < 50.0]
Next we split the data into training and test sets, create the estimator, and fit it on the training set:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
Before checking how well the regression fits, we can look at the coefficients and the intercept of the fitted equation ŷ:
lin_reg.coef_
lin_reg.intercept_
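To see concretely what `coef_` and `intercept_` mean, here is a small sketch on synthetic data (the shapes and coefficients are made up for illustration, not the Boston data): a prediction is just the dot product of the features with `coef_` plus `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))                       # synthetic stand-in features
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 3.0       # known linear relationship

reg = LinearRegression().fit(X_demo, y_demo)

# predict() computes exactly X @ coef_ + intercept_.
manual = X_demo @ reg.coef_ + reg.intercept_
print(np.allclose(manual, reg.predict(X_demo)))  # True
```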
Finally, let's look at the model's score and its predictions on the test set:
lin_reg.score(X_test, y_test)
lin_reg.predict(X_test)
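As a side note, `score` for a regressor is the R² coefficient of determination, R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)². A minimal sketch on synthetic data (names and values are illustrative, not from the post) verifying this against a manual computation:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_demo = rng.normal(size=(80, 2))
y_demo = X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=80)

reg = LinearRegression().fit(X_demo, y_demo)

# score() is R^2: 1 - residual sum of squares / total sum of squares.
y_hat = reg.predict(X_demo)
r2_manual = 1.0 - ((y_demo - y_hat) ** 2).sum() / ((y_demo - y_demo.mean()) ** 2).sum()
print(np.isclose(reg.score(X_demo, y_demo), r2_manual))  # True
```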
With that, a multiple linear regression model is done. On my machine the score is 0.80089168995191, and the value you get should be in the same ballpark. In the previous post we used the kNN algorithm to solve a classification problem; kNN can also be used for regression, so let's see how it performs on the same dataset.
First, import the relevant classes (continuing from the code above; libraries already imported are not imported again):
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
Train and check the result:
knn_reg = KNeighborsRegressor()
knn_reg.fit(X_train, y_train)
knn_reg.score(X_test, y_test)
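To see why kNN works for regression at all: with the default uniform weights, the prediction for a query point is simply the mean of the target values of its k nearest training points. A tiny made-up example (data and query are hypothetical, chosen so the neighbors are obvious):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_tiny = np.array([[0.0], [1.0], [2.0], [10.0]])
y_tiny = np.array([0.0, 1.0, 2.0, 10.0])

knn = KNeighborsRegressor(n_neighbors=3).fit(X_tiny, y_tiny)

# For a query at 0.5, the 3 nearest points are x = 0, 1, 2,
# so the uniform-weight prediction is mean(0, 1, 2) = 1.0.
pred = knn.predict([[0.5]])
print(pred)  # [1.]
```

With `weights="distance"` (one of the options in the grid search below), closer neighbors would instead get proportionally larger weight.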
On my machine the score comes out around 0.60, quite a bit worse than linear regression. But we may not be using the best hyperparameters here, so let's run a grid search:
param_grid = [
    {
        "weights": ["uniform"],
        "n_neighbors": [i for i in range(1, 11)]
    },
    {
        "weights": ["distance"],
        "n_neighbors": [i for i in range(1, 11)],
        "p": [i for i in range(1, 6)]
    }
]
knn_reg = KNeighborsRegressor()
grid_search = GridSearchCV(knn_reg, param_grid, n_jobs = -1, verbose = 1)
grid_search.fit(X_train, y_train)
Check the best hyperparameters:
grid_search.best_params_
Check the score:
grid_search.best_estimator_.score(X_test, y_test)
On my machine the score is about 0.73. Still a bit worse than linear regression, but now in the same ballpark. The complete code follows:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
boston = datasets.load_boston()
X = boston.data
y = boston.target
X = X[y < 50.0]
y = y[y < 50.0]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
lin_reg.coef_
lin_reg.intercept_
lin_reg.score(X_test, y_test)
from sklearn.neighbors import KNeighborsRegressor
knn_reg = KNeighborsRegressor()
knn_reg.fit(X_train, y_train)
knn_reg.score(X_test, y_test)
from sklearn.model_selection import GridSearchCV
param_grid = [
    {
        "weights": ["uniform"],
        "n_neighbors": [i for i in range(1, 11)]
    },
    {
        "weights": ["distance"],
        "n_neighbors": [i for i in range(1, 11)],
        "p": [i for i in range(1, 6)]
    }
]
knn_reg = KNeighborsRegressor()
grid_search = GridSearchCV(knn_reg, param_grid, n_jobs = -1, verbose = 1)
grid_search.fit(X_train, y_train)
grid_search.best_params_
grid_search.best_estimator_.score(X_test, y_test)