Machine Learning in Python: Linear Models
I have recently been dabbling in Python on and off. As usual, I start from the application side so I can get up to speed quickly, and fill in the finer technical details later. After working through some references and getting the basic syntax and data structures down, my attention turned to scikit-learn, a statistical learning package built on top of SciPy. The range of algorithms it covers is remarkably complete, and even better, its user guide is very well written.
1. Generalized Linear Models
Here, "generalized linear models" means the linear model and its straightforward extensions, including ridge regression, the lasso, least angle regression (LARS), logistic regression, the perceptron, and so on. The sections below introduce the basic idea behind each model and how to implement it in Python.
1.1. Ordinary Least Squares
Ordinary least squares is implemented by the LinearRegression class. Its weakness is that it depends on the correlation structure of the regressors: when multicollinearity is present, the design matrix becomes nearly singular, so the least squares estimates are very sensitive, and even small fluctuations in the random errors can produce large changes in the fit. When the data come from an observational rather than a designed study, multicollinearity is very likely to appear.
from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])  # fit the model
clf.coef_  # retrieve the fitted coefficients
The code above shows the two main calls: fitting a linear regression and retrieving the fitted coefficients. The script below gives a more complete example.
Script:

print __doc__
import pylab as pl
import numpy as np
from sklearn import datasets, linear_model

diabetes = datasets.load_diabetes()  # load the data
diabetes_x = diabetes.data[:, np.newaxis]
diabetes_x_temp = diabetes_x[:, :, 2]  # keep a single feature
diabetes_x_train = diabetes_x_temp[:-20]  # training samples
diabetes_x_test = diabetes_x_temp[-20:]  # test samples
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

regr = linear_model.LinearRegression()
regr.fit(diabetes_x_train, diabetes_y_train)

print 'Coefficients: \n', regr.coef_
print ("Mean squared error: %.2f"
       % np.mean((regr.predict(diabetes_x_test) - diabetes_y_test) ** 2))
print ("Variance score: %.2f" % regr.score(diabetes_x_test, diabetes_y_test))

pl.scatter(diabetes_x_test, diabetes_y_test, color='black')
pl.plot(diabetes_x_test, regr.predict(diabetes_x_test), color='blue', linewidth=3)
pl.xticks(())
pl.yticks(())
pl.show()
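To see the multicollinearity problem mentioned above in action, here is a quick synthetic check (entirely my own construction, not part of the original example): two almost identical columns make the design matrix nearly singular, and refitting after a tiny perturbation of y can move the coefficients dramatically.

import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
x1 = rng.randn(100)
x2 = x1 + 1e-6 * rng.randn(100)  # nearly identical column -> near-singular design
X = np.c_[x1, x2]
y = x1 + 0.01 * rng.randn(100)

print(linear_model.LinearRegression().fit(X, y).coef_)
# refit after a tiny perturbation of y: the two coefficients can swing wildly
print(linear_model.LinearRegression().fit(X, y + 0.01 * rng.randn(100)).coef_)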
1.2. Ridge Regression
Ridge regression is a regularization method: it adds an L2-norm penalty to the loss function to control the complexity of the linear model, which makes the fit more robust. Ridge regression is called as follows:
from sklearn import linear_model
clf = linear_model.Ridge(alpha=.5)
clf.fit([[0,0],[0,0],[1,1]],[0,.1,1])
clf.coef_
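For reference, the objective that Ridge minimizes (this formula comes from the scikit-learn documentation, not the original post) is

\min_w \|Xw - y\|_2^2 + \alpha \|w\|_2^2

so a larger alpha shrinks the coefficients more strongly toward zero.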
The script below plots how the ridge coefficient estimates vary with the penalty parameter.
print __doc__
import numpy as np
import pylab as pl
from sklearn import linear_model

# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

# Compute paths
n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)
clf = linear_model.Ridge(fit_intercept=False)  # create a ridge regression object

coefs = []
# loop: one fit per alpha
for a in alphas:
    clf.set_params(alpha=a)
    clf.fit(X, y)
    coefs.append(clf.coef_)  # store the coefficients in coefs

# Display results
ax = pl.gca()
ax.set_color_cycle(['b', 'r', 'g', 'c', 'k', 'y', 'm'])  # newer matplotlib versions use ax.set_prop_cycle(color=...)
ax.plot(alphas, coefs)
ax.set_xscale('log')  # note: the alpha axis is on a log scale
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis
pl.xlabel('alpha')
pl.ylabel('weights')
pl.title('Ridge coefficients as a function of the regularization')
pl.axis('tight')
pl.show()
The code for choosing the regularization parameter by generalized cross-validation (GCV) is as follows:
clf = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
clf.fit([[0,0],[0,0],[1,1]],[0,.1,1])
clf.alpha_
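In practice the candidate values are usually taken on a log-spaced grid; a minimal sketch (the particular grid below is my own choice, not from the original post):

import numpy as np
from sklearn import linear_model

clf = linear_model.RidgeCV(alphas=np.logspace(-3, 3, 13))
clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
print(clf.alpha_)  # the alpha selected by (generalized) cross-validation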
1.3. Lasso
The lasso differs from ridge estimation in that its penalty is based on the L1 norm, so it can shrink coefficients exactly to zero and thereby perform variable selection; it is an extremely popular variable selection method. There are two main algorithms for computing the lasso estimate: one is coordinate descent, used by the Lasso class introduced below; the other is least angle regression, covered later (one of the most impressive papers I read as a student was Efron's LARS paper, a genuinely eye-opening read that I would recommend to everyone).
clf = linear_model.Lasso(alpha = 0.1)
clf.fit([[0,0],[1,1]],[0,1])
clf.predict([[1,1]])
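For reference, the objective the Lasso class minimizes (per the scikit-learn documentation, not spelled out in the original post) is

\min_w \frac{1}{2 n_{samples}} \|Xw - y\|_2^2 + \alpha \|w\|_1

and it is the L1 term that drives individual coefficients exactly to zero.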
The script below compares the Lasso and the Elastic Net on recovering a sparse signal.
print __doc__
import numpy as np
import pylab as pl
from sklearn.metrics import r2_score
# generate some sparse data to play with
np.random.seed(42)
n_samples, n_features = 50, 200
X = np.random.randn(n_samples, n_features)
coef = 3 * np.random.randn(n_features)
inds = np.arange(n_features)
np.random.shuffle(inds)  # shuffle the feature indices
coef[inds[10:]] = 0 # sparsify coef
y = np.dot(X, coef)
# add noise
y += 0.01 * np.random.normal(size=(n_samples,))
# Split data in train set and test set
n_samples = X.shape[0]
X_train, y_train = X[:n_samples // 2], y[:n_samples // 2]
X_test, y_test = X[n_samples // 2:], y[n_samples // 2:]
# Lasso
from sklearn.linear_model import Lasso
alpha = 0.1
lasso = Lasso(alpha=alpha)  # Lasso object
y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)  # fit and predict
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print lasso
print "r^2 on test data : %f" % r2_score_lasso
# ElasticNet
from sklearn.linear_model import ElasticNet
enet = ElasticNet(alpha=alpha, l1_ratio=0.7)
y_pred_enet = enet.fit(X_train, y_train).predict(X_test)
r2_score_enet = r2_score(y_test, y_pred_enet)
print enet
print "r^2 on test data : %f" % r2_score_enet
pl.plot(enet.coef_, label='Elastic net coefficients')
pl.plot(lasso.coef_, label='Lasso coefficients')
pl.plot(coef, '--', label='original coefficients')
pl.legend(loc='best')
pl.title("Lasso R^2: %f, Elastic Net R^2: %f"
% (r2_score_lasso, r2_score_enet))
pl.show()
1.3.1. How to set the regularization parameter
1.3.1.1. Using cross-validation
Two classes implement cross-validated selection of the regularization parameter: LassoCV and LassoLarsCV. For high-dimensional data, LassoCV is usually the most suitable choice, although LassoLarsCV is somewhat faster.
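A minimal self-contained sketch of both (the cv=5 setting and the use of the diabetes data here are my own choices, not from the original post):

from sklearn import datasets
from sklearn.linear_model import LassoCV, LassoLarsCV

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

model_cd = LassoCV(cv=5).fit(X, y)        # cross-validated coordinate descent
model_lars = LassoLarsCV(cv=5).fit(X, y)  # cross-validated least angle regression

print("alpha chosen by LassoCV:     %f" % model_cd.alpha_)
print("alpha chosen by LassoLarsCV: %f" % model_lars.alpha_)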
1.3.1.2. Using information criteria
Namely AIC and BIC. These criteria are much cheaper to compute than cross-validation. However, using them requires a proper estimate of the model's degrees of freedom and assumes that the underlying probabilistic model is correct. In fact we run into this issue quite often, and we would usually rather compute something directly from the data than start by positing a probabilistic model.
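A minimal self-contained sketch of the information-criterion route (the choice of criterion='bic' and the diabetes data are mine, for illustration only):

from sklearn import datasets
from sklearn.linear_model import LassoLarsIC

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

model = LassoLarsIC(criterion='bic').fit(X, y)
print(model.alpha_)  # the alpha that minimizes the BIC along the Lars path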
The script below compares several ways of setting the regularization parameter: AIC, BIC, and cross-validation.
import time
import numpy as np
import pylab as pl
from sklearn.linear_model import LassoCV, LassoLarsCV, LassoLarsIC
from sklearn import datasets
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
rng = np.random.RandomState(42)
X = np.c_[X, rng.randn(X.shape[0], 14)] # add some bad features
# normalize data as done by Lars to allow for comparison
X /= np.sqrt(np.sum(X ** 2, axis=0))
# LassoLarsIC: least angle regression with BIC/AIC criterion
model_bic = LassoLarsIC(criterion='bic')  # BIC criterion
t1 = time.time()
model_bic.fit(X, y)
t_bic = time.time() - t1
alpha_bic_ = model_bic.alpha_
model_aic = LassoLarsIC(criterion='aic')  # AIC criterion
model_aic.fit(X, y)
alpha_aic_ = model_aic.alpha_
def plot_ic_criterion(model, name, color):
    alpha_ = model.alpha_
    alphas_ = model.alphas_
    criterion_ = model.criterion_
    pl.plot(-np.log10(alphas_), criterion_, '--', color=color,
            linewidth=3, label='%s criterion' % name)
    pl.axvline(-np.log10(alpha_), color=color, linewidth=3,
               label='alpha: %s estimate' % name)
    pl.xlabel('-log(alpha)')
    pl.ylabel('criterion')
pl.figure()
plot_ic_criterion(model_aic, 'AIC', 'b')
plot_ic_criterion(model_bic, 'BIC', 'r')
pl.legend()
pl.title('Information-criterion for model selection (training time %.3fs)'
% t_bic)
# LassoCV: coordinate descent
# Compute paths
print "Computing regularization path using the coordinate descent lasso..."
t1 = time.time()
model = LassoCV(cv=20).fit(X, y)  # create the object and fit
t_lasso_cv = time.time() - t1
# Display results
m_log_alphas = -np.log10(model.alphas_)
pl.figure()
ymin, ymax = 2300, 3800
pl.plot(m_log_alphas, model.mse_path_, ':')
pl.plot(m_log_alphas, model.mse_path_.mean(axis=-1), 'k',
label='Average across the folds', linewidth=2)
pl.axvline(-np.log10(model.alpha_), linestyle='--', color='k',
label='alpha: CV estimate')
pl.legend()
pl.xlabel('-log(alpha)')
pl.ylabel('Mean square error')
pl.title('Mean square error on each fold: coordinate descent '
'(train time: %.2fs)' % t_lasso_cv)
pl.axis('tight')
pl.ylim(ymin, ymax)
# LassoLarsCV: least angle regression
# Compute paths
print "Computing regularization path using the Lars lasso..."
t1 = time.time()
model = LassoLarsCV(cv=20).fit(X, y)
t_lasso_lars_cv = time.time() - t1
# Display results
m_log_alphas = -np.log10(model.cv_alphas_)
pl.figure()
pl.plot(m_log_alphas, model.cv_mse_path_, ':')
pl.plot(m_log_alphas, model.cv_mse_path_.mean(axis=-1), 'k',
label='Average across the folds', linewidth=2)
pl.axvline(-np.log10(model.alpha_), linestyle='--', color='k',
label='alpha CV')
pl.legend()
pl.xlabel('-log(alpha)')
pl.ylabel('Mean square error')
pl.title('Mean square error on each fold: Lars (train time: %.2fs)'
% t_lasso_lars_cv)
pl.axis('tight')
pl.ylim(ymin, ymax)
pl.show()
1.4. Elastic Net
The Elastic Net is a blend of the Lasso and ridge regression: its penalty trades off the L1 norm against the L2 norm. The script below compares the regularization paths of the Lasso and the Elastic Net and plots them.
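Before the script, for reference, the objective ElasticNet minimizes (taken from the scikit-learn documentation, not from the original post) is

\min_w \frac{1}{2 n_{samples}} \|Xw - y\|_2^2 + \alpha \rho \|w\|_1 + \frac{\alpha (1 - \rho)}{2} \|w\|_2^2

where \rho corresponds to the l1_ratio parameter used in the script that follows.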
print __doc__
# Author: Alexandre Gramfort
# License: BSD Style.
import numpy as np
import pylab as pl
from sklearn.linear_model import lasso_path, enet_path
from sklearn import datasets
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
X /= X.std(0) # Standardize data (easier to set the l1_ratio parameter)
# Compute paths
eps = 5e-3 # the smaller it is the longer is the path
print "Computing regularization path using the lasso..."
models = lasso_path(X, y, eps=eps)
alphas_lasso = np.array([model.alpha for model in models])
coefs_lasso = np.array([model.coef_ for model in models])
print "Computing regularization path using the positive lasso..."
models = lasso_path(X, y, eps=eps, positive=True)#lasso path
alphas_positive_lasso = np.array([model.alpha for model in models])
coefs_positive_lasso = np.array([model.coef_ for model in models])
print "Computing regularization path using the elastic net..."
models = enet_path(X, y, eps=eps, l1_ratio=0.8)
alphas_enet = np.array([model.alpha for model in models])
coefs_enet = np.array([model.coef_ for model in models])
print "Computing regularization path using the positve elastic net..."
models = enet_path(X, y, eps=eps, l1_ratio=0.8, positive=True)
alphas_positive_enet = np.array([model.alpha for model in models])
coefs_positive_enet = np.array([model.coef_ for model in models])
# Display results
pl.figure(1)
ax = pl.gca()
ax.set_color_cycle(2 * ['b', 'r', 'g', 'c', 'k'])
l1 = pl.plot(coefs_lasso)
l2 = pl.plot(coefs_enet, linestyle='--')
pl.xlabel('-Log(lambda)')
pl.ylabel('weights')
pl.title('Lasso and Elastic-Net Paths')
pl.legend((l1[-1], l2[-1]), ('Lasso', 'Elastic-Net'), loc='lower left')
pl.axis('tight')
pl.figure(2)
ax = pl.gca()
ax.set_color_cycle(2 * ['b', 'r', 'g', 'c', 'k'])
l1 = pl.plot(coefs_lasso)
l2 = pl.plot(coefs_positive_lasso, linestyle='--')
pl.xlabel('-Log(lambda)')
pl.ylabel('weights')
pl.title('Lasso and positive Lasso')
pl.legend((l1[-1], l2[-1]), ('Lasso', 'positive Lasso'), loc='lower left')
pl.axis('tight')
pl.figure(3)
ax = pl.gca()
ax.set_color_cycle(2 * ['b', 'r', 'g', 'c', 'k'])
l1 = pl.plot(coefs_enet)
l2 = pl.plot(coefs_positive_enet, linestyle='--')
pl.xlabel('-Log(lambda)')
pl.ylabel('weights')
pl.title('Elastic-Net and positive Elastic-Net')
pl.legend((l1[-1], l2[-1]), ('Elastic-Net', 'positive Elastic-Net'),
loc='lower left')
pl.axis('tight')
pl.show()
1.5. Multi-task Lasso
The multi-task lasso is essentially a multivariate (multi-output) Lasso. The tricky part of extending the Lasso to multivariate regression lies in how the penalty is defined; I will skip the details here, but a minimal sketch of the scikit-learn interface follows below.
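This sketch of the MultiTaskLasso class is my own illustration (the synthetic data and the alpha value are arbitrary): all tasks end up sharing the same set of selected features, because the penalty acts jointly on each feature's coefficients across all tasks.

import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.RandomState(0)
X = rng.randn(100, 30)            # 100 samples, 30 features
W = np.zeros((30, 4))             # true coefficients for 4 tasks
W[:5] = rng.randn(5, 4)           # only the first 5 features are active, shared across tasks
Y = X.dot(W) + 0.01 * rng.randn(100, 4)

clf = MultiTaskLasso(alpha=0.1).fit(X, Y)
print(clf.coef_.shape)            # (4, 30): one row of coefficients per task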
1.6. Least Angle Regression
The LassoLars class provides the LARS-based solver for the Lasso. The script below shows how to use LARS to draw the Lasso path.
print __doc__
# Author: Fabian Pedregosa
# Alexandre Gramfort
# License: BSD Style.
import numpy as np
import pylab as pl
from sklearn import linear_model
from sklearn import datasets
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target
print "Computing regularization path using the LARS ..."
alphas, _, coefs = linear_model.lars_path(X, y, method='lasso', verbose=True)  # solution path of the LARS algorithm
xx = np.sum(np.abs(coefs.T), axis=1)
xx /= xx[-1]
pl.plot(xx, coefs.T)
ymin, ymax = pl.ylim()
pl.vlines(xx, ymin, ymax, linestyle='dashed')
pl.xlabel('|coef| / max|coef|')
pl.ylabel('Coefficients')
pl.title('LASSO Path')
pl.axis('tight')
pl.show()
1.7. Logistic Regression
Logistic regression is a linear classifier, implemented by the LogisticRegression class, which supports both L1-norm and L2-norm penalties. The script below is an example: it splits the 8x8-pixel digit images into two classes, digits 0-4 versus digits 5-9, and compares the L1 and L2 penalties for several values of C.
The script is as follows:
print __doc__
# Authors: Alexandre Gramfort
# Mathieu Blondel
# Andreas Mueller
# License: BSD Style.
import numpy as np
import pylab as pl
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
digits = datasets.load_digits()
X, y = digits.data, digits.target
X = StandardScaler().fit_transform(X)
# classify small against large digits
y = (y > 4).astype(np.int)
# Set regularization parameter
for i, C in enumerate(10. ** np.arange(1, 4)):
    # turn down tolerance for short training time
    clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01)
    clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01)
    clf_l1_LR.fit(X, y)
    clf_l2_LR.fit(X, y)

    coef_l1_LR = clf_l1_LR.coef_.ravel()
    coef_l2_LR = clf_l2_LR.coef_.ravel()

    # coef_l1_LR contains zeros due to the
    # L1 sparsity inducing norm
    sparsity_l1_LR = np.mean(coef_l1_LR == 0) * 100
    sparsity_l2_LR = np.mean(coef_l2_LR == 0) * 100

    print "C=%d" % C
    print "Sparsity with L1 penalty: %.2f%%" % sparsity_l1_LR
    print "score with L1 penalty: %.4f" % clf_l1_LR.score(X, y)
    print "Sparsity with L2 penalty: %.2f%%" % sparsity_l2_LR
    print "score with L2 penalty: %.4f" % clf_l2_LR.score(X, y)

    l1_plot = pl.subplot(3, 2, 2 * i + 1)
    l2_plot = pl.subplot(3, 2, 2 * (i + 1))
    if i == 0:
        l1_plot.set_title("L1 penalty")
        l2_plot.set_title("L2 penalty")

    l1_plot.imshow(np.abs(coef_l1_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    l2_plot.imshow(np.abs(coef_l2_LR.reshape(8, 8)), interpolation='nearest',
                   cmap='binary', vmax=1, vmin=0)
    pl.text(-8, 3, "C = %d" % C)

    l1_plot.set_xticks(())
    l1_plot.set_yticks(())
    l2_plot.set_xticks(())
    l2_plot.set_yticks(())
pl.show()