R Language: Ridge Regression and the Lasso Algorithm
阿新 • Published: 2020-11-04
In the previous post we saw that linear regression models can suffer from multicollinearity; both ridge regression and the lasso can mitigate this problem to some extent.
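For context, both methods add a penalty term to the ordinary least-squares objective; ridge penalizes the squared (L2) norm of the coefficients, while the lasso penalizes the absolute (L1) norm:

\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda\|\beta\|_1

Because the L1 penalty can shrink coefficients exactly to zero, the lasso also performs variable selection, which we will see in action below.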
Ridge Regression
> ######### Regularization methods for removing collinearity
> ### Ridge regression
> ### glmnet only accepts matrix input
> library(glmnet)
> library(mice)
> creditcard_exp <- creditcard_exp[complete.cases(creditcard_exp), ]
> x <- as.matrix(creditcard_exp[, c(6, 7, 10, 11)])
> y <- as.matrix(creditcard_exp[, 3])
> # Take a look at the ridge trace plot
> r1 <- glmnet(x = x, y = y, family = "gaussian", alpha = 0)  # alpha = 0 gives ridge regression; x and y must have no missing values
> plot(r1, xvar = "lambda")
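If the curves on the trace are hard to tell apart, plot.glmnet can label each coefficient path with its variable index (this labelled variant is an addition, not part of the original post):

> # Label each coefficient path with its variable index to make the trace readable
> plot(r1, xvar = "lambda", label = TRUE)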
> r1cv <- cv.glmnet(x = x, y = y, family = "gaussian", alpha = 0, nfolds = 10)  # choose lambda by 10-fold cross-validation
> plot(r1cv)
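The fitted cv.glmnet object stores the two conventional candidate penalties that we use below; they can be inspected directly (the exact values depend on the random fold assignment):

> r1cv$lambda.min  # λ with the smallest mean cross-validated error
> r1cv$lambda.1se  # largest λ within one standard error of that minimum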
> rimin <- glmnet(x = x, y = y, family = "gaussian", alpha = 0, lambda = r1cv$lambda.min)  # refit with the λ that minimizes CV error
> coef(rimin)
5 x 1 sparse Matrix of class "dgCMatrix"
                         s0
(Intercept)     106.5467017
Age               0.9156047
Income           19.6903291
dist_home_val     1.7357213
dist_avg_income  71.5765458
We can see that in this model income and spending are now positively correlated.
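If you also want fitted values from this ridge model, predict() works on glmnet fits; a minimal sketch that simply reuses the training matrix x (an addition, not from the original post):

> y_hat <- predict(rimin, newx = x)  # fitted spending for each observation
> head(y_hat)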
The Lasso Algorithm
> ##### Lasso: also performs variable selection
> r1l <- cv.glmnet(x = x, y = y, family = "gaussian", alpha = 1, nfolds = 10)
> plot(r1l)
> r1l1 <- glmnet(x = x, y = y, family = "gaussian", alpha = 1, lambda = r1l$lambda.min)  # fit at the smallest-error λ to inspect the model
> coef(r1l1)
5 x 1 sparse Matrix of class "dgCMatrix"
                          s0
(Intercept)      -27.169039
Age                1.314711
Income          -160.195837
dist_home_val      1.538823
dist_avg_income  255.395751
Looking at these coefficients, the negative sign on Income has not been fixed, and no variables have been dropped either, so let's try lambda.1se * 0.5 instead.
> r1l2 <- glmnet(x = x, y = y, family = "gaussian", alpha = 1, lambda = r1l$lambda.1se * 0.5)  # half of the one-standard-error λ
> coef(r1l2)
5 x 1 sparse Matrix of class "dgCMatrix"
                         s0
(Intercept)     267.0510318
Age                       .
Income                    .
dist_home_val     0.6249539
dist_avg_income  83.6952253
With this penalty some variables have been dropped, which removes the collinearity problem. Next, let's look at lambda.1se itself.
> r1l3 <- glmnet(x = x, y = y, alpha = 1, family = "gaussian", lambda = r1l$lambda.1se)
> coef(r1l3)
5 x 1 sparse Matrix of class "dgCMatrix"
                       s0
(Intercept)     432.00684
Age                     .
Income                  .
dist_home_val           .
dist_avg_income  68.90894
This time only one variable is left. The larger λ is, the fewer variables are retained; in practice we usually pick a suitable λ between the minimum-error value (lambda.min) and one standard error above it (lambda.1se).
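Since the recommended λ lies between those two values, it is worth knowing that coefficients at any point on the path can be read directly off the cv.glmnet fit via the s argument; the midpoint below is only an illustration, not a recommendation from the original post:

> coef(r1l, s = "lambda.min")                           # smallest-error λ
> coef(r1l, s = "lambda.1se")                           # one-standard-error λ
> coef(r1l, s = (r1l$lambda.min + r1l$lambda.1se) / 2)  # an in-between value, for illustration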