Data Mining with R: Regression Analysis
阿新 • Published: 2018-12-11
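Regression analysis is at the core of statistics. It generally refers to using one or more predictor variables to predict a response variable.
Regression analysis usually selects variables that are related to the response variable as explanatory variables, in order to describe the relationship between them. It can also produce an equation that explains the response variable in terms of the explanatory variables.
R provides the lm() function for fitting both simple (one-predictor) and multiple (several-predictor) linear regression.
The symbols used in R formulas are explained as follows: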
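As a quick orientation, the sketch below exercises the formula operators most often seen with lm(): `~`, `+`, `:`, `*`, `I()`, `- 1`, and `.`. It uses the built-in mtcars dataset and the model names m1 to m7, which are assumptions chosen for illustration and not part of the original example.

```r
# Illustrative only: mtcars and the chosen columns are assumptions, not from the article.
data(mtcars)

# ~   separates the response (left) from the predictors (right)
m1 <- lm(mpg ~ wt, data = mtcars)

# +   adds another predictor
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# :   adds only the interaction between two predictors
m3 <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# *   is shorthand for the main effects plus their interaction (same terms as m3)
m4 <- lm(mpg ~ wt * hp, data = mtcars)

# I() makes R evaluate the expression arithmetically, e.g. a squared term
m5 <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# - 1 removes the intercept
m6 <- lm(mpg ~ wt - 1, data = mtcars)

# .   stands for every column in the data other than the response
m7 <- lm(mpg ~ ., data = mtcars)

# Number of estimated coefficients in each model
sapply(list(m1 = m1, m2 = m2, m3 = m3, m4 = m4, m5 = m5, m6 = m6, m7 = m7),
       function(m) length(coef(m)))
```

Comparing `names(coef(m3))` with `names(coef(m4))` is a simple way to confirm that `wt * hp` expands to the same terms as `wt + hp + wt:hp`.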
    # Simple linear regression of height on weight (built-in women dataset)
    data(women)
    fit <- lm(women$height ~ women$weight, data = women)
    summary(fit)

    Call:
    lm(formula = women$height ~ women$weight, data = women)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.83233 -0.26249  0.08314  0.34353  0.49790

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  25.723456   1.043746   24.64 2.68e-12 ***
    women$weight  0.287249   0.007588   37.85 1.09e-14 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.44 on 13 degrees of freedom
    Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903
    F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

    # Fitted (predicted) heights for each observation
    fitted(fit)
           1        2        3        4        5        6
    58.75712 59.33162 60.19336 61.05511 61.91686 62.77861
           7        8        9       10       11       12
    63.64035 64.50210 65.65110 66.51285 67.66184 68.81084
          13       14       15
    69.95984 71.39608 72.83233

    # Residuals: observed height minus fitted height
    residuals(fit)
              1           2           3           4           5
    -0.75711680 -0.33161526 -0.19336294 -0.05511062  0.08314170
              6           7           8           9          10
     0.22139402  0.35964634  0.49789866  0.34890175  0.48715407
             11          12          13          14          15
     0.33815716  0.18916026  0.04016335 -0.39608278 -0.83232892
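To make the link between the coefficient table and the fitted values concrete, the following sketch rebuilds the regression equation by hand. It refits the same model with bare column names (an idiomatic variant, not the article's exact call) so that predict() can accept new data. The names fit_hw and by_hand, and the weight of 140, are introduced here only for illustration.

```r
data(women)

# Same model as above, written with bare column names so predict() can take newdata.
fit_hw <- lm(height ~ weight, data = women)

b <- coef(fit_hw)                       # b[1] = intercept, b[2] = slope for weight
by_hand <- b[1] + b[2] * women$weight   # height_hat = intercept + slope * weight

# The hand-computed values should agree with fitted(), and
# observed minus hand-computed should agree with residuals().
all.equal(unname(by_hand), unname(fitted(fit_hw)))
all.equal(unname(women$height - by_hand), unname(residuals(fit_hw)))

# Predict height for a weight that is not in the data (140 is an arbitrary choice).
predict(fit_hw, newdata = data.frame(weight = 140))
```

Writing the formula as `height ~ weight` rather than `women$height ~ women$weight` gives the same estimates, and it lets predict() match the `weight` column in `newdata` by name.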
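Polynomial regression
Adding a quadratic term X^2 can improve the fit of the regression. In an R formula it is written I(X^2), because a bare ^ has a special meaning in formula syntax. Note that the model below takes weight as the response and height as the predictor, the reverse of the previous example.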
    fit <- lm(women$weight ~ women$height + I(women$height^2), data = women)
    summary(fit)

    Call:
    lm(formula = women$weight ~ women$height + I(women$height^2), data = women)

    Residuals:
         Min       1Q   Median       3Q      Max
    -0.50941 -0.29611 -0.00941  0.28615  0.59706

    Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
    (Intercept)       261.87818   25.19677  10.393 2.36e-07 ***
    women$height       -7.34832    0.77769  -9.449 6.58e-07 ***
    I(women$height^2)   0.08306    0.00598  13.891 9.32e-09 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.3841 on 12 degrees of freedom
    Multiple R-squared:  0.9995,  Adjusted R-squared:  0.9994
    F-statistic: 1.139e+04 on 2 and 12 DF,  p-value: < 2.2e-16
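The results show that all regression coefficients are highly significant, and the proportion of variance explained by the model has risen to about 99.9% (Multiple R-squared: 0.9995), up from 99.1% for the simple linear model.
We can also visualize the fit: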
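One way to check this improvement is to fit the straight-line and quadratic models for the same response and compare them directly. The sketch below is a hedged illustration: the names fit_lin and fit_quad are made up here, and the nested-model F-test via anova() is an addition that the original does not use.

```r
data(women)

# Straight-line and quadratic models for the same response (weight).
fit_lin  <- lm(weight ~ height, data = women)
fit_quad <- lm(weight ~ height + I(height^2), data = women)

# Proportion of variance explained by each model.
r2 <- c(linear    = summary(fit_lin)$r.squared,
        quadratic = summary(fit_quad)$r.squared)
print(round(r2, 4))

# F-test for whether adding the quadratic term improves on the linear fit.
print(anova(fit_lin, fit_quad))
```

A small p-value in the anova() table would support keeping the quadratic term. With only 15 observations, this is best read as descriptive evidence for this dataset rather than a general result.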
    plot(women$height, women$weight)
    lines(women$height, fitted(fit))