
Simple Linear Regression in Python

Approach:

1. Generate 20 evenly spaced numbers from 0 to 10 as x.

2. Apply the regression formula y = β0 + β1x + ε, with β0 = 2 and β1 = 5.

3. Compute the y values.

4. Estimate the coefficients back from the data (the closed form is sketched below).
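
Step 4 is the inverse problem: recover β0 and β1 from the (x, y) pairs alone. Ordinary least squares does this in closed form; with X the design matrix carrying a leading column of ones (built below with sm.add_constant), the estimator is

\hat{\beta} = (X^\top X)^{-1} X^\top y

and statsmodels computes exactly these numbers for us; the same formula is reused as a sanity check further down.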

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

nsample = 20
# Pick 20 evenly spaced numbers between 0 and 10
x = np.linspace(0, 10, nsample)
x
array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])
# For least squares, prepend a column of ones so the model includes an intercept (constant) term
X = sm.add_constant(x)
X
array([[ 1.        ,  0.        ],
       [ 1.        ,  0.52631579],
       [ 1.        ,  1.05263158],
       [ 1.        ,  1.57894737],
       [ 1.        ,  2.10526316],
       [ 1.        ,  2.63157895],
       [ 1.        ,  3.15789474],
       [ 1.        ,  3.68421053],
       [ 1.        ,  4.21052632],
       [ 1.        ,  4.73684211],
       [ 1.        ,  5.26315789],
       [ 1.        ,  5.78947368],
       [ 1.        ,  6.31578947],
       [ 1.        ,  6.84210526],
       [ 1.        ,  7.36842105],
       [ 1.        ,  7.89473684],
       [ 1.        ,  8.42105263],
       [ 1.        ,  8.94736842],
       [ 1.        ,  9.47368421],
       [ 1.        , 10.        ]])
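As a quick check (a sketch in plain NumPy; X_manual is an illustrative name, not from the original post), sm.add_constant here amounts to prepending a column of ones:

# Manual equivalent of sm.add_constant(x) for this data
X_manual = np.column_stack([np.ones(nsample), x])
print(np.allclose(X_manual, X))  # True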
# Construct y with β0 = 2, β1 = 5
beta = np.array([2, 5])
beta
array([2, 5])
# Error term: draw Gaussian (standard normal) noise
e = np.random.normal(size=nsample)
e
array([-0.08130226, -0.99898515, -0.46717904, -0.52487297, -0.85998302,
        1.00102852,  0.61557834,  0.4359724 ,  1.36966089, -0.17069984,
        0.33877027, -1.602145  , -0.1940928 ,  1.58914167, -2.09103106,
       -0.87802483, -0.46069062, -2.32511203, -1.42386623, -0.22494043])
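
Note that this draw is unseeded, so the numbers above (and everything computed from them) change on every run. A minimal sketch for reproducibility, assuming NumPy's Generator API; the seed 0 is arbitrary:

# Seeded alternative: the same errors are drawn on every run
rng = np.random.default_rng(0)  # seed 0 chosen arbitrarily for illustration
e = rng.normal(size=nsample)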
# Actual values y = β0 + β1*x + e: the synthetic "ground truth" used for testing
y = np.dot(X, beta) + e
y
array([ 1.91869774,  3.6325938 ,  6.79597886,  9.36986387, 11.66633277,
       16.15892325, 18.40505202, 20.85702504, 24.42229247, 25.51351069,
       28.65455974, 29.34522342, 33.38485457, 37.79966799, 36.75107421,
       40.59565938, 43.64457254, 44.41173008, 47.94455482, 51.77505957])

With the data constructed, fit the regression equation.

# Ordinary least squares
model = sm.OLS(y, X)

# Fit the model
res = model.fit()

# Regression coefficients, i.e. β0 and β1
res.params

array([2.15061173, 4.90034992])
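The estimates are close to the true (2, 5) but not exact because of the noise e. As a sanity check (a sketch, not part of the original post), the same numbers fall out of the closed form given earlier:

# Solve the normal equations (X'X) β = X'y directly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_hat, res.params))  # True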
# View the full evaluation summary
res.summary()
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.996
Model:                            OLS   Adj. R-squared:                  0.995
Method:                 Least Squares   F-statistic:                     4072.
Date:                Thu, 13 Sep 2018   Prob (F-statistic):           1.15e-22
Time:                        10:44:47   Log-Likelihood:                -28.152
No. Observations:                  20   AIC:                             60.30
Df Residuals:                      18   BIC:                             62.30
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.1506      0.449      4.788      0.000       1.207       3.094
x1             4.9003      0.077     63.815      0.000       4.739       5.062
==============================================================================
Omnibus:                        0.468   Durbin-Watson:                   1.957
Prob(Omnibus):                  0.791   Jarque-Bera (JB):                0.572
Skew:                           0.274   Prob(JB):                        0.751
Kurtosis:                       2.378   Cond. No.                        11.5
==============================================================================

Warnings: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
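
There is no need to parse the printed table by hand: the same statistics are exposed as attributes of the results object (standard statsmodels RegressionResults members):

res.rsquared    # 0.996, the R-squared from the table above
res.bse         # standard errors of const and x1
res.pvalues     # the P>|t| column
res.conf_int()  # the [0.025, 0.975] intervals, one row per coefficient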

# Fitted values from the regression
y_ = res.fittedvalues
y_
array([ 2.15061173,  4.72974327,  7.30887481,  9.88800634, 12.46713788,
       15.04626942, 17.62540096, 20.2045325 , 22.78366403, 25.36279557,
       27.94192711, 30.52105865, 33.10019019, 35.67932172, 38.25845326,
       40.8375848 , 43.41671634, 45.99584788, 48.57497942, 51.15411095])
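fittedvalues is simply X @ res.params; res.predict gives the same numbers and also accepts new design matrices:

# Two equivalent ways to recompute the fitted values
print(np.allclose(y_, X @ res.params))  # True
print(np.allclose(y_, res.predict(X)))  # True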
# Plot the raw data against the fitted line
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, 'o', label='data')    # raw data
ax.plot(x, y_, 'r--', label='test') # fitted values
ax.legend(loc='best')
plt.show()
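
The fitted model can also be evaluated away from the training grid. A short sketch (x_new is an illustrative grid, not from the original post):

# Predict at new x values: build the design matrix the same way, then predict
x_new = np.linspace(0, 12, 5)
X_new = sm.add_constant(x_new)
y_new = res.predict(X_new)  # ≈ 2.1506 + 4.9003 * x_new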