Simple Linear Regression in Python
阿新 • Published: 2018-12-10
Approach:
1. Generate 20 evenly spaced numbers from 0 to 10 as x.
2. Use the regression model y = β0 + β1·x + ε, with β0 = 2, β1 = 5.
3. Compute the y values, adding Gaussian noise ε.
4. Estimate the coefficients from the data.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

nsample = 20
# Pick 20 evenly spaced numbers between 0 and 10
x = np.linspace(0, 10, nsample)
x
array([ 0. , 0.52631579, 1.05263158, 1.57894737, 2.10526316, 2.63157895, 3.15789474, 3.68421053, 4.21052632, 4.73684211, 5.26315789, 5.78947368, 6.31578947, 6.84210526, 7.36842105, 7.89473684, 8.42105263, 8.94736842, 9.47368421, 10. ])
# For least squares, prepend a column of ones to the array so the constant (intercept) term can be estimated
X=sm.add_constant(x)
X
array([[ 1.        ,  0.        ],
       [ 1.        ,  0.52631579],
       [ 1.        ,  1.05263158],
       [ 1.        ,  1.57894737],
       [ 1.        ,  2.10526316],
       [ 1.        ,  2.63157895],
       [ 1.        ,  3.15789474],
       [ 1.        ,  3.68421053],
       [ 1.        ,  4.21052632],
       [ 1.        ,  4.73684211],
       [ 1.        ,  5.26315789],
       [ 1.        ,  5.78947368],
       [ 1.        ,  6.31578947],
       [ 1.        ,  6.84210526],
       [ 1.        ,  7.36842105],
       [ 1.        ,  7.89473684],
       [ 1.        ,  8.42105263],
       [ 1.        ,  8.94736842],
       [ 1.        ,  9.47368421],
       [ 1.        , 10.        ]])
# Construct the y values; β0 = 2, β1 = 5
beta = np.array([2, 5])
beta
array([2, 5])
# Add noise drawn from a Gaussian (normal) distribution
e = np.random.normal(size=nsample)
e
array([-0.08130226, -0.99898515, -0.46717904, -0.52487297, -0.85998302, 1.00102852, 0.61557834, 0.4359724 , 1.36966089, -0.17069984, 0.33877027, -1.602145 , -0.1940928 , 1.58914167, -2.09103106, -0.87802483, -0.46069062, -2.32511203, -1.42386623, -0.22494043])
# Observed values, y = β0 + β1·x + e — the synthetic ground truth used for testing
y = np.dot(X, beta) + e
y
array([ 1.91869774, 3.6325938 , 6.79597886, 9.36986387, 11.66633277, 16.15892325, 18.40505202, 20.85702504, 24.42229247, 25.51351069, 28.65455974, 29.34522342, 33.38485457, 37.79966799, 36.75107421, 40.59565938, 43.64457254, 44.41173008, 47.94455482, 51.77505957])
With the data constructed, fit the regression equation.
# Ordinary least squares
model = sm.OLS(y, X)
# Fit the model
res = model.fit()
# Regression coefficients, i.e. β0 and β1
res.params
array([2.15061173, 4.90034992])
# View the full evaluation results
res.summary()
| Dep. Variable: | y | R-squared: | 0.996 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared: | 0.995 |
| Method: | Least Squares | F-statistic: | 4072. |
| Date: | Thu, 13 Sep 2018 | Prob (F-statistic): | 1.15e-22 |
| Time: | 10:44:47 | Log-Likelihood: | -28.152 |
| No. Observations: | 20 | AIC: | 60.30 |
| Df Residuals: | 18 | BIC: | 62.30 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 2.1506 | 0.449 | 4.788 | 0.000 | 1.207 | 3.094 |
| x1 | 4.9003 | 0.077 | 63.815 | 0.000 | 4.739 | 5.062 |

| Omnibus: | 0.468 | Durbin-Watson: | 1.957 |
|---|---|---|---|
| Prob(Omnibus): | 0.791 | Jarque-Bera (JB): | 0.572 |
| Skew: | 0.274 | Prob(JB): | 0.751 |
| Kurtosis: | 2.378 | Cond. No. | 11.5 |

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# Fitted values
y_=res.fittedvalues
y_
array([ 2.15061173, 4.72974327, 7.30887481, 9.88800634, 12.46713788, 15.04626942, 17.62540096, 20.2045325 , 22.78366403, 25.36279557, 27.94192711, 30.52105865, 33.10019019, 35.67932172, 38.25845326, 40.8375848 , 43.41671634, 45.99584788, 48.57497942, 51.15411095])
# Plot
fig, ax = plt.subplots(figsize=(8, 6))
ax.plot(x, y, 'o', label='data')       # raw data
ax.plot(x, y_, 'r--', label='OLS fit') # fitted line
ax.legend(loc='best')
plt.show()