時間序列--選擇基準模型(以後的模型和這個基準比較
1.以前基準是用上一個時刻的值當做這一個時刻的值,
現在我設定引數不知道過去哪一個時刻的值取當做今天的值
from pandas import Series from sklearn.metrics import mean_squared_error from math import sqrt from matplotlib import pyplot # load data series = Series.from_csv('car-sales.csv', header=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] persistence_values = range(1, 25) scores = list() for p in persistence_values: # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = history[-p] predictions.append(yhat) # observation history.append(test[i]) # report performance rmse = sqrt(mean_squared_error(test, predictions)) scores.append(rmse) print('p=%d RMSE:%.3f' % (p, rmse)) # plot scores over persistence values pyplot.plot(persistence_values, scores) pyplot.show()
結果:p=1 RMSE:3947.200
p=2 RMSE:5485.353
p=3 RMSE:6346.176
p=4 RMSE:6474.553
p=5 RMSE:5756.543
p=6 RMSE:5756.076
p=7 RMSE:5958.665
p=8 RMSE:6543.266
p=9 RMSE:6450.839
p=10 RMSE:5595.971
p=11 RMSE:3806.482
p=12 RMSE:1997.732
p=13 RMSE:3968.987
p=14 RMSE:5210.866
p=15 RMSE:6299.040
p=16 RMSE:6144.881
p=17 RMSE:5349.691
p=18 RMSE:5534.784
p=19 RMSE:5655.016
p=20 RMSE:6746.872
p=21 RMSE:6784.611
p=22 RMSE:5642.737
p=23 RMSE:3692.062
p=24 RMSE:2119.103
可以看出p=13最好
那麼就用t-13
from pandas import Series from sklearn.metrics import mean_squared_error from math import sqrt from matplotlib import pyplot # load data series = Series.from_csv('car-sales.csv', header=0) # prepare data X = series.values train, test = X[0:-24], X[-24:] # walk-forward validation history = [x for x in train] predictions = list() for i in range(len(test)): # make prediction yhat = history[-12] predictions.append(yhat) # observation history.append(test[i]) # plot predictions vs observations pyplot.plot(test) pyplot.plot(predictions) pyplot.show()
效果圖如下
2.利用滑動窗口裡面的均值去當做這一個時刻的值
from pandas import Series
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot
from numpy import mean
# load data
series = Series.from_csv('car-sales.csv', header=0)
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
window_sizes = range(1, 25)
scores = list()
for w in window_sizes:
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
# make prediction
yhat = mean(history[-w:])
predictions.append(yhat)
# observation
history.append(test[i])
# report performance
rmse = sqrt(mean_squared_error(test, predictions))
scores.append(rmse)
print('w=%d RMSE:%.3f' % (w, rmse))
# plot scores over window sizes values
pyplot.plot(window_sizes, scores)
pyplot.show()
windowsize就體現了用過去多少的均值(當然你也可以中位數)
w=1 RMSE:3947.200
w=2 RMSE:4350.413
w=3 RMSE:4701.446
w=4 RMSE:4810.510
w=5 RMSE:4649.667
w=6 RMSE:4549.172
w=7 RMSE:4515.684
w=8 RMSE:4614.551
w=9 RMSE:4653.493
w=10 RMSE:4563.802
w=11 RMSE:4321.599
w=12 RMSE:4023.968
w=13 RMSE:3901.634
w=14 RMSE:3907.671
w=15 RMSE:4017.276
w=16 RMSE:4084.080
w=17 RMSE:4076.399
w=18 RMSE:4085.376
w=19 RMSE:4101.505
w=20 RMSE:4195.617
w=21 RMSE:4269.784
w=22 RMSE:4258.226
w=23 RMSE:4158.029
w=24 RMSE:4021.885
可以看到w=13最好,那我們就選13
from pandas import Series
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from numpy import mean
# load data
series = Series.from_csv('car-sales.csv', header=0)
# prepare data
X = series.values
train, test = X[0:-24], X[-24:]
# walk-forward validation
history = [x for x in train]
predictions = list()
for i in range(len(test)):
# make prediction
yhat = mean(history[-13:])
predictions.append(yhat)
# observation
history.append(test[i])
# plot predictions vs observations
pyplot.plot(test)
pyplot.plot(predictions)
pyplot.show()
效果圖如上
https://machinelearningmastery.com/simple-time-series-forecasting-models/