Multi-Step Time Series Forecasting with Long Short-Term Memory Networks (LSTM)

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning and predicting over long sequences. In addition to learning long sequences, LSTMs can learn to make a multi-step forecast in one shot, which may be useful for time series forecasting. A difficulty with LSTMs is that they can be tricky to configure, and considerable preparation is required to get the data into a format suitable for learning.

In this tutorial, you will discover how to develop an LSTM for multi-step time series forecasting in Python with Keras.

After completing this tutorial, you will know:

How to prepare data for multi-step time series forecasting.

How to develop an LSTM model for multi-step time series forecasting.

How to evaluate a multi-step time series forecast.

Tutorial Overview

This tutorial is divided into four parts; they are:

Shampoo Sales dataset

Data preparation and model evaluation

Persistence model

Multi-step LSTM

Environment

This tutorial assumes you have a Python SciPy environment installed. You can use either Python 2 or Python 3 with this example.

This tutorial assumes you have Keras v2.0 or higher installed, with either the TensorFlow or Theano backend.

This tutorial also assumes you have scikit-learn, pandas, NumPy, and Matplotlib installed.
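
If you are unsure of your environment, the quick check below (a minimal sketch) prints the installed versions of the required libraries:

# check the versions of the libraries used in this tutorial
import sys
import sklearn
import pandas
import numpy
import matplotlib
import keras
print('Python: %s' % sys.version)
print('scikit-learn: %s' % sklearn.__version__)
print('pandas: %s' % pandas.__version__)
print('NumPy: %s' % numpy.__version__)
print('Matplotlib: %s' % matplotlib.__version__)
print('Keras: %s' % keras.__version__)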

Shampoo Sales Dataset

This dataset describes the monthly number of sales of shampoo over a 3-year period. The units are a sales count, and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998). You can download and learn more about the dataset here. The example below loads the dataset and creates a line plot of it.

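The loading code below is assembled from the complete examples later in this tutorial:

from pandas import read_csv
from pandas import datetime
from matplotlib import pyplot

# date-time parsing function for loading the dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')

# load the dataset, parsing the first column as dates
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
# summarize the first few rows
print(series.head())
# line plot of monthly shampoo sales
series.plot()
pyplot.show()
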
[Figure: line plot of the monthly Shampoo Sales dataset]

Next, we will look at the model configuration and test harness used in the experiment.

Data Preparation and Model Evaluation

This section describes the data preparation and model evaluation used in this tutorial.

Data Split

We will split the Shampoo Sales dataset into two parts: a training set and a test set. The first two years of data will be taken for the training dataset, and the remaining one year of data will be used for the test set. Models will be developed using the training dataset and will make predictions on the test dataset. The observations for the last 12 months are listed below:

"3-01",339.7
"3-02",440.4
"3-03",315.9
"3-04",439.3
"3-05",401.3
"3-06",437.4
"3-07",575.5
"3-08",407.6
"3-09",682.0
"3-10",475.3
"3-11",581.3
"3-12",646.9

Multi-Step Forecast

We will contrive a multi-step forecast. For a given month in the final 12 months of the dataset, we will be required to make a 3-month forecast. That is, given historical observations (t-1, t-2, … t-n), forecast t, t+1, and t+2. Specifically, from December in year 2, we must forecast January, February, and March; from January, we must forecast February, March, and April; and so on, up to the forecast of October, November, and December made from September in year 3. A total of 10 3-month forecasts are required, as follows:

Dec,	Jan, Feb, Mar
Jan,	Feb, Mar, Apr
Feb,	Mar, Apr, May
Mar,	Apr, May, Jun
Apr, 	May, Jun, Jul
May,	Jun, Jul, Aug
Jun,	Jul, Aug, Sep
Jul,	Aug, Sep, Oct
Aug,	Sep, Oct, Nov
Sep,	Oct, Nov, Dec

Model Evaluation

A rolling-forecast scenario will be used, also called walk-forward model validation. Each time step of the test dataset will be walked one at a time. A model will be used to make a forecast for the time step, then the actual expected value for the next month will be taken from the test set and made available to the model for the forecast on the next time step.

This mimics a real-world scenario where new shampoo sales observations would become available each month and be used in forecasting the following month. It will be simulated by the structure of the train and test datasets.

All forecasts on the test dataset will be collected and an error score calculated to summarize the skill of the model at each forecast time step. The root mean squared error (RMSE) will be used, as it punishes large errors and results in a score that is in the same units as the forecast data, namely monthly shampoo sales.
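
As a concrete illustration (a minimal sketch with hypothetical numbers), the RMSE at a single forecast time step can be calculated like this:

# minimal sketch: RMSE between expected values and forecasts at one time step
from math import sqrt
from sklearn.metrics import mean_squared_error
expected = [339.7, 440.4, 315.9]   # hypothetical actual observations
predicted = [342.3, 342.3, 342.3]  # e.g. a persisted last observation
rmse = sqrt(mean_squared_error(expected, predicted))
print('RMSE: %.3f' % rmse)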

Persistence Model

A good baseline for time series forecasting is the persistence model. This is a forecast model where the last observation is persisted forward. Because of its simplicity, it is often called the naive forecast.

Prepare Data

The first step is to transform the data from a series into a supervised learning problem; that is, to go from a list of numbers to a list of input and output patterns. We can achieve this with a pre-prepared function called series_to_supervised().

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg
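
For example, applying the function to a small contrived series makes the column layout clear (a minimal sketch; the values are hypothetical and the pandas imports from the complete example below are assumed):

# demonstrate series_to_supervised() on a contrived series
values = [[10.0], [20.0], [30.0], [40.0], [50.0]]
frame = series_to_supervised(values, n_in=1, n_out=3)
# each remaining row has 1 input column, var1(t-1), and 3 output columns,
# var1(t), var1(t+1), and var1(t+2); rows with NaN values are dropped
print(frame)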

This function can be called by passing in the loaded series values with an n_in value of 1 and an n_out value of 3; for example:

supervised = series_to_supervised(raw_values, 1, 3)

Next, we can split the supervised learning dataset into train and test sets. We know that in this form, the last 10 rows contain data for the final year. These rows make up the test set, and the rest of the data makes up the training dataset. We can put all of this into a new function that takes the loaded series and some parameters and returns a train and test set ready for modeling.

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	raw_values = raw_values.reshape(len(raw_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(raw_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return train, test

We can test this with the Shampoo Sales dataset. The complete example is listed below.

from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from pandas import datetime

# date-time parsing function for loading the dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	raw_values = raw_values.reshape(len(raw_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(raw_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return train, test

# load dataset
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
# configure
n_lag = 1
n_seq = 3
n_test = 10
# prepare data
train, test = prepare_data(series, n_test, n_lag, n_seq)
print(test)
print('Train: %s, Test: %s' % (train.shape, test.shape))

Running the example first prints the entire test dataset, which is the last 10 rows. The shapes of the train and test datasets are also printed.

[[ 342.3  339.7  440.4  315.9]
 [ 339.7  440.4  315.9  439.3]
 [ 440.4  315.9  439.3  401.3]
 [ 315.9  439.3  401.3  437.4]
 [ 439.3  401.3  437.4  575.5]
 [ 401.3  437.4  575.5  407.6]
 [ 437.4  575.5  407.6  682. ]
 [ 575.5  407.6  682.   475.3]
 [ 407.6  682.   475.3  581.3]
 [ 682.   475.3  581.3  646.9]]
Train: (23, 4), Test: (10, 4)

We can see that the single input value (first column) on the first row of the test dataset matches the observation of shampoo sales for December in year 2:

"2-12",342.3

We can also see that each row contains 4 columns: the 1 input value and 3 output values for each observation.

Make Forecasts

The next step is to make the persistence forecasts. We can implement the persistence forecast easily in a function named persistence() that takes the last observation and the number of forecast steps to persist. This function returns an array containing the forecast.

# make a persistence forecast
def persistence(last_ob, n_seq):
	return [last_ob for i in range(n_seq)]
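
For example, persisting the December year-2 observation of 342.3 over a 3-month horizon gives:

# persist the last observation over a 3-step horizon
print(persistence(342.3, 3))
# [342.3, 342.3, 342.3]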

We can then call this function for each time step in the test dataset, from December in year 2 to September in year 3. Below is a function, make_forecasts(), that does this, taking the train and test datasets and the configuration as arguments and returning a list of forecasts.

# make forecasts with the persistence model
def make_forecasts(train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = persistence(X[-1], n_seq)
		# store the forecast
		forecasts.append(forecast)
	return forecasts

We can call this function as follows:

forecasts = make_forecasts(train, test, 1, 3)

Evaluate Forecasts

The final step is to evaluate the forecasts. We can do this by calculating the RMSE for each time step of the multi-step forecast, in this case giving us 3 RMSE scores. The function evaluate_forecasts() below calculates and prints the RMSE for each forecast time step.

# evaluate the RMSE for each forecast time step
def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = test[:,(n_lag+i)]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))

We can call it as follows:

evaluate_forecasts(test, forecasts, 1, 3)

It is also helpful to plot the forecasts in the context of the original dataset, to give an idea of how the RMSE scores relate to the problem. We can first plot the entire Shampoo dataset, then plot each forecast as a red line. The function plot_forecasts() below will create and show this plot.

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i
		off_e = off_s + len(forecasts[i])
		xaxis = [x for x in range(off_s, off_e)]
		pyplot.plot(xaxis, forecasts[i], color='red')
	# show the plot
	pyplot.show()

We can call the function as follows. Note that the 12 observations held back in the test set are different from the 10 supervised learning input/output patterns used above.

# plot forecasts
plot_forecasts(series, forecasts, 12)

We can make the plot better by connecting the persisted forecast to the actual value it persists in the original dataset. This requires adding the last observed value to the front of the forecast. Below is an updated version of the plot_forecasts() function with this improvement.

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i - 1
		off_e = off_s + len(forecasts[i]) + 1
		xaxis = [x for x in range(off_s, off_e)]
		yaxis = [series.values[off_s]] + forecasts[i]
		pyplot.plot(xaxis, yaxis, color='red')
	# show the plot
	pyplot.show()

Complete Example

We can piece all of these code snippets together. The complete code example for the multi-step persistence forecast is listed below:

from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from pandas import datetime
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot

# date-time parsing function for loading the dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	raw_values = raw_values.reshape(len(raw_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(raw_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return train, test

# make a persistence forecast
def persistence(last_ob, n_seq):
	return [last_ob for i in range(n_seq)]

# make forecasts with the persistence model
def make_forecasts(train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = persistence(X[-1], n_seq)
		# store the forecast
		forecasts.append(forecast)
	return forecasts

# evaluate the RMSE for each forecast time step
def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = test[:,(n_lag+i)]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i - 1
		off_e = off_s + len(forecasts[i]) + 1
		xaxis = [x for x in range(off_s, off_e)]
		yaxis = [series.values[off_s]] + forecasts[i]
		pyplot.plot(xaxis, yaxis, color='red')
	# show the plot
	pyplot.show()

# load dataset
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
# configure
n_lag = 1
n_seq = 3
n_test = 10
# prepare data
train, test = prepare_data(series, n_test, n_lag, n_seq)
# make forecasts
forecasts = make_forecasts(train, test, n_lag, n_seq)
# evaluate forecasts
evaluate_forecasts(test, forecasts, n_lag, n_seq)
# plot forecasts
plot_forecasts(series, forecasts, n_test+2)

Running the example first prints the RMSE for each of the forecasted time steps. This gives us a baseline of performance on each time step that we would expect the LSTM to outperform.

t+1 RMSE: 144.535304
t+2 RMSE: 86.479905
t+3 RMSE: 121.149168

A line plot of the original time series with the multi-step persistence forecasts is also created. The lines connect to the appropriate input value for each forecast. This context shows how naive the persistence forecasts actually are.

[Figure: line plot of the Shampoo Sales dataset (blue) with multi-step persistence forecasts (red)]

Multi-Step LSTM Network

In this section, we will use the persistence example as a starting point and look at the changes needed to fit an LSTM to the training data and make multi-step forecasts for the test dataset.

Prepare Data

The data must be prepared before we can use it to train an LSTM. Specifically, two additional changes are required:

Stationarity: The data shows an increasing trend that must be removed by differencing.

Scale: The data must be rescaled to values between -1 and 1, matching the output range of the tanh activation function used by the LSTM units.

We can introduce a function called difference() to make the data stationary. It transforms the series of values into a series of differences, a simpler representation.

# create a differenced series
def difference(dataset, interval=1):
	diff = list()
	for i in range(interval, len(dataset)):
		value = dataset[i] - dataset[i - interval]
		diff.append(value)
	return Series(diff)
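
For example, differencing the first three observations of year 3 (339.7, 440.4, 315.9) yields the two month-over-month changes (this assumes the Series import from the complete example below):

# demonstrate differencing on the first three observations of year 3
print(difference([339.7, 440.4, 315.9]))
# prints a Series with the values 100.7 and -124.5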

We can use the MinMaxScaler from the scikit-learn library to scale the data. Putting this together, we can update the prepare_data() function to first difference the data and rescale it, then perform the transform into a supervised learning problem and the train/test split, as we did before in the persistence example. The function now returns a scaler in addition to the train and test datasets.

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	# transform data to be stationary
	diff_series = difference(raw_values, 1)
	diff_values = diff_series.values
	diff_values = diff_values.reshape(len(diff_values), 1)
	# rescale values to -1, 1
	scaler = MinMaxScaler(feature_range=(-1, 1))
	scaled_values = scaler.fit_transform(diff_values)
	scaled_values = scaled_values.reshape(len(scaled_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(scaled_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return scaler, train, test

We can call this as follows:

# prepare data
scaler, train, test = prepare_data(series, n_test, n_lag, n_seq)
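
Because the scaler is needed later to return the forecasts to their original scale, it is worth seeing how fit_transform() and inverse_transform() pair up. The round trip below is a minimal sketch with hypothetical differenced values:

# minimal sketch: MinMaxScaler round trip on hypothetical differenced values
from numpy import array
from sklearn.preprocessing import MinMaxScaler
values = array([100.7, -124.5, 123.4]).reshape(3, 1)
scaler = MinMaxScaler(feature_range=(-1, 1))
scaled = scaler.fit_transform(values)
print(scaled)                            # values mapped into [-1, 1]
print(scaler.inverse_transform(scaled))  # recovers the original values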

Fit LSTM Network

Next, we need to fit an LSTM network model to the training data. This first requires that the training dataset be transformed from a 2D array [samples, features] into a 3D array [samples, timesteps, features]. We will fix the number of time steps at 1, so this change is straightforward.

Next, we need to design the LSTM network. We will use a simple structure with 1 hidden layer of 1 LSTM unit, followed by an output layer with linear activation and 3 output values. The network will use the mean squared error loss function and the efficient Adam optimization algorithm.

The LSTM is stateful; this means we must manually reset the state of the network at the end of each training epoch. The network will be fit for 1500 epochs.

The same batch size must be used for training and prediction, and since we require a prediction to be made at each step of the test dataset, a batch size of 1 must be used. A batch size of 1 is also called online learning, as the network weights are updated within each epoch after every training pattern (as opposed to mini-batch or batch updates).

We can put all of this together in a function called fit_lstm(). The function takes a number of key parameters that can be used to tune the network later, and returns a fit LSTM model ready for forecasting.

# fit an LSTM network to training data
def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
	# reshape training into [samples, timesteps, features]
	X, y = train[:, 0:n_lag], train[:, n_lag:]
	X = X.reshape(X.shape[0], 1, X.shape[1])
	# design network
	model = Sequential()
	model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
	model.add(Dense(y.shape[1]))
	model.compile(loss='mean_squared_error', optimizer='adam')
	# fit network
	for i in range(nb_epoch):
		model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
		model.reset_states()
	return model

We can call it as follows:

# fit model
model = fit_lstm(train, 1, 3, 1, 1500, 1)

The network configuration is not tuned; try different parameters if you like.

LSTM Forecast

The next step is to use the fit LSTM network to make forecasts. A single forecast can be made with the fit network by calling model.predict(). Again, the data must be formatted into a 3D array with the format [samples, timesteps, features]. We can wrap this up in a function called forecast_lstm().

# make one forecast with an LSTM
def forecast_lstm(model, X, n_batch):
	# reshape input pattern to [samples, timesteps, features]
	X = X.reshape(1, 1, len(X))
	# make forecast
	forecast = model.predict(X, batch_size=n_batch)
	# convert to array
	return [x for x in forecast[0, :]]

We can call this function from the make_forecasts() function, updated to accept the model as an argument. The updated version is listed below:

# make forecasts with the fit LSTM model
def make_forecasts(model, n_batch, train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = forecast_lstm(model, X, n_batch)
		# store the forecast
		forecasts.append(forecast)
	return forecasts

The updated version can be called as follows:

# make forecasts
forecasts = make_forecasts(model, 1, train, test, 1, 3)

Invert Transforms

After the forecasts have been made, we need to invert the transforms to return the values to their original scale. This is needed so that we can calculate error scores and plots that are comparable with other models, such as the persistence forecast above.

We can invert the scale of the forecasts directly using the MinMaxScaler object, which provides an inverse_transform() function.

We can invert the differencing by adding the last observation (the prior month's shampoo sales) to the first forecast value and then propagating the accumulated value down the forecast. This is a little fiddly; we can wrap the behavior up in a function named inverse_difference() that takes the last observed value before the forecast and the forecast itself as arguments, and returns the inverted forecast.

# invert differenced forecast
def inverse_difference(last_ob, forecast):
	# invert first forecast
	inverted = list()
	inverted.append(forecast[0] + last_ob)
	# propagate difference forecast using inverted first value
	for i in range(1, len(forecast)):
		inverted.append(forecast[i] + inverted[i-1])
	return inverted
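
For example, with a last observation of 100.0 and a differenced forecast of [5.0, 10.0, 15.0], the inverted forecast accumulates as follows:

# demonstrate inverting a differenced forecast (hypothetical values)
print(inverse_difference(100.0, [5.0, 10.0, 15.0]))
# [105.0, 115.0, 130.0]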

Putting this together, we can create an inverse_transform() function that works through each forecast, first inverting the scale and then inverting the differences, returning forecasts to their original scale.

# inverse data transform on forecasts
def inverse_transform(series, forecasts, scaler, n_test):
	inverted = list()
	for i in range(len(forecasts)):
		# create array from forecast
		forecast = array(forecasts[i])
		forecast = forecast.reshape(1, len(forecast))
		# invert scaling
		inv_scale = scaler.inverse_transform(forecast)
		inv_scale = inv_scale[0, :]
		# invert differencing
		index = len(series) - n_test + i - 1
		last_ob = series.values[index]
		inv_diff = inverse_difference(last_ob, inv_scale)
		# store
		inverted.append(inv_diff)
	return inverted

We can call it as follows:

# inverse transform forecasts and test
forecasts = inverse_transform(series, forecasts, scaler, n_test+2)

We can also invert the transforms on the output part of the test dataset so that we can correctly calculate the RMSE scores, as follows:

actual = [row[n_lag:] for row in test]
actual = inverse_transform(series, actual, scaler, n_test+2)

We can also simplify the calculation of RMSE scores to expect the test data to contain only the output values, as follows:

def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = [row[i] for row in test]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))

Complete Example

We can tie all of these pieces together and fit an LSTM network to the multi-step time series forecasting problem. The complete code listing is provided below:

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from pandas import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
from matplotlib import pyplot
from numpy import array

# date-time parsing function for loading the dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# create a differenced series
def difference(dataset, interval=1):
	diff = list()
	for i in range(interval, len(dataset)):
		value = dataset[i] - dataset[i - interval]
		diff.append(value)
	return Series(diff)

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	# transform data to be stationary
	diff_series = difference(raw_values, 1)
	diff_values = diff_series.values
	diff_values = diff_values.reshape(len(diff_values), 1)
	# rescale values to -1, 1
	scaler = MinMaxScaler(feature_range=(-1, 1))
	scaled_values = scaler.fit_transform(diff_values)
	scaled_values = scaled_values.reshape(len(scaled_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(scaled_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return scaler, train, test

# fit an LSTM network to training data
def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
	# reshape training into [samples, timesteps, features]
	X, y = train[:, 0:n_lag], train[:, n_lag:]
	X = X.reshape(X.shape[0], 1, X.shape[1])
	# design network
	model = Sequential()
	model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
	model.add(Dense(y.shape[1]))
	model.compile(loss='mean_squared_error', optimizer='adam')
	# fit network
	for i in range(nb_epoch):
		model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
		model.reset_states()
	return model

# make one forecast with an LSTM
def forecast_lstm(model, X, n_batch):
	# reshape input pattern to [samples, timesteps, features]
	X = X.reshape(1, 1, len(X))
	# make forecast
	forecast = model.predict(X, batch_size=n_batch)
	# convert to array
	return [x for x in forecast[0, :]]

# make forecasts with the fit LSTM model
def make_forecasts(model, n_batch, train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = forecast_lstm(model, X, n_batch)
		# store the forecast
		forecasts.append(forecast)
	return forecasts

# invert differenced forecast
def inverse_difference(last_ob, forecast):
	# invert first forecast
	inverted = list()
	inverted.append(forecast[0] + last_ob)
	# propagate difference forecast using inverted first value
	for i in range(1, len(forecast)):
		inverted.append(forecast[i] + inverted[i-1])
	return inverted

# inverse data transform on forecasts
def inverse_transform(series, forecasts, scaler, n_test):
	inverted = list()
	for i in range(len(forecasts)):
		# create array from forecast
		forecast = array(forecasts[i])
		forecast = forecast.reshape(1, len(forecast))
		# invert scaling
		inv_scale = scaler.inverse_transform(forecast)
		inv_scale = inv_scale[0, :]
		# invert differencing
		index = len(series) - n_test + i - 1
		last_ob = series.values[index]
		inv_diff = inverse_difference(last_ob, inv_scale)
		# store
		inverted.append(inv_diff)
	return inverted

# evaluate the RMSE for each forecast time step
def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = [row[i] for row in test]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i - 1
		off_e = off_s + len(forecasts[i]) + 1
		xaxis = [x for x in range(off_s, off_e)]
		yaxis = [series.values[off_s]] + forecasts[i]
		pyplot.plot(xaxis, yaxis, color='red')
	# show the plot
	pyplot.show()

# load dataset
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
# configure
n_lag = 1
n_seq = 3
n_test = 10
n_epochs = 1500
n_batch = 1
n_neurons = 1
# prepare data
scaler, train, test = prepare_data(series, n_test, n_lag, n_seq)
# fit model
model = fit_lstm(train, n_lag, n_seq, n_batch, n_epochs, n_neurons)
# make forecasts
forecasts = make_forecasts(model, n_batch, train, test, n_lag, n_seq)
# inverse transform forecasts and test
forecasts = inverse_transform(series, forecasts, scaler, n_test+2)
actual = [row[n_lag:] for row in test]
actual = inverse_transform(series, actual, scaler, n_test+2)
# evaluate forecasts
evaluate_forecasts(actual, forecasts, n_lag, n_seq)
# plot forecasts
plot_forecasts(series, forecasts, n_test+2)

Running the example first prints the RMSE for each of the forecasted time steps. We can see that the score at each forecast time step is better, in some cases much better, than the persistence forecast. This shows that the configured LSTM does have skill on the problem.

Interestingly, the RMSE does not become progressively worse over the forecast horizon, as might be expected: t+2 appears to be easier to forecast than t+1. This may be because the downward tick is easier to predict than the upward tick noted in the series (this could be confirmed with more analysis of the results).

t+1 RMSE: 95.973221
t+2 RMSE: 78.872348
t+3 RMSE: 105.613951

A line plot of the series (blue) with the forecasts (red) is also created. The plot shows that although the skill of the model is better, some of the forecasts are not very good and there is plenty of room for improvement.

[Figure: line plot of the Shampoo Sales dataset (blue) with multi-step LSTM forecasts (red)]

Extensions

Update LSTM. Change the example to refit or update the LSTM as new data is made available. 10 training epochs should be sufficient to retrain with a new observation; see the sketch after this list.

Tune the LSTM. Grid search some of the LSTM parameters used in the tutorial, such as the number of epochs, number of neurons, and number of layers, to see if you can further lift performance.

Seq2Seq. Use the encoder-decoder paradigm for LSTMs to forecast each sequence and see whether it offers any benefit.

Time Horizon. Experiment with forecasting different time horizons and see how the behavior of the network varies at different lead times.
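
As a starting point for the first extension, the sketch below refits the stateful model on a single new supervised pattern. It is an untested illustration that reuses the conventions of fit_lstm() above; the update_lstm() name and its parameters are hypothetical:

# hypothetical sketch: update a fit stateful LSTM with one new input/output pattern
# assumes the model, batch size of 1, and data preparation from the tutorial above
def update_lstm(model, X_new, y_new, n_batch=1, nb_epoch=10):
	# reshape the single new pattern to [samples, timesteps, features]
	X_new = X_new.reshape(1, 1, len(X_new))
	y_new = y_new.reshape(1, len(y_new))
	# refit for a small number of epochs, resetting state as in fit_lstm()
	for i in range(nb_epoch):
		model.fit(X_new, y_new, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
		model.reset_states()
	return model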