A Demonstration of Memory in a Long Short-Term Memory Network

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning over long sequences.

This differentiates them from regular multilayer neural networks that do not have memory and can only learn a mapping between input and output patterns.

It is important to understand the capabilities of complex neural networks like LSTMs on small contrived problems as this understanding will help you scale the network up to large and even very large problems.

In this tutorial, you will discover the capability of LSTMs to remember and recall.

After completing this tutorial, you will know:

  • How to define a small sequence prediction problem that only RNNs like LSTMs can solve using memory.
  • How to transform the problem representation so that it is suitable for learning by LSTMs.
  • How to design an LSTM to solve the problem correctly.

Let’s get started.

A Demonstration of Memory in a Long Short-Term Memory Network
Photo by crazlei, some rights reserved.

Environment

This tutorial assumes you have a working Python 2 or 3 environment with SciPy installed, along with Keras 2.0 or higher using either the TensorFlow or Theano backend.
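As a quick sanity check, you can print the installed library versions with the short script below. This is a minimal sketch that assumes the TensorFlow backend; if you are using Theano, import theano instead.

# check library versions (assumes the TensorFlow backend; import theano instead if that is your backend)
import scipy
import keras
import tensorflow
print('scipy: %s' % scipy.__version__)
print('keras: %s' % keras.__version__)
print('tensorflow: %s' % tensorflow.__version__)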

For help setting up your Python environment, see the post on this topic.

Sequence Problem Description

The problem is to predict values of a sequence one at a time.

Given one value in the sequence, the model must predict the next value in the sequence. For example, given a value of “0” as an input, the model must predict the value “1”.

There are two different sequences that the model must learn and correctly predict.

A wrinkle is that there is conflicting information between the two sequences: the model must know the context of each one-step prediction (i.e. which sequence it is currently predicting) in order to correctly predict each full sequence.

This wrinkle is important to prevent the model from memorizing each single-step input-output pair of values in each sequence, as a sequence-unaware model might be inclined to do.

The two sequences to be learned are as follows:

  • 3, 0, 1, 2, 3
  • 4, 0, 1, 2, 4

We can see that the first value of the sequence is repeated as the last value of the sequence. This is the indicator that provides context to the model as to which sequence it is working on.

The conflict is in the transition from the second-to-last item to the last item in each sequence. In sequence one, a “2” is given as an input and a “3” must be predicted, whereas in sequence two, a “2” is given as an input and a “4” must be predicted.
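To make the conflict concrete, the short snippet below (a sketch using only plain Python and the two raw sequences) prints the one-step input-output pairs of each sequence; only the final pair differs, and it differs in the output, not the input.

# print the one-step input/output pairs for both sequences
seq1 = [3, 0, 1, 2, 3]
seq2 = [4, 0, 1, 2, 4]
for s in [seq1, seq2]:
    pairs = [(s[i - 1], s[i]) for i in range(1, len(s))]
    print(pairs)

Running this prints [(3, 0), (0, 1), (1, 2), (2, 3)] and [(4, 0), (0, 1), (1, 2), (2, 4)], showing that the same input “2” must map to different outputs depending on the sequence.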

This is a problem that a Multilayer Perceptron and other non-recurrent neural networks cannot learn.

This is a simplified version of “Experiment 2” used to demonstrate LSTM long-term memory capabilities in Hochreiter and Schmidhuber’s 1997 paper, Long Short-Term Memory.


Problem Representation

This section is divided into 3 parts; they are:

  1. One Hot Encoding
  2. Input-Output Pairs
  3. Reshape Data

One Hot Encoding

We will use a one hot encoding to represent the learning problem for the LSTM.

That is, each input and output value will be represented as a binary vector with 5 elements, because the alphabet of the problem contains 5 unique values.

For example, the 5 values of [0, 1, 2, 3, 4] are represented as the following 5 binary vectors:

0: [1, 0, 0, 0, 0]
1: [0, 1, 0, 0, 0]
2: [0, 0, 1, 0, 0]
3: [0, 0, 0, 1, 0]
4: [0, 0, 0, 0, 1]

We can do this with a simple function that will take a sequence and return a list of binary vectors for each value in the sequence. The function encode() below implements this behavior.

# binary encode an input pattern, return a list of binary vectors
def encode(pattern, n_unique):
    encoded = list()
    for value in pattern:
        row = [0.0 for x in range(n_unique)]
        row[value] = 1.0
        encoded.append(row)
    return encoded

We can test it on the first sequence and print the resulting list of binary vectors. The complete example is listed below.

# binary encode an input pattern, return a list of binary vectors
def encode(pattern, n_unique):
    encoded = list()
    for value in pattern:
        row = [0.0 for x in range(n_unique)]
        row[value] = 1.0
        encoded.append(row)
    return encoded

seq1 = [3, 0, 1, 2, 3]
encoded = encode(seq1, 5)
for vector in encoded:
    print(vector)

Running the example prints each binary vector. Note that we use the floating point values 0.0 and 1.0 because they will be used as inputs and outputs for the model.

[0.0, 0.0, 0.0, 1.0, 0.0]
[1.0, 0.0, 0.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 1.0, 0.0]

Input-Output Pairs

The next step is to split a sequence of encoded values into input-output pairs.

This is a supervised learning representation of the problem, such that machine learning algorithms can learn how to map an input pattern (X) to an output pattern (y).

For example, the first sequence has the following input-output pairs to be learned:

X, y
3, 0
0, 1
1, 2
2, 3

Instead of the raw numbers, we must create these mapping pairs from the one hot encoded binary vectors.

For example, the first input-output pair, for 3->0, would be:

X, y
[0, 0, 0, 1, 0] [1, 0, 0, 0, 0]

Below is a function named to_xy_pairs() that will create lists of X and y patterns given a list of encoded binary vectors.

# create input/output pairs of encoded vectors, returns X, y
def to_xy_pairs(encoded):
    X, y = list(), list()
    for i in range(1, len(encoded)):
        X.append(encoded[i-1])
        y.append(encoded[i])
    return X, y

We can put this together with the one hot encoding function above and print the encoded input and output pairs for the first sequence.

# binary encode an input pattern, return a list of binary vectors
def encode(pattern, n_unique):
    encoded = list()
    for value in pattern:
        row = [0.0 for x in range(n_unique)]
        row[value] = 1.0
        encoded.append(row)
    return encoded

# create input/output pairs of encoded vectors, returns X, y
def to_xy_pairs(encoded):
    X, y = list(), list()
    for i in range(1, len(encoded)):
        X.append(encoded[i-1])
        y.append(encoded[i])
    return X, y

seq1 = [3, 0, 1, 2, 3]
encoded = encode(seq1, 5)
X, y = to_xy_pairs(encoded)
for i in range(len(X)):
    print(X[i], y[i])

Running the example prints the input and output pairs for each step in the sequence.

[0.0, 0.0, 0.0, 1.0, 0.0] [1.0, 0.0, 0.0, 0.0, 0.0]
[1.0, 0.0, 0.0, 0.0, 0.0] [0.0, 1.0, 0.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 0.0, 0.0] [0.0, 0.0, 1.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 0.0, 0.0] [0.0, 0.0, 0.0, 1.0, 0.0]

Reshape Data

The final step is to reshape the data so that it can be used by the LSTM network directly.

The Keras LSTM expects input patterns (X) as a three-dimensional NumPy array with the dimensions [samples, timesteps, features].

In the case of one sequence of input data, the dimensions will be [4, 1, 5] because we have 4 rows of data, 1 time step for each row, and 5 columns in each row.

We can create a 2D NumPy array from our list of X patterns, then reshape it into the required 3D format. For example:

from pandas import DataFrame
# list of vectors -> 2D array -> 3D array [samples, timesteps, features]
df = DataFrame(X)
values = df.values
array = values.reshape(4, 1, 5)

We must also convert the list of output patterns (y) into a 2D NumPy array.
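Before wrapping everything into a helper function, here is a minimal sketch of that conversion; it assumes the encode() and to_xy_pairs() functions defined earlier and uses a pandas DataFrame as a convenient route to a NumPy array.

from pandas import DataFrame

# convert the list of output vectors (y) for the first sequence into a 2D NumPy array
# assumes the encode() and to_xy_pairs() functions defined above
seq1 = [3, 0, 1, 2, 3]
X, y = to_xy_pairs(encode(seq1, 5))
dfy = DataFrame(y)
lstmY = dfy.values
print(lstmY.shape)

This should print (4, 5): 4 output patterns, each with 5 columns.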

Below is a function named to_lstm_dataset() that takes a sequence as an input and the size of the sequence alphabet and returns an X and y dataset ready for use with an LSTM. It performs the required conversions of the sequence to a one-hot encoding and to input-output pairs before reshaping the data.

# convert sequence to x/y pairs ready for use with an LSTM
def to_lstm_dataset(sequence, n_unique):
    # one hot encode
    encoded = encode(sequence, n_unique)
    # convert to in/out patterns
    X, y = to_xy_pairs(encoded)
    # convert to LSTM friendly format
    dfX, dfy = DataFrame(X), DataFrame(y)
    lstmX = dfX.values
    lstmX = lstmX.reshape(lstmX.shape[0], 1, lstmX.shape[1])
    lstmY = dfy.values
    return lstmX, lstmY

This function can be called with each sequence as follows:

seq1 = [3, 0, 1, 2, 3]
seq2 = [4, 0, 1, 2, 4]
n_unique = len(set(seq1 + seq2))

seq1X, seq1Y = to_lstm_dataset(seq1, n_unique)
seq2X, seq2Y = to_lstm_dataset(seq2, n_unique)
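As a quick check that the preparation worked (a sketch, assuming the calls above completed), we can print the shapes of the prepared arrays; each sequence should yield an X of shape (4, 1, 5) and a y of shape (4, 5).

# confirm the prepared arrays have the shapes the LSTM expects
print(seq1X.shape, seq1Y.shape)
print(seq2X.shape, seq2Y.shape)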

We now have all of the pieces to prepare the data for the LSTM.

Learn Sequences with an LSTM

In this section, we will define the LSTM to learn the input sequences.

This section is divided into 4 parts; they are:

  1. LSTM Configuration
  2. LSTM Training
  3. LSTM Evaluation
  4. LSTM Complete Example

LSTM Configuration

We want the LSTM to make one-step predictions, which we have defined in the format and shape of our dataset. We also want the LSTM to be updated with error after each time step, which means we will need to use a batch size of one.

Keras LSTMs are not stateful between batches by default. We can make them stateful by setting the stateful argument on the LSTM layer to True and managing the training epochs manually to ensure that the internal state of the LSTM is reset after each sequence.

We must define the shape of the batch using the batch_input_shape argument with 3 dimensions [batch size, time steps, features], which will be 1, 1, and 5 respectively.

The network topology will be configured with one hidden LSTM layer with 20 units and a Dense output layer with 5 neurons, one for each of the 5 columns in an output pattern. A sigmoid (logistic) activation function will be used on the output layer because of the binary outputs, and the default tanh (hyperbolic tangent) activation will be used within the LSTM layer.

A log (cross entropy) loss function will be optimized when fitting the network because of the binary outputs, and the efficient Adam optimization algorithm will be used with all default parameters.

The Keras code to define the LSTM network for this problem is listed below.

model = Sequential()
model.add(LSTM(20, batch_input_shape=(1, 1, 5), stateful=True))
model.add(Dense(5, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

LSTM Training

We must fit the model manually one epoch at a time.

Within one epoch we can fit the model on each sequence, being sure to reset state after each sequence.

The model does not need to be trained for long given the simplicity of the problem; in this case only 250 epochs are required.

Below is an example of how the model can be fit on each sequence across all epochs.

# train LSTM, resetting internal state after each sequence
for i in range(250):
    model.fit(seq1X, seq1Y, epochs=1, batch_size=1, verbose=1, shuffle=False)
    model.reset_states()
    model.fit(seq2X, seq2Y, epochs=1, batch_size=1, verbose=1, shuffle=False)
    model.reset_states()