A Demonstration of Memory in a Long Short-Term Memory Network

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning over long sequences.

This differentiates them from regular multilayer neural networks that do not have memory and can only learn a mapping between input and output patterns.

It is important to understand the capabilities of complex neural networks like LSTMs on small contrived problems as this understanding will help you scale the network up to large and even very large problems.

In this tutorial, you will discover the capability of LSTMs to remember and recall.

After completing this tutorial, you will know:

  • How to define a small sequence prediction problem that only RNNs like LSTMs can solve using memory.
  • How to transform the problem representation so that it is suitable for learning by LSTMs.
  • How to design an LSTM to solve the problem correctly.

Let’s get started.

A Demonstration of Memory in a Long Short-Term Memory Network
Photo by crazlei, some rights reserved.

Environment

This tutorial assumes you have a working Python 2 or 3 environment with SciPy installed, along with Keras 2.0 or higher using either the TensorFlow or Theano backend.
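As a quick sanity check, you can print the installed library versions with the short script below. This is a minimal sketch that assumes the TensorFlow backend; if you are using Theano, import theano instead.

# check library versions (assumes the TensorFlow backend; import theano instead if that is your backend)
import scipy
import keras
import tensorflow
print('scipy: %s' % scipy.__version__)
print('keras: %s' % keras.__version__)
print('tensorflow: %s' % tensorflow.__version__)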

For help setting up your Python environment, see the post on this topic.

Sequence Problem Description

The problem is to predict values of a sequence one at a time.

Given one value in the sequence, the model must predict the next value in the sequence. For example, given a value of “0” as an input, the model must predict the value “1”.

There are two different sequences that the model must learn and correctly predict.

A wrinkle is that there is conflicting information between the two sequences: the model must know the context of each one-step prediction (i.e. which sequence it is currently predicting) in order to correctly predict each full sequence.

This wrinkle is important to prevent the model from memorizing each single-step input-output pair of values in each sequence, as a sequence-unaware model might be inclined to do.

The two sequences to be learned are as follows:

  • 3, 0, 1, 2, 3
  • 4, 0, 1, 2, 4

We can see that the first value of the sequence is repeated as the last value of the sequence. This is the indicator that provides context to the model as to which sequence it is working on.

The conflict is in the transition from the second-to-last item to the last item in each sequence. In sequence one, a “2” is given as an input and a “3” must be predicted, whereas in sequence two, a “2” is given as an input and a “4” must be predicted.
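To make the conflict concrete, the short snippet below (a sketch using only plain Python and the two raw sequences) prints the one-step input-output pairs of each sequence; only the final pair differs, and it differs in the output, not the input.

# print the one-step input/output pairs for both sequences
seq1 = [3, 0, 1, 2, 3]
seq2 = [4, 0, 1, 2, 4]
for s in [seq1, seq2]:
    pairs = [(s[i - 1], s[i]) for i in range(1, len(s))]
    print(pairs)

Running this prints [(3, 0), (0, 1), (1, 2), (2, 3)] and [(4, 0), (0, 1), (1, 2), (2, 4)], showing that the same input “2” must map to different outputs depending on the sequence.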

This is a problem that a Multilayer Perceptron and other non-recurrent neural networks cannot learn.

This is a simplified version of “Experiment 2” used to demonstrate LSTM long-term memory capabilities in Hochreiter and Schmidhuber’s 1997 paper, Long Short-Term Memory.


Problem Representation

This section is divided into 3 parts; they are:

  1. One Hot Encoding
  2. Input-Output Pairs
  3. Reshape Data

One Hot Encoding

We will use a one hot encoding to represent the learning problem for the LSTM.

That is, each input and output value will be represented as a binary vector with 5 elements, because the alphabet of the problem contains 5 unique values.

For example, the 5 values of [0, 1, 2, 3, 4] are represented as the following 5 binary vectors:

0: [1, 0, 0, 0, 0]
1: [0, 1, 0, 0, 0]
2: [0, 0, 1, 0, 0]
3: [0, 0, 0, 1, 0]
4: [0, 0, 0, 0, 1]

We can do this with a simple function that will take a sequence and return a list of binary vectors for each value in the sequence. The function encode() below implements this behavior.

# binary encode an input pattern, return a list of binary vectors
def encode(pattern, n_unique):
    encoded = list()
    for value in pattern:
        row = [0.0 for x in range(n_unique)]
        row[value] = 1.0
        encoded.append(row)
    return encoded

We can test it on the first sequence and print the resulting list of binary vectors. The complete example is listed below.

# binary encode an input pattern, return a list of binary vectors
def encode(pattern, n_unique):
    encoded = list()
    for value in pattern:
        row = [0.0 for x in range(n_unique)]
        row[value] = 1.0
        encoded.append(row)
    return encoded

seq1 = [3, 0, 1, 2, 3]
encoded = encode(seq1, 5)
for vector in encoded:
    print(vector)

Running the example prints each binary vector. Note that we use the floating point values 0.0 and 1.0 because they will be used as inputs and outputs for the model.

[0.0, 0.0, 0.0, 1.0, 0.0]
[1.0, 0.0, 0.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 0.0, 0.0]
[0.0, 0.0, 0.0, 1.0, 0.0]

Input-Output Pairs

The next step is to split a sequence of encoded values into input-output pairs.

This is a supervised learning representation of the problem, such that machine learning algorithms can learn how to map an input pattern (X) to an output pattern (y).

For example, the first sequence has the following input-output pairs to be learned:

X, y
3, 0
0, 1
1, 2
2, 3

Instead of the raw numbers, we must create these mapping pairs from the one hot encoded binary vectors.

For example, the first input-output pair, for 3->0, would be:

X, y
[0, 0, 0, 1, 0] [1, 0, 0, 0, 0]

Below is a function named to_xy_pairs() that will create lists of X and y patterns given a list of encoded binary vectors.

# create input/output pairs of encoded vectors, returns X, y
def to_xy_pairs(encoded):
    X, y = list(), list()
    for i in range(1, len(encoded)):
        X.append(encoded[i-1])
        y.append(encoded[i])
    return X, y

We can put this together with the one hot encoding function above and print the encoded input and output pairs for the first sequence.

# binary encode an input pattern, return a list of binary vectors
def encode(pattern, n_unique):
    encoded = list()
    for value in pattern:
        row = [0.0 for x in range(n_unique)]
        row[value] = 1.0
        encoded.append(row)
    return encoded

# create input/output pairs of encoded vectors, returns X, y
def to_xy_pairs(encoded):
    X, y = list(), list()
    for i in range(1, len(encoded)):
        X.append(encoded[i-1])
        y.append(encoded[i])
    return X, y

seq1 = [3, 0, 1, 2, 3]
encoded = encode(seq1, 5)
X, y = to_xy_pairs(encoded)
for i in range(len(X)):
    print(X[i], y[i])

Running the example prints the input and output pairs for each step in the sequence.

[0.0, 0.0, 0.0, 1.0, 0.0] [1.0, 0.0, 0.0, 0.0, 0.0]
[1.0, 0.0, 0.0, 0.0, 0.0] [0.0, 1.0, 0.0, 0.0, 0.0]
[0.0, 1.0, 0.0, 0.0, 0.0] [0.0, 0.0, 1.0, 0.0, 0.0]
[0.0, 0.0, 1.0, 0.0, 0.0] [0.0, 0.0, 0.0, 1.0, 0.0]

Reshape Data

The final step is to reshape the data so that it can be used by the LSTM network directly.

The Keras LSTM expects input patterns (X) as a three-dimensional NumPy array with the dimensions [samples, timesteps, features].

In the case of one sequence of input data, the dimensions will be [4, 1, 5] because we have 4 rows of data, 1 time step for each row, and 5 columns in each row.

We can create a 2D NumPy array from our list of X patterns, then reshape it into the required 3D format. For example:

from pandas import DataFrame
# list of vectors -> 2D array -> 3D array [samples, timesteps, features]
df = DataFrame(X)
values = df.values
array = values.reshape(4, 1, 5)

We must also convert the list of output patterns (y) into a 2D NumPy array.
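Before wrapping everything into a helper function, here is a minimal sketch of that conversion; it assumes the encode() and to_xy_pairs() functions defined earlier and uses a pandas DataFrame as a convenient route to a NumPy array.

from pandas import DataFrame

# convert the list of output vectors (y) for the first sequence into a 2D NumPy array
# assumes the encode() and to_xy_pairs() functions defined above
seq1 = [3, 0, 1, 2, 3]
X, y = to_xy_pairs(encode(seq1, 5))
dfy = DataFrame(y)
lstmY = dfy.values
print(lstmY.shape)

This should print (4, 5): 4 output patterns, each with 5 columns.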

Below is a function named to_lstm_dataset() that takes a sequence as an input and the size of the sequence alphabet and returns an X and y dataset ready for use with an LSTM. It performs the required conversions of the sequence to a one-hot encoding and to input-output pairs before reshaping the data.

# convert sequence to x/y pairs ready for use with an LSTM
def to_lstm_dataset(sequence, n_unique):
    # one hot encode
    encoded = encode(sequence, n_unique)
    # convert to in/out patterns
    X, y = to_xy_pairs(encoded)
    # convert to LSTM friendly format
    dfX, dfy = DataFrame(X), DataFrame(y)
    lstmX = dfX.values
    lstmX = lstmX.reshape(lstmX.shape[0], 1, lstmX.shape[1])
    lstmY = dfy.values
    return lstmX, lstmY

This function can be called with each sequence as follows:

seq1 = [3, 0, 1, 2, 3]
seq2 = [4, 0, 1, 2, 4]
n_unique = len(set(seq1 + seq2))

seq1X, seq1Y = to_lstm_dataset(seq1, n_unique)
seq2X, seq2Y = to_lstm_dataset(seq2, n_unique)
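As a quick check that the preparation worked (a sketch, assuming the calls above completed), we can print the shapes of the prepared arrays; each sequence should yield an X of shape (4, 1, 5) and a y of shape (4, 5).

# confirm the prepared arrays have the shapes the LSTM expects
print(seq1X.shape, seq1Y.shape)
print(seq2X.shape, seq2Y.shape)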

We now have all of the pieces to prepare the data for the LSTM.

Learn Sequences with an LSTM

In this section, we will define the LSTM to learn the input sequences.

This section is divided into 4 parts; they are:

  1. LSTM Configuration
  2. LSTM Training
  3. LSTM Evaluation
  4. LSTM Complete Example

LSTM Configuration

We want the LSTM to make one-step predictions, which we have defined in the format and shape of our dataset. We also want the LSTM to be updated with error after each time step, which means we will need to use a batch size of one.

Keras LSTMs are not stateful between batches by default. We can make them stateful by setting the stateful argument on the LSTM layer to True and managing the training epochs manually to ensure that the internal state of the LSTM is reset after each sequence.

We must define the shape of the batch using the batch_input_shape argument with 3 dimensions [batch size, time steps, features], which will be 1, 1, and 5 respectively.

The network topology will be configured with one hidden LSTM layer with 20 units and a Dense output layer with 5 neurons, one for each of the 5 columns in an output pattern. A sigmoid (logistic) activation function will be used on the output layer because of the binary outputs, and the default tanh (hyperbolic tangent) activation will be used within the LSTM layer.

A log (cross entropy) loss function will be optimized when fitting the network because of the binary outputs, and the efficient Adam optimization algorithm will be used with all default parameters.

The Keras code to define the LSTM network for this problem is listed below.

model = Sequential()
model.add(LSTM(20, batch_input_shape=(1, 1, 5), stateful=True))
model.add(Dense(5, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

LSTM Training

We must fit the model manually one epoch at a time.

Within one epoch we can fit the model on each sequence, being sure to reset state after each sequence.

The model does not need to be trained for long given the simplicity of the problem; in this case only 250 epochs are required.

Below is an example of how the model can be fit on each sequence across all epochs.

# train LSTM, resetting internal state after each sequence
for i in range(250):
    model.fit(seq1X, seq1Y, epochs=1, batch_size=1, verbose=1, shuffle=False)
    model.reset_states()
    model.fit(seq2X, seq2Y, epochs=1, batch_size=1, verbose=1, shuffle=False)
    model.reset_states()