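In a recurrent neural network (RNN), information flows one way from the input units to the hidden units, and a second one-way flow runs from the hidden units to the output units. In some cases an RNN relaxes the latter restriction and routes information from the output units back to the hidden units; these connections are called "back projections". In addition, the input to the hidden layer includes the hidden layer's own output from the previous step, so nodes within the hidden layer can be self-connected as well as connected to one another.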



1) x_t is the input at step t (t = 1, 2, 3, ...).

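2) s_t is the hidden-layer state at step t; it is the network's memory unit. s_t is computed from the current input and the hidden-layer state of the previous step:
s_t = f(U x_t + W s_{t-1}), where f is usually a nonlinear activation function such as ReLU or tanh. Computing the first hidden state, s_1, requires s_0, which does not exist; in practice s_0 is usually set to the zero vector.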

3) o_t is the output at step t: o_t = softmax(V s_t).
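
The three definitions above can be sketched as a forward pass in NumPy. This is a minimal illustration, not the TensorFlow code used later: the function names `softmax` and `rnn_forward`, the layer sizes, the random weights, and the choice of tanh for f are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def rnn_forward(xs, U, W, V):
    """Run a vanilla RNN over one sequence.

    xs: array of shape (T, input_dim), one row per step input x_t.
    Returns (states, outputs) with shapes (T, hidden_dim) and (T, output_dim).
    """
    s_prev = np.zeros(W.shape[0])             # state before the first step: zero vector
    states, outputs = [], []
    for x_t in xs:
        s_t = np.tanh(U @ x_t + W @ s_prev)   # s_t = f(U x_t + W s_{t-1}), with f = tanh
        o_t = softmax(V @ s_t)                # o_t = softmax(V s_t)
        states.append(s_t)
        outputs.append(o_t)
        s_prev = s_t
    return np.array(states), np.array(outputs)

# Illustrative sizes and random weights (assumptions, not values from the text)
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
V = rng.normal(scale=0.1, size=(output_dim, hidden_dim))
xs = rng.normal(size=(T, input_dim))

states, outputs = rnn_forward(xs, U, W, V)
print(states.shape, outputs.shape)   # (5, 4) (5, 2)
```

Each row of `outputs` is a probability distribution over the output classes, and the same U, W, and V are reused at every step.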


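The hidden-layer state s_t is regarded as the network's memory. s_t carries information from the hidden-layer states of all previous steps, whereas the output o_t depends only on the current s_t. In practice, to limit the network's complexity, s_t usually retains information from only the last several steps rather than from all of them.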
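
One way to see how the influence of early steps can fade from the hidden state is to change only the first input and watch how much later states move. The sketch below is an illustration under a strong assumption: the recurrent matrix W is rescaled so that its spectral norm is 0.5, which guarantees the effect shrinks at every step. The sizes, the random weights, and the helper `run` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, input_dim, T = 8, 4, 20      # illustrative sizes

U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
W *= 0.5 / np.linalg.norm(W, 2)          # assumption: force the spectral norm of W to 0.5

def run(xs):
    """Return the hidden states for every step, starting from a zero state."""
    s = np.zeros(hidden_dim)
    states = []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)
        states.append(s)
    return states

xs = rng.normal(size=(T, input_dim))
xs_perturbed = xs.copy()
xs_perturbed[0] += 1.0                   # change only the first input

# Distance between the two runs' hidden states at each step
diffs = [np.linalg.norm(a - b) for a, b in zip(run(xs), run(xs_perturbed))]
print("effect on the first state:", diffs[0])
print("effect on the last state: ", diffs[-1])
```

With larger recurrent weights the effect need not shrink this quickly, so this shows one regime rather than a general property.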

import tensorflow as tf
from tensorflow.contrib import rnn
import numpy as np

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

To classify images using a bidirectional recurrent neural network, we consider
every image row as one time step of a sequence. Because the MNIST image shape is
28*28px, each sample becomes a sequence of 28 steps with 28 pixel values per step.
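
The row-by-row view of an image can be sketched with NumPy alone. The array below is a stand-in for a batch of flattened MNIST images, and the batch size of 4 is an assumption chosen to keep the example small.

```python
import numpy as np

batch_size, n_steps, n_input = 4, 28, 28    # small batch for illustration

# Stand-in for flattened images: shape (batch_size, 784)
flat = np.arange(batch_size * n_steps * n_input, dtype=np.float32)
flat = flat.reshape(batch_size, n_steps * n_input)

# View each image as n_steps rows of n_input pixels
seq = flat.reshape(batch_size, n_steps, n_input)

# Split along the time axis into a list of n_steps arrays of shape
# (batch_size, n_input), analogous to tf.unstack(x, n_steps, 1) in the listing
steps = [seq[:, t, :] for t in range(n_steps)]

print(len(steps), steps[0].shape)   # 28 (4, 28)
```

Step t of a sample is row t of its image, so the network reads each image from top to bottom.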

learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)

x = tf.placeholder("float", [None, n_steps, n_input])
y = tf.placeholder("float", [None, n_classes])

weights = {
    # Hidden layer weights => 2*n_hidden because of forward + backward cells
    'out': tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_classes]))
}

def BiRNN(x, weights, biases):

    # Prepare data shape to match `static_bidirectional_rnn` function requirements
    # Current data input shape: (batch_size, n_steps, n_input)
    # Required shape: 'n_steps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.unstack(x, n_steps, 1)

    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = rnn.BasicLSTMCell(n_hidden, forget_bias=1.0)

    # Get lstm cell output
    try:
        outputs, _, _ = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                                     dtype=tf.float32)
    except Exception: # Old TensorFlow version only returns outputs not states
        outputs = rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x,
                                               dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
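
To show why the output weights have 2*n_hidden rows, here is a NumPy sketch of the bidirectional idea. It swaps the LSTM cells for a plain tanh RNN to stay short, so it is not equivalent to the TensorFlow function above; `simple_rnn`, the weight names, and the sizes are hypothetical.

```python
import numpy as np

def simple_rnn(xs, U, W):
    """Plain tanh RNN (a stand-in for an LSTM cell); returns one output per step."""
    s = np.zeros((xs[0].shape[0], W.shape[0]))   # (batch, n_hidden), zero initial state
    outs = []
    for x_t in xs:
        s = np.tanh(x_t @ U + s @ W)
        outs.append(s)
    return outs

rng = np.random.default_rng(0)
batch, n_steps, n_input, n_hidden, n_classes = 2, 28, 28, 16, 10   # illustrative sizes
xs = [rng.normal(size=(batch, n_input)) for _ in range(n_steps)]

# Separate parameters for the forward and backward directions
U_fw = rng.normal(scale=0.1, size=(n_input, n_hidden))
W_fw = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
U_bw = rng.normal(scale=0.1, size=(n_input, n_hidden))
W_bw = rng.normal(scale=0.1, size=(n_hidden, n_hidden))

fw = simple_rnn(xs, U_fw, W_fw)               # reads the steps first to last
bw = simple_rnn(xs[::-1], U_bw, W_bw)[::-1]   # reads last to first, then re-aligned in time

# Concatenate both directions per step along the feature axis: (batch, 2 * n_hidden)
outputs = [np.concatenate([f, b], axis=1) for f, b in zip(fw, bw)]

# Linear layer on the last step's output, as in the return statement above
W_out = rng.normal(size=(2 * n_hidden, n_classes))
b_out = rng.normal(size=n_classes)
logits = outputs[-1] @ W_out + b_out
print(outputs[-1].shape, logits.shape)   # (2, 32) (2, 10)
```

In this sketch the forward half of `outputs[-1]` has seen the whole sequence, while the backward half has seen only the final step.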

pred = BiRNN(x, weights, biases)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
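
For intuition, the loss and accuracy defined above can be written out in NumPy. This is a sketch of the underlying arithmetic with made-up logits and labels, not a reimplementation of the TensorFlow ops; the function names are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between one-hot labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(np.mean(-(labels * log_probs).sum(axis=1)))

def batch_accuracy(logits, labels):
    """Fraction of rows whose arg-max class matches the one-hot label."""
    return float(np.mean(np.argmax(logits, axis=1) == np.argmax(labels, axis=1)))

# Made-up example: three samples, three classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [1.0, 1.5, 0.0]])
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0]], dtype=float)

print(softmax_cross_entropy(logits, labels))
print(batch_accuracy(logits, labels))   # two of three rows are correct
```

The loss works on raw logits, which is why `BiRNN` returns the linear layer's output without applying softmax itself.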

init = tf.global_variables_initializer()

with tf.Session() as sess:
    # Initialize all variables before training
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_x = batch_x.reshape((batch_size, n_steps, n_input))
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_x, y: batch_y})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_x, y: batch_y})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " +
                  "{:.6f}".format(loss) + ", Training Accuracy= " +
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: test_data, y: test_label}))