Multi-layer LSTM for MNIST Classification in TensorFlow, with Visualization

阿新 • Published: 2019-01-02

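Preface

Recurrent neural networks (RNNs) and their improved variant, long short-term memory (LSTM) networks, model sequential data well. This post does not cover the underlying theory; for background, see the following articles:
Understanding LSTM Networks
RNN Quick Start
YJango's Recurrent Neural Networks: Implementing LSTM
Morvan Python (莫煩 PYTHON): What Is a Recurrent Neural Network (RNN)?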

Unrolled RNN diagram:
[Figure: unrolled RNN]

LSTM structure diagram:
[Figure: LSTM cell structure]

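TensorFlow implementation

A two-layer LSTM classifies the MNIST handwritten digits, and the loss and accuracy during training are visualized with TensorBoard.

1. Initialize the parameters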

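Each MNIST image is 28*28 pixels and is fed to the network flattened into 784 values. After the reshape in step 4, the image is read one row at a time: the sequence length is 28 (the number of rows, i.e. the image height) and each time step takes 28 inputs (the pixels in one row, i.e. the image width).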
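To make the row-by-row reading concrete, the sketch below splits a flattened 784-value image into 28 time steps in plain Python, which is what tf.reshape(x, [-1, n_steps, n_inputs]) does for each example. The stand-in pixel values and the names flat_image and sequence are assumptions for illustration, not part of the original code.

```python
n_steps, n_inputs = 28, 28

# Stand-in for one flattened MNIST image: pixel k simply holds the value k
flat_image = list(range(n_steps * n_inputs))

# Row-major split: time step t receives pixels [t * n_inputs, (t + 1) * n_inputs)
sequence = [flat_image[t * n_inputs:(t + 1) * n_inputs] for t in range(n_steps)]

print(len(sequence), len(sequence[0]))   # 28 28
print(sequence[1][:3])                   # [28, 29, 30]
```

Time step 1 starts at pixel 28, so each step is one full row of the image.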

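# Hyper Parameters
learning_rate = 0.01    # learning rate
n_steps = 28            # number of LSTM unroll steps (sequence length)
n_inputs = 28           # number of input nodes per time step
n_hiddens = 64          # number of hidden units
n_layers = 2            # number of LSTM layers
n_classes = 10          # number of output nodes (classes)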

2. Define the input and output placeholders

# tensor placeholder
with tf.name_scope('inputs'):
    x = tf.placeholder(tf.float32, [None, n_steps * n_inputs], name='x_input')     # input
    y = tf.placeholder(tf.float32, [None, n_classes], name='y_input')               # output (one-hot labels)
    keep_prob = tf.placeholder(tf.float32, name='keep_prob_input')           # fraction of units kept by dropout
    batch_size = tf.placeholder(tf.int32, [], name='batch_size_input')       # batch size

3. Define the network weights and biases

# weights and biases
with tf.name_scope('weights'):
    Weights = tf.Variable(tf.truncated_normal([n_hiddens, n_classes],stddev=0.1), dtype=tf.float32, name='W')
    tf.summary.histogram('output_layer_weights', Weights)
with tf.name_scope('biases'):
    biases = tf.Variable(tf.random_normal([n_classes]), name='b')
    tf.summary.histogram('output_layer_biases', biases)

4. RNN network structure

# RNN structure
def RNN_LSTM(x, Weights, biases):
    # reshape the flat input to [batch, n_steps, n_inputs] for the RNN
    x = tf.reshape(x, [-1, n_steps, n_inputs])
    # build one LSTM cell with dropout applied to its output;
    # each call creates a new cell, so every layer gets its own weights
    def attn_cell():
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hiddens)
        with tf.name_scope('lstm_dropout'):
            return tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
    # stack n_layers cells into a multi-layer LSTM
    enc_cells = [attn_cell() for _ in range(n_layers)]
    with tf.name_scope('lstm_cells_layers'):
        mlstm_cell = tf.contrib.rnn.MultiRNNCell(enc_cells, state_is_tuple=True)
    # initialize the state to all zeros
    _init_state = mlstm_cell.zero_state(batch_size, dtype=tf.float32)
    # run the network with dynamic_rnn; outputs has shape [batch, n_steps, n_hiddens]
    outputs, states = tf.nn.dynamic_rnn(mlstm_cell, x, initial_state=_init_state, dtype=tf.float32, time_major=False)
    # output layer: project the last time step onto the classes
    #return tf.matmul(outputs[:,-1,:], Weights) + biases
    return tf.nn.softmax(tf.matmul(outputs[:,-1,:], Weights) + biases)
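As a rough illustration of what the stacked cells compute, here is a dependency-free sketch of a two-layer LSTM forward pass for a single example. It is not TensorFlow's implementation: biases and dropout are omitted, the weights are random, the hidden size is reduced to 4, and the helpers make_layer, matvec, lstm_step and stacked_lstm are hypothetical names introduced only for this example.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def make_layer(n_in, n_hidden, rng):
    """One random weight matrix per gate: input (i), forget (f), candidate (g), output (o)."""
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hidden)]
                for _ in range(n_hidden)]
    return {"i": mat(), "f": mat(), "g": mat(), "o": mat()}

def lstm_step(layer, x_t, h_prev, c_prev):
    """One LSTM time step for a single example (biases omitted for brevity)."""
    z = x_t + h_prev                                     # concatenate input and previous hidden state
    i = [sigmoid(a) for a in matvec(layer["i"], z)]      # input gate
    f = [sigmoid(a) for a in matvec(layer["f"], z)]      # forget gate
    g = [math.tanh(a) for a in matvec(layer["g"], z)]    # candidate cell values
    o = [sigmoid(a) for a in matvec(layer["o"], z)]      # output gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def stacked_lstm(layers, sequence, n_hidden):
    """Feed the sequence through all layers; return the top layer's output at every step."""
    states = [([0.0] * n_hidden, [0.0] * n_hidden) for _ in layers]   # zero initial state
    outputs = []
    for x_t in sequence:
        inp = x_t
        for k, layer in enumerate(layers):
            h, c = lstm_step(layer, inp, *states[k])
            states[k] = (h, c)
            inp = h                    # output of layer k is the input of layer k + 1
        outputs.append(inp)
    return outputs

rng = random.Random(0)
n_steps, n_inputs, n_hiddens, n_layers = 28, 28, 4, 2
layers = [make_layer(n_inputs if k == 0 else n_hiddens, n_hiddens, rng)
          for k in range(n_layers)]
image_rows = [[rng.random() for _ in range(n_inputs)] for _ in range(n_steps)]

outputs = stacked_lstm(layers, image_rows, n_hiddens)
last_output = outputs[-1]              # single-example counterpart of outputs[:, -1, :]
print(len(outputs), len(last_output))  # 28 4
```

Only the top layer's output at the final time step is passed to the classifier, which is why the function above indexes outputs[:,-1,:] before the matrix multiplication.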

5. Loss function and optimizer

with tf.name_scope('output_layer'):
    pred = RNN_LSTM(x, Weights, biases)
    tf.summary.histogram('outputs', pred)
# cost
with tf.name_scope('loss'):
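    #cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    # clip pred away from 0 so that tf.log never returns -inf, which would turn the loss into NaN
    cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.clip_by_value(pred, 1e-10, 1.0)), axis=1))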
    tf.summary.scalar('loss', cost)
# optimizer
with tf.name_scope('train'):
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

with tf.name_scope('accuracy'):
    # per-batch accuracy; tf.metrics.accuracy(...)[1] is a streaming update op whose
    # running total would mix every training and test batch evaluated so far
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    tf.summary.scalar('accuracy', accuracy)

merged = tf.summary.merge_all()

init = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
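The loss in this step is the cross-entropy -sum(y * log(pred)) computed from softmax probabilities. The plain-Python sketch below compares that form (with the probabilities clipped away from zero) against the log-sum-exp form, which works directly on raw logits. The function names and the clipping constant eps are assumptions for illustration. Note that the commented-out softmax_cross_entropy_with_logits line would need the raw logits rather than pred, since pred has already been passed through softmax.

```python
import math

def softmax(logits):
    m = max(logits)                       # subtract the max before exponentiating
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(probs, one_hot, eps=1e-10):
    """-sum(y * log(p)), with p clipped to at least eps (an assumed constant)."""
    return -sum(y * math.log(max(p, eps)) for p, y in zip(probs, one_hot))

def cross_entropy_from_logits(logits, one_hot):
    """The same quantity via log-sum-exp, without forming the probabilities."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(y * (v - log_z) for v, y in zip(logits, one_hot))

logits = [2.0, 1.0, 0.1]
label = [1.0, 0.0, 0.0]
a = cross_entropy_from_probs(softmax(logits), label)
b = cross_entropy_from_logits(logits, label)
print(abs(a - b) < 1e-9)                  # True

# A confidently wrong prediction: the probability of the true class underflows to 0
extreme = [1000.0, 0.0, 0.0]
wrong_label = [0.0, 1.0, 0.0]
print(softmax(extreme)[1])                              # 0.0
print(cross_entropy_from_logits(extreme, wrong_label))  # 1000.0
```

For ordinary inputs the two forms agree. When a probability underflows to exactly 0, the probability-based form only stays finite because of the clipping, while the logit-based form still returns the true loss.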

6. Training

with tf.Session() as sess:
    sess.run(init)
    train_writer = tf.summary.FileWriter("E://logs//train",sess.graph)
    test_writer = tf.summary.FileWriter("E://logs//test",sess.graph)
    # training
    step = 1
    for i in range(2000):
        _batch_size = 128
        batch_x, batch_y = mnist.train.next_batch(_batch_size)

        sess.run(train_op, feed_dict={x:batch_x, y:batch_y, keep_prob:0.5, batch_size:_batch_size})
        if (i + 1) % 100 == 0:
            train_result = sess.run(merged, feed_dict={x:batch_x, y:batch_y, keep_prob:1.0, batch_size:_batch_size})
            test_result = sess.run(merged, feed_dict={x:test_x, y:test_y, keep_prob:1.0, batch_size:test_x.shape[0]})
            train_writer.add_summary(train_result,i+1)
            test_writer.add_summary(test_result,i+1)

    print("Optimization Finished!")
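The training step feeds keep_prob 0.5, while the summary runs feed 1.0 so that dropout is disabled during evaluation. The sketch below shows the idea behind this with inverted dropout in plain Python: kept values are scaled by 1/keep_prob, so the expected activation is the same in both modes. The dropout helper and the sample data are assumptions for illustration, not TensorFlow code.

```python
import random

def dropout(values, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

train_out = dropout(activations, 0.5, rng)   # training: about half the units are zeroed
eval_out = dropout(activations, 1.0, rng)    # evaluation: nothing is dropped

print(sum(train_out) / len(train_out))       # close to 1.0: the mean is preserved
print(eval_out == activations)               # True
```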

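7. Prediction

    # test_x and test_y must be defined before the training loop in step 6,
    # which already evaluates on them (the full code below loads them up front)
    test_x = mnist.test.images
    test_y = mnist.test.labels
    # prediction
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x:test_x, y:test_y, keep_prob:1.0, batch_size:test_x.shape[0]}))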

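Visualization results

Loss on the training and test sets during training:
[Figure: TensorBoard loss curves, training vs. test]

Prediction accuracy on the training and test sets during training:
[Figure: TensorBoard accuracy curves, training vs. test]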

Appendix: full code

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

tf.reset_default_graph()

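# Hyper Parameters
learning_rate = 0.01    # learning rate
n_steps = 28            # number of LSTM unroll steps (sequence length)
n_inputs = 28           # number of input nodes per time step
n_hiddens = 64          # number of hidden units
n_layers = 2            # number of LSTM layers
n_classes = 10          # number of output nodes (classes)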

# data
mnist = input_data.read_data_sets("E:/Anaconda3/workspace/MNIST_data/", one_hot=True)
test_x = mnist.test.images
test_y = mnist.test.labels

# tensor placeholder
with tf.name_scope('inputs'):
    x = tf.placeholder(tf.float32, [None, n_steps * n_inputs], name='x_input')     # input
    y = tf.placeholder(tf.float32, [None, n_classes], name='y_input')               # output (one-hot labels)
    keep_prob = tf.placeholder(tf.float32, name='keep_prob_input')           # fraction of units kept by dropout
    batch_size = tf.placeholder(tf.int32, [], name='batch_size_input')       # batch size

# weights and biases
with tf.name_scope('weights'):
    Weights = tf.Variable(tf.truncated_normal([n_hiddens, n_classes],stddev=0.1), dtype=tf.float32, name='W')
    tf.summary.histogram('output_layer_weights', Weights)
with tf.name_scope('biases'):
    biases = tf.Variable(tf.random_normal([n_classes]), name='b')
    tf.summary.histogram('output_layer_biases', biases)

# RNN structure
def RNN_LSTM(x, Weights, biases):
    # reshape the flat input to [batch, n_steps, n_inputs] for the RNN
    x = tf.reshape(x, [-1, n_steps, n_inputs])
    # build one LSTM cell with dropout applied to its output;
    # each call creates a new cell, so every layer gets its own weights
    def attn_cell():
        lstm_cell = tf.contrib.rnn.BasicLSTMCell(n_hiddens)
        with tf.name_scope('lstm_dropout'):
            return tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=keep_prob)
    # stack n_layers cells into a multi-layer LSTM
    enc_cells = [attn_cell() for _ in range(n_layers)]
    with tf.name_scope('lstm_cells_layers'):
        mlstm_cell = tf.contrib.rnn.MultiRNNCell(enc_cells, state_is_tuple=True)
    # initialize the state to all zeros
    _init_state = mlstm_cell.zero_state(batch_size, dtype=tf.float32)
    # run the network with dynamic_rnn; outputs has shape [batch, n_steps, n_hiddens]
    outputs, states = tf.nn.dynamic_rnn(mlstm_cell, x, initial_state=_init_state, dtype=tf.float32, time_major=False)
    # output layer: project the last time step onto the classes
    #return tf.matmul(outputs[:,-1,:], Weights) + biases
    return tf.nn.softmax(tf.matmul(outputs[:,-1,:], Weights) + biases)

with tf.name_scope('output_layer'):
    pred = RNN_LSTM(x, Weights, biases)
    tf.summary.histogram('outputs', pred)
# cost
with tf.name_scope('loss'):
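    #cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    # clip pred away from 0 so that tf.log never returns -inf, which would turn the loss into NaN
    cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(tf.clip_by_value(pred, 1e-10, 1.0)), axis=1))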
    tf.summary.scalar('loss', cost)
# optimizer
with tf.name_scope('train'):
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

with tf.name_scope('accuracy'):
    # per-batch accuracy; tf.metrics.accuracy(...)[1] is a streaming update op whose
    # running total would mix every training and test batch evaluated so far
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    tf.summary.scalar('accuracy', accuracy)

merged = tf.summary.merge_all()

init = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())

with tf.Session() as sess:
    sess.run(init)
    train_writer = tf.summary.FileWriter("E://logs//train",sess.graph)
    test_writer = tf.summary.FileWriter("E://logs//test",sess.graph)
    # training
    step = 1
    for i in range(2000):
        _batch_size = 128
        batch_x, batch_y = mnist.train.next_batch(_batch_size)

        sess.run(train_op, feed_dict={x:batch_x, y:batch_y, keep_prob:0.5, batch_size:_batch_size})
        if (i + 1) % 100 == 0:
            #loss = sess.run(cost, feed_dict={x:batch_x, y:batch_y, keep_prob:1.0, batch_size:_batch_size})
            #acc = sess.run(accuracy, feed_dict={x:batch_x, y:batch_y, keep_prob:1.0, batch_size:_batch_size})
            #print('Iter: %d' % ((i+1) * _batch_size), '| train loss: %.6f' % loss, '| train accuracy: %.6f' % acc)
            train_result = sess.run(merged, feed_dict={x:batch_x, y:batch_y, keep_prob:1.0, batch_size:_batch_size})
            test_result = sess.run(merged, feed_dict={x:test_x, y:test_y, keep_prob:1.0, batch_size:test_x.shape[0]})
            train_writer.add_summary(train_result,i+1)
            test_writer.add_summary(test_result,i+1)

    print("Optimization Finished!")
    # prediction
    print("Testing Accuracy:", sess.run(accuracy, feed_dict={x:test_x, y:test_y, keep_prob:1.0, batch_size:test_x.shape[0]}))
