tensorflow11: Two hidden layers + softmax regression for MNIST image recognition
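Overview
The softmax regression covered in the previous article has, apart from the input layer, only a linear layer followed by softmax; together these two can be regarded as the output layer. There is no hidden layer in between.
This article shows how to add two hidden layers on top of softmax regression.
The main sources, both listed in the references, are "TensorFlow Mechanics 101" (《TensorFlow運作方式入門》) and 《TensorFlow實現雙隱層SoftMax Regression分類器》 (implementing a two-hidden-layer SoftMax Regression classifier in TensorFlow).
The two-hidden-layer structure is shown in the figure below, borrowed from another author:
Note that the subscripts on W and b in the figure are wrong; the original author cut corners and assembled the new figure by copy and paste.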
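Building the computation graph
As described in the previous article, building a computation graph takes four stages:
- Inference: build the forward-prediction nodes
- Loss: build the loss nodes
- Train: build the training nodes
- Evaluate: build the evaluation nodes
TensorFlow ships with the example tensorflow/examples/tutorials/mnist/mnist.py, which implements these four stages, each as its own function. The original file is meant to be called from fully_connected_feed.py in the same directory. This article modifies mnist.py so that it runs on its own, without involving fully_connected_feed.py.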
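Walkthrough of the model's computation graph
Inference
The inference() function builds the graph as far as needed and returns the Tensor that contains the output predictions. This stage builds two hidden layers and one "linear + softmax regression" layer.
Each layer is created under its own tf.name_scope, and everything created inside that scope carries the scope name as a prefix.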
with tf.name_scope('hidden2'):
Inside the scope, each layer's weights and biases are created with tf.Variable, each with its required shape.
weights = tf.Variable(
    tf.truncated_normal([hidden1_units, hidden2_units],
                        stddev=1.0 / math.sqrt(float(hidden1_units))),
    name='weights')
biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
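Because these variables are created under the hidden2 scope, the weight variable gets the unique name "hidden2/weights".
The weights are initialized with tf.truncated_normal, which generates random values from the given mean and standard deviation.
A further note on tf.truncated_normal: it draws from a truncated normal distribution, so the values assigned to the variable are not unbounded. By default they lie within two standard deviations of the mean.
The biases are then initialized with tf.zeros, so every bias starts at 0. Their shape is the number of units in the layer they connect to.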
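To make the initialization described above concrete, here is a minimal pure-Python sketch. The helper name truncated_normal and its rejection loop are my own illustration, not TensorFlow's implementation; the only behaviour it assumes is that samples farther than two standard deviations from the mean are redrawn.

```python
import math
import random

def truncated_normal(shape, mean=0.0, stddev=1.0, rng=random):
    """Hypothetical stand-in: rows x cols samples, redrawn if beyond 2 stddev."""
    rows, cols = shape
    matrix = []
    for _ in range(rows):
        row = []
        while len(row) < cols:
            value = rng.gauss(mean, stddev)
            # Keep only samples within two standard deviations of the mean.
            if abs(value - mean) <= 2.0 * stddev:
                row.append(value)
        matrix.append(row)
    return matrix

hidden1_units, hidden2_units = 20, 15
# Same scaling as in the article: stddev = 1 / sqrt(number of inputs to the layer).
stddev = 1.0 / math.sqrt(float(hidden1_units))
weights = truncated_normal([hidden1_units, hidden2_units], stddev=stddev)
# Biases all start at zero, one per unit of the layer.
biases = [0.0] * hidden2_units

largest = max(abs(w) for row in weights for w in row)
print("largest |weight|:", largest, "bound:", 2.0 * stddev)
```

With 20 inputs the standard deviation is about 0.22, so every initial weight stays below roughly 0.45 in magnitude.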
Each of the two hidden layers is followed by a ReLU activation:
hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
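The output layer has no activation function. The softmax is applied inside tf.nn.sparse_softmax_cross_entropy_with_logits when the loss is computed.
The inference function returns the tensor produced by the output layer, that is, the logits.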
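The forward pass described in this section can be sketched without TensorFlow. The function names (matmul_add, relu, inference_sketch) and the toy weights are assumptions made for illustration; the structure follows the text: ReLU after each hidden layer, no activation on the output layer.

```python
def matmul_add(x, weights, biases):
    """Compute x * W + b for one input vector x, with W given as a list of rows."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
        for j in range(len(biases))
    ]

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

def inference_sketch(x, layers):
    """layers is a list of (weights, biases); ReLU follows every layer except the last."""
    activations = x
    for index, (weights, biases) in enumerate(layers):
        activations = matmul_add(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

# Toy sizes for readability: 3 inputs -> 2 hidden -> 2 hidden -> 2 classes.
layers = [
    ([[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]], [0.0, 0.0]),  # hidden1
    ([[1.0, 0.0], [0.0, -1.0]], [0.0, 0.5]),              # hidden2
    ([[2.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),              # linear output layer
]
logits = inference_sketch([1.0, 2.0, 3.0], layers)
print(logits)  # [4.0, 2.0]
```

In the second layer the pre-activation -5.5 is clipped to 0 by ReLU, which is why only the first hidden unit contributes to the logits.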
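Loss
The loss() function extends the graph by adding the required loss ops.
It adds a tf.nn.sparse_softmax_cross_entropy_with_logits op, which applies softmax to the logits Tensor returned by inference() and computes the cross-entropy against the labels. The result is then averaged over the batch with tf.reduce_mean.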
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits, name='xentropy')
return tf.reduce_mean(cross_entropy, name='xentropy_mean')
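As a rough check on what this loss computes, the following pure-Python sketch evaluates softmax cross-entropy against integer labels and averages it over a batch. The function names are hypothetical and the code is a hand calculation, not the TensorFlow op.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against one integer class label."""
    shift = max(logits)  # subtract the max for numerical stability
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    # -log(softmax(logits)[label]) = log_sum_exp - logits[label]
    return log_sum_exp - logits[label]

def mean_loss(batch_logits, batch_labels):
    """Average the per-sample losses, as tf.reduce_mean does in the article."""
    losses = [
        sparse_softmax_cross_entropy(logits, label)
        for logits, label in zip(batch_logits, batch_labels)
    ]
    return sum(losses) / len(losses)

# With all-zero logits every class gets probability 1/10, so the loss is log(10).
uniform = mean_loss([[0.0] * 10, [0.0] * 10], [3, 7])
print(uniform, math.log(10))
```

A loss near log(10), about 2.30, is therefore what an untrained 10-class model should show in the first steps; values well below that indicate the model is learning.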
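Train
The training() function adds the ops needed to minimize the loss with gradient descent.
It instantiates a tf.train.GradientDescentOptimizer, which applies gradient descent with the given learning rate, and calls its minimize() function to update the weights, repeatedly adjusting the variables to reduce the loss.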
# Create a gradient descent optimizer with the given learning rate.
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = optimizer.minimize(loss=loss, global_step=global_step)
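In addition, so that training can be visualized in TensorBoard, a summary of the loss is added, and global_step is incremented on every minimize step.
# Add a scalar summary to record the value of the loss.
tf.summary.scalar('loss', loss)
# Create a variable to track the global step.
global_step = tf.Variable(0, name='global_step', trainable=False)
Note that the two snippets above are out of order; they are split only for the sake of explanation. See the complete code below.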
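What one minimize step does can be sketched in plain Python: move each parameter against its gradient, scaled by the learning rate, and bump the step counter. The function minimize_step and the toy quadratic loss are assumptions for illustration only; in the article TensorFlow computes the gradients automatically.

```python
def minimize_step(params, grads, learning_rate, global_step):
    """One gradient descent update; returns the new parameters and step count."""
    new_params = [p - learning_rate * g for p, g in zip(params, grads)]
    return new_params, global_step + 1

# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w = [0.0]
global_step = 0
for _ in range(50):
    grads = [2.0 * (w[0] - 3.0)]
    w, global_step = minimize_step(w, grads, learning_rate=0.1,
                                   global_step=global_step)

print(w[0], global_step)
```

Each update shrinks the distance to the minimum by a factor of 0.8 here, so after 50 steps w is very close to 3 and global_step equals the number of updates performed.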
Evaluate
Check each batch's predictions for top-k correctness (top-1 here) and sum up the number of correctly predicted samples.
correct = tf.nn.in_top_k(logits, labels, 1)
# Return the number of correctly predicted samples in the current batch.
return tf.reduce_sum(tf.cast(correct, tf.int32))
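For k = 1 the evaluation boils down to comparing the arg-max of the logits with the label and counting the matches. The sketch below uses a hypothetical helper, count_correct_top1, and ignores how ties between equal logits are treated, which may differ from tf.nn.in_top_k.

```python
def count_correct_top1(batch_logits, batch_labels):
    """Count samples whose highest-scoring class equals the integer label."""
    correct = 0
    for logits, label in zip(batch_logits, batch_labels):
        predicted = max(range(len(logits)), key=lambda i: logits[i])
        if predicted == label:
            correct += 1
    return correct

batch_logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9]]
batch_labels = [1, 2, 2]
correct = count_correct_top1(batch_logits, batch_labels)
# The node returns a count, so accuracy is the count divided by the batch size.
accuracy = correct / len(batch_labels)
print(correct, accuracy)
```

Because the evaluation node returns a count rather than a ratio, the training loop divides by batch_size (or by the test set size) to report accuracy.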
Training loop
With the above in place, the tensors of each module are readily available; all that remains is to run them in a session with the right arguments.
for step in range(1, num_steps+1):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
    # Run optimization op (backprop)
    sess.run(train_op, feed_dict={images_placeholder: batch_x, labels_placeholder: batch_y})
Every so many steps, run the loss and evaluation nodes to check the current performance:
if step % display_step == 0 or step == 1:
    # Calculate batch loss and accuracy
    # (stored as loss_value so the loss() function is not shadowed)
    loss_value, acc = sess.run([batch_loss, correct_counts], feed_dict={images_placeholder: batch_x, labels_placeholder: batch_y})
    print("Step " + str(step) + ", Minibatch Loss= " + \
          "{:.4f}".format(loss_value) + ", Training Accuracy= " + \
          "{:.3f}".format(float(acc)/batch_size))
Once training is finished, measure the final performance on the test set:
test_acc = sess.run(correct_counts, feed_dict={images_placeholder: mnist.test.images,
                                               labels_placeholder: mnist.test.labels})
print("Testing Accuracy:{:.3f}".format(float(test_acc)/len(mnist.test.images)))
Complete code
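import math
import tensorflow as tf
# MNIST has 10 classes, representing the digits 0 through 9.
NUM_CLASSES = 10
# MNIST images are 28x28 pixels, flattened into 784-dimensional feature vectors.
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE * IMAGE_SIZE
batch_size = 50       # number of samples per batch
hidden1_units = 20    # size of the first hidden layer
hidden2_units = 15    # size of the second hidden layer
learning_rate = 0.1   # learning rate of the optimizer
images_placeholder = tf.placeholder(tf.float32, shape=(None, IMAGE_PIXELS))
# shape=(None,) declares a 1-D vector of labels with unknown batch size.
labels_placeholder = tf.placeholder(tf.int32, shape=(None,))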
# Build the model's forward pass (the graph path from the input to the predicted output).
def inference(images, hidden1_units, hidden2_units):
    # Hidden 1: y1 = relu(x*W1 + b1)
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                                stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
    # Hidden 2: y2 = relu(y1*W2 + b2)
    with tf.name_scope('hidden2'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, hidden2_units],
                                stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
    # Linear: logits = y2*W3 + b3
    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(
            tf.truncated_normal([hidden2_units, NUM_CLASSES],
                                stddev=1.0 / math.sqrt(float(hidden2_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([NUM_CLASSES]), name='biases')
        logits = tf.matmul(hidden2, weights) + biases
    return logits
# Compute the output-layer loss from the logits and the labels.
def loss(logits, labels):
    labels = tf.to_int64(labels)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name='xentropy')
    return tf.reduce_mean(cross_entropy, name='xentropy_mean')
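# Add the training nodes for the loss (the nodes that compute and apply gradients).
def training(loss, learning_rate):
    # Add a scalar summary to record the value of the loss.
    tf.summary.scalar('loss', loss)
    # Create a gradient descent optimizer with the given learning rate.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # Create a variable to track the global step.
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # In the training node, use the optimizer to apply gradient descent to the
    # trainable parameters to minimize the loss (and increment the global step counter).
    train_op = optimizer.minimize(loss=loss, global_step=global_step)
    return train_op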
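# Evaluate how well the logits produced by the model predict the class labels.
def evaluation(logits, labels):
    correct = tf.nn.in_top_k(logits, labels, 1)
    # Return the number of correctly predicted samples in the current batch.
    return tf.reduce_sum(tf.cast(correct, tf.int32))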
if __name__ == '__main__':
    num_steps = 5000
    display_step = 200
    # Import MNIST data
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("./data/")
    logits = inference(images_placeholder, hidden1_units, hidden2_units)
    batch_loss = loss(logits=logits, labels=labels_placeholder)
    train_op = training(loss=batch_loss, learning_rate=learning_rate)
    correct_counts = evaluation(logits=logits, labels=labels_placeholder)
    # Write the computation graph with tf.summary.FileWriter
    writer = tf.summary.FileWriter("logs/mnistboard", tf.get_default_graph())
    writer.close()
    # Initialize the variables (i.e. assign their default value)
    init = tf.global_variables_initializer()
    # Start training
    with tf.Session() as sess:
        # Run the initializer
        sess.run(init)
        for step in range(1, num_steps+1):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop)
            sess.run(train_op, feed_dict={images_placeholder: batch_x, labels_placeholder: batch_y})
            if step % display_step == 0 or step == 1:
                # Calculate batch loss and accuracy
                # (stored as loss_value so the loss() function is not shadowed)
                loss_value, acc = sess.run([batch_loss, correct_counts], feed_dict={images_placeholder: batch_x, labels_placeholder: batch_y})
                print("Step " + str(step) + ", Minibatch Loss= " + \
                      "{:.4f}".format(loss_value) + ", Training Accuracy= " + \
                      "{:.3f}".format(float(acc)/batch_size))
        print("Optimization Finished!")
        # Calculate accuracy for MNIST test image
        test_acc = sess.run(correct_counts, feed_dict={images_placeholder: mnist.test.images,
                                                       labels_placeholder: mnist.test.labels})
        print("Testing Accuracy:{:.3f}".format(float(test_acc)/len(mnist.test.images)))
Test results
Hidden layer 1 nodes | Hidden layer 2 nodes | Training steps | Test set accuracy |
---|---|---|---|
1000 | 1000 | 5000 | 0.977 |
100 | 100 | 5000 | 0.969 |
50 | 40 | 5000 | 0.964 |
50 | 40 | 2500 | 0.956 |
20 | 15 | 2500 | 0.937 |
20 | 15 | 500 | 0.907 |
20 | 15 | 5000 | 0.954 |
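The hidden layers do not need many nodes: at 5000 steps, 50 and 40 nodes reach 0.964, against 0.977 for 1000 and 1000. Given enough training steps, even a small network performs quite well.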
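To put the layer sizes in the table in perspective, a short calculation of the number of trainable parameters helps. The helper count_parameters is my own; it assumes the same fully connected structure as the code above (784 inputs, two hidden layers, 10 classes, one bias per unit).

```python
def count_parameters(hidden1_units, hidden2_units, num_inputs=784, num_classes=10):
    """Weights plus biases of a fully connected 784 -> h1 -> h2 -> 10 network."""
    layer_sizes = [num_inputs, hidden1_units, hidden2_units, num_classes]
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total

for h1, h2 in [(1000, 1000), (100, 100), (50, 40), (20, 15)]:
    print(h1, h2, count_parameters(h1, h2))
# 1000 1000 1796010
# 100 100 89610
# 50 40 41700
# 20 15 16175
```

The 50/40 network has about 2% of the parameters of the 1000/1000 network, yet according to the table it loses only about 0.013 in test accuracy at 5000 steps.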
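Another way to implement two-hidden-layer softmax regression
TensorFlow-Examples, a well-known project of TensorFlow examples on GitHub, also contains an implementation of two-hidden-layer softmax regression. It is quite concise.
The main idea is the same as above, so it is not explained in detail. With the original code's settings (too many hidden nodes, a large batch size, and few training steps), the final accuracy was only 0.3. After changing the parameter values, the results came close to those above. So hyperparameter tuning really does make a difference.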
""" Neural Network.
A 2-Hidden Layers Fully Connected Neural Network (a.k.a Multilayer Perceptron)
implementation with TensorFlow. This example is using the MNIST database
of handwritten digits (http://yann.lecun.com/exdb/mnist/).
Links:
[MNIST Dataset](http://yann.lecun.com/exdb/mnist/).
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
"""
from __future__ import print_function
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("./data/", one_hot=True)
import tensorflow as tf
# Parameters
learning_rate = 0.1
num_steps = 2500
batch_size = 50
display_step = 100
# Network Parameters
n_hidden_1 = 50 # 1st layer number of neurons
n_hidden_2 = 40 # 2nd layer number of neurons
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
X = tf.placeholder("float", [None, num_input])
Y = tf.placeholder("float", [None, num_classes])
# Store layers weight & bias
weights = {
    'h1': tf.Variable(tf.random_normal([num_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, num_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([num_classes]))
}
# Create model
def neural_net(x):
    # Linear (no activation) versions of the two hidden layers, kept for reference:
    #layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    #layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    # Hidden fully connected layer with n_hidden_1 neurons, followed by ReLU
    layer_1 = tf.nn.relu(tf.add(tf.matmul(x, weights['h1']), biases['b1']))
    # Hidden fully connected layer with n_hidden_2 neurons, followed by ReLU
    layer_2 = tf.nn.relu(tf.add(tf.matmul(layer_1, weights['h2']), biases['b2']))
    # Output fully connected layer with a neuron for each class
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
    return out_layer
# Construct model
logits = neural_net(X)
prediction = tf.nn.softmax(logits)
# Define loss and optimizer
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
logits=logits, labels=Y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)
# Evaluate model
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()
# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    for step in range(1, num_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            # accuracy is already a mean over the batch, so it is not divided by batch_size
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))
    print("Optimization Finished!")
    # Calculate accuracy for MNIST test images
    print("Testing Accuracy:", \
          sess.run(accuracy, feed_dict={X: mnist.test.images,
                                        Y: mnist.test.labels}))
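One visible difference between the two listings is the label format: this one loads the data with one_hot=True and uses tf.nn.softmax_cross_entropy_with_logits, while the first feeds integer labels to the sparse variant. The pure-Python sketch below, with hypothetical helper names, suggests why the two formulations should give the same loss for the same logits.

```python
import math

def log_softmax(logits):
    """Numerically stable log of the softmax probabilities."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in logits))
    return [v - log_sum_exp for v in logits]

def cross_entropy_one_hot(logits, one_hot_label):
    """Cross-entropy against a one-hot label vector (second listing's format)."""
    return -sum(y * lp for y, lp in zip(one_hot_label, log_softmax(logits)))

def cross_entropy_sparse(logits, label):
    """Cross-entropy against an integer class index (first listing's format)."""
    return -log_softmax(logits)[label]

def to_one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

logits = [2.0, -1.0, 0.5]
dense = cross_entropy_one_hot(logits, to_one_hot(0, 3))
sparse = cross_entropy_sparse(logits, 0)
print(dense, sparse)
```

The one-hot vector zeroes out every term except the true class, so only the way the labels are stored differs, not the quantity being minimized.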