TensorFlow Handwritten Digit Recognition: Comparing a Simple Neural Network Classifier with a CNN Classifier
Using TensorFlow for deep learning and AI has the advantages of simple development, fast model building, and high accuracy. As an introduction to image recognition and classification, handwritten digit recognition is a good starting example.
The MNIST package contains 60,000 grayscale images of handwritten digits as a training set; each image is stored at 28*28 pixels, and a label set identifies each of the 60,000 training images. There is also a test set of 10,000 new handwritten-digit grayscale images, together with labels for those 10,000 images. Using the 60,000 training images and their labels we build a simple MNIST model and a CNN (convolutional neural network) model, then use the 10,000 test images and their labels to compare how well the two models perform.
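For readers who want to check the data layout before building anything, the short sketch below loads MNIST with the same input_data helper used in the full code in section D (TensorFlow 1.x) and prints the tensor shapes; with one_hot=True each label is already the one-row, ten-column vector described in step 1 below.
from tensorflow.examples.tutorials.mnist import input_data
# Download (if needed) and load MNIST; one_hot=True turns each label into a 10-element vector.
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
print(mnist.train.images.shape)   # training images, flattened to 784 columns per image
print(mnist.train.labels.shape)   # one-hot training labels, 10 columns per image
print(mnist.test.images.shape)    # 10000 test images
print(mnist.test.labels.shape)    # 10000 one-hot test labels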
A. Steps to build the simple neural network model:
1. Each image is 28*28 pixels, i.e. 28 rows by 28 columns of data. For the simple MNIST model this 2-D structure is more than we need; if the two-dimensional pixel layout is flattened into one dimension, building and training the model becomes much simpler. To serialize all pixels of an image, reshape it into one row of 784 columns (a 1*784 structure). For the model output, use a one-row, ten-column structure representing the probability the model assigns to each digit 0~9 after analyzing the handwritten image; the most likely digit gets 1 and the other nine get 0. With n input images, the input dataset can be represented as a 2-D tensor [n, 784] and the output as a 2-D tensor [n, 10]. In the program these are declared as placeholders, with the batch size n left as None and filled in by the number of images actually fed in.
#define place holder for inputs to network
xs = tf.placeholder(tf.float32, [None, 784]) #28*28
ys = tf.placeholder(tf.float32, [None, 10])
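As a purely illustrative sketch (made-up values, NumPy only, not part of the model), this is what the flattening and the one-hot label described in step 1 look like for a single image:
import numpy as np
image = np.random.rand(28, 28)   # stand-in for one grayscale digit image
flat = image.reshape(1, 784)     # 1 row, 784 columns, the shape fed to the xs placeholder
label = np.zeros((1, 10))
label[0, 7] = 1                  # one-hot label meaning "this image is the digit 7"
print(flat.shape, label)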
2. Add the hidden layer. The hidden layer can be defined as Y = XW + b, where X is the input dataset (a [n, 784] 2-D tensor); W is the weight tensor of shape [784, 10], so the matrix product XW is a [n, 10] tensor; b is the bias with shape [1, 10]; and Y is the prediction tensor. The prediction still needs to pass through an activation function, which spreads out the predicted probabilities of the digits and improves prediction accuracy; this program uses tf.nn.softmax.
def add_layer(inputs, in_size, out_size, activation_function=None):
    #add one more layer and return the output of this layer
    W = tf.Variable(tf.random_normal([in_size, out_size]))
    b = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wb = tf.matmul(inputs, W) + b
    if activation_function is None:
        outputs = Wb
    else:
        outputs = activation_function(Wb)
    return outputs
3. Build and define the network. First define the prediction tensor, whose value is the tensor returned by the hidden-layer function above. Then compute the cross entropy cross_entropy and use the gradient descent optimizer GradientDescentOptimizer to minimize it, which yields the training op train_step.
#add output layer
prediction = add_layer(xs, 784, 10, activation_function=tf.nn.softmax)
#the error between prediction and real data
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1])) #loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
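To see what the cross_entropy line computes, here is a tiny NumPy sketch with made-up numbers: for each image only the predicted probability at the true digit's position contributes, and the loss is the batch mean of -log(that probability).
import numpy as np
ys_batch = np.array([[0, 1, 0], [1, 0, 0]])              # two one-hot labels (3 classes for brevity)
pred     = np.array([[0.1, 0.8, 0.1], [0.3, 0.5, 0.2]])  # two softmax outputs
loss = np.mean(-np.sum(ys_batch * np.log(pred), axis=1)) # same formula as the TF line above
print(loss)  # -(log(0.8) + log(0.3)) / 2, about 0.71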
4. Train the network. First initialize all variables; then, on each step, draw a random batch of 100 samples from the training set and train on it, for a total of 1001 training steps to obtain the trained model.
with tf.Session() as sess:
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
5. Compute the model's accuracy. The algorithm is as follows: v_xs is the test image set and v_ys is the corresponding label set. The prediction set y_pre computed from v_xs is compared against the labels v_ys; a match counts as correct, otherwise as wrong, and the per-image results are stored in correct_prediction. That tensor is then cast to float32 and averaged to obtain the accuracy.
def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys})
    return result
B. Steps to build the CNN model:
1. For the CNN there is no need to flatten the image into a 1-D tensor; it is kept in its 28*28*1 form (the 1 is the number of channels: 1 for grayscale, 3 for color) for convolution. After the first convolution the image becomes a 28*28*32 tensor.
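Because the input placeholder xs is still the flat [None, 784] tensor defined in section A, the full code in section D first reshapes it back to image form before convolving; the -1 lets TensorFlow infer the batch size n.
x_image = tf.reshape(xs, [-1, 28, 28, 1])  # [n, 784] -> [n, 28, 28, 1]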
2. Define the convolution kernel. The kernel is a four-dimensional tensor [5,5,1,32]: the kernel is 5*5 in size, the input size (channels) is 1, and the output size is 32.
def kernel_variable(shape):
    initial = tf.truncated_normal(shape=shape, stddev=0.1)
    return tf.Variable(initial)
w_conv1 = kernel_variable([5,5,1,32])
3. Define the bias, with an output size of 32.
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
b_conv1 = bias_variable([32])
4. Build two convolutional layers. The output of each convolution is passed through the relu activation function and then pooled, becoming the input of the next layer. The first convolutional layer turns the n*28*28*1 image set into n*28*28*32, which pooling reduces to n*14*14*32. The second convolutional layer turns the first layer's output from n*14*14*32 into n*14*14*64, which pooling reduces to n*7*7*64.
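The convolution code below uses two helpers, conv2d and max_pool_2x2, which the step-by-step text does not show; they appear in the full listing in section D and are repeated here so this step reads on its own. SAME padding keeps the 28*28 spatial size through the convolution, and the 2x2 max pooling with stride 2 is what halves it to 14*14.
def conv2d(x, W):
    # stride [1, x_movement, y_movement, 1]; SAME padding keeps the spatial size
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    # 2x2 pooling window with stride 2 halves the width and height
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')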
# conv1 layer
w_conv1 = kernel_variable([5,5,1,32]) #kernel 5*5, in size 1, out size 32
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1) #output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1) #output size 14*14*32
# conv2 layer
w_conv2 = kernel_variable([5,5,32,64]) #kernel 5*5, in size 32, out size 64
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2) #output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2) #output size 7*7*64
5. Build two fully connected layers to produce the prediction. The first layer first reshapes the n*7*7*64 four-dimensional tensor from the second pooling into an n*3136 two-dimensional tensor (3136 comes from flattening the 7*7*64 data into one dimension); that n*3136 tensor is multiplied by the weight matrix (a [3136, 1024] tensor) to give an n*1024 tensor that feeds the second layer. To guard against overfitting, dropout deliberately discards network nodes with probability 0.5, improving the network's ability to generalize. The second layer's weight matrix is 1024*10; multiplying it by the first layer's output gives an n*10 result set. For a one-to-one output, sigmoid can be used; for one-of-many outputs such as this example, softmax is used.
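The dropout call in the code below takes keep_prob, the probability of keeping each node. It is a placeholder (defined in the full listing in section D) so it can be fed as 0.5 during training and as 1 during evaluation:
keep_prob = tf.placeholder(tf.float32)  # fed as 0.5 while training, 1.0 when testing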
# fc1 layer
w_fc1 = kernel_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# fc2 layer
w_fc2 = kernel_variable([1024, 10])
b_fc2 = bias_variable([10])
prediction_CNN = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)
6. Train the CNN. First initialize all variables; then, on each step, take 100 images and their labels from the training set and train the network, for a total of 1001 training steps.
cross_entropy_CNN = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction_CNN), reduction_indices=[1])) #loss
train_step_CNN = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy_CNN)
with tf.Session() as sess:
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
7. Compute the model's accuracy. As before, v_xs is the test image set and v_ys the corresponding label set. The prediction set y_pre computed from v_xs is compared against the labels v_ys; correct results are stored in correct_prediction, which is cast to float32 and averaged to obtain the accuracy. For evaluation, keep_prob is fed as 1 so that dropout is disabled.
def compute_accuracy_CNN(v_xs, v_ys):
    global prediction_CNN
    y_pre = sess.run(prediction_CNN, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result
8. Every 100 training steps, evaluate the current state of both networks on the test set and print the prediction accuracy.
for i in range(1001):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
    sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
    if i % 100 == 0:
        print('correctness: ', i, ' is ', compute_accuracy(mnist.test.images, mnist.test.labels))
        print('correctness_CNN: ', i, ' is ', compute_accuracy_CNN(mnist.test.images, mnist.test.labels))
C. Comparison of results: as the figures below show, the CNN's accuracy improves as training proceeds, finally reaching 0.9683 (1 would be completely correct), whereas the simple MNIST model overfit at around 800 training steps, its accuracy dropping from a peak of 0.8692 to 0.098. My computer is fairly old (an i5-2410M CPU with 8 GB of RAM); training takes about 15 minutes, demands a lot of CPU, and memory usage is high while the CNN is training.
In the figures, the red lines are the plain neural network and the blue lines are the CNN. The left panel shows that both losses decrease as training proceeds, but the CNN's loss gets much closer to 0, a clearly better result. Accuracy behaves similarly: the plain network reaches roughly 87% while the CNN reaches about 97%, a significant improvement. The results at each checkpoint are:
correctness: 0 is 0.147100001574
correctness_CNN: 0 is 0.12120000273
loss: 0 is 9.97904
loss_CNN: 0 is 5.7561
correctness: 100 is 0.73710000515
correctness_CNN: 100 is 0.888899981976
loss: 100 is 1.38197
loss_CNN: 100 is 0.353873
correctness: 200 is 0.805999994278
correctness_CNN: 200 is 0.930100023746
loss: 200 is 0.997057
loss_CNN: 200 is 0.235152
correctness: 300 is 0.825699985027
correctness_CNN: 300 is 0.940500020981
loss: 300 is 0.866042
loss_CNN: 300 is 0.196917
correctness: 400 is 0.847999989986
correctness_CNN: 400 is 0.951200008392
loss: 400 is 0.753898
loss_CNN: 400 is 0.165623
correctness: 500 is 0.853100001812
correctness_CNN: 500 is 0.954999983311
loss: 500 is 0.697782
loss_CNN: 500 is 0.147157
correctness: 600 is 0.860800027847
correctness_CNN: 600 is 0.960699975491
loss: 600 is 0.666501
loss_CNN: 600 is 0.137592
correctness: 700 is 0.866400003433
correctness_CNN: 700 is 0.963800013065
loss: 700 is 0.618222
loss_CNN: 700 is 0.119138
correctness: 800 is 0.868799984455
correctness_CNN: 800 is 0.967599987984
loss: 800 is 0.59465
loss_CNN: 800 is 0.108558
correctness: 900 is 0.875800013542
correctness_CNN: 900 is 0.969799995422
loss: 900 is 0.567654
loss_CNN: 900 is 0.101511
correctness: 1000 is 0.87349998951
correctness_CNN: 1000 is 0.971400022507
loss: 1000 is 0.564226
loss_CNN: 1000 is 0.0913478
D. Complete code:
from __future__ import print_function
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
import numpy as np
import matplotlib.pyplot as plt
MODEL_SAVE_PATH="my_net/"
MODEL_NAME="save_net.ckpt"
#digits 0 to 9 data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
def add_layer(inputs, in_size, out_size, activation_function=None):
    #add one more layer and return the output of this layer
    W = tf.Variable(tf.random_normal([in_size, out_size]))
    b = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wb = tf.matmul(inputs, W) + b
    if activation_function is None:
        outputs = Wb
    else:
        outputs = activation_function(Wb)
    return outputs
def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys})
    return result
def compute_accuracy_CNN(v_xs, v_ys):
    global prediction_CNN
    y_pre = sess.run(prediction_CNN, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result
def kernel_variable(shape):
    initial = tf.truncated_normal(shape=shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
def conv2d(x, W):
    #stride [1, x_movement, y_movement, 1]
    #stride[0] and stride[3] must be 1
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    #stride [1, x_movement, y_movement, 1]
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
#define place holder for inputs to network
xs = tf.placeholder(tf.float32, [None, 784]) #28*28
ys = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)
x_image = tf.reshape(xs, [-1,28,28,1])
# conv1 layer
w_conv1 = kernel_variable([5,5,1,32]) #kernel 5*5, in size 1, out size 32
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1)+b_conv1) #output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1) #output size 14*14*32
# conv2 layer
w_conv2 = kernel_variable([5,5,32,64]) #kernel 5*5, in size 32, out size 64
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2)+ b_conv2) #output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2) #output size 7*7*64
# fc1 layer
w_fc1 = kernel_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1,7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1)+b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# fc2 layer
w_fc2 = kernel_variable([1024,10])
b_fc2 = bias_variable([10])
prediction_CNN = tf.nn.softmax(tf.matmul(h_fc1_drop,w_fc2)+b_fc2)
#add output layer
prediction = add_layer(xs, 784, 10, activation_function= tf.nn.softmax)
#the error between prediction and real data
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys* tf.log(prediction), reduction_indices=[1])) #loss
cross_entropy_CNN = tf.reduce_mean(-tf.reduce_sum(ys* tf.log(prediction_CNN), reduction_indices=[1])) #loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
train_step_CNN = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy_CNN)
saver = tf.train.Saver() # define a saver for saving and restoring
Total_test_loss = np.zeros((int(1001/100)+1), float)
Total_test_loss_CNN = np.zeros((int(1001/100)+1), float)
Total_test_acc = np.zeros((int(1001/100)+1), float)
Total_test_acc_CNN = np.zeros((int(1001/100)+1), float)
count =0
with tf.Session() as sess:
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
        sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
        if i % 100 == 0:
            Total_test_acc[count] = compute_accuracy(mnist.test.images, mnist.test.labels)
            Total_test_acc_CNN[count] = compute_accuracy_CNN(mnist.test.images, mnist.test.labels)
            print('correctness: ', i, ' \tis \t', Total_test_acc[count])
            print('correctness_CNN: ', i, ' \tis \t', Total_test_acc_CNN[count])
            loss = sess.run(cross_entropy, feed_dict={xs: mnist.test.images, ys: mnist.test.labels, keep_prob: 1.0})
            loss_CNN = sess.run(cross_entropy_CNN,
                                feed_dict={xs: mnist.test.images, ys: mnist.test.labels, keep_prob: 1.0})
            print('loss: ', i, ' \tis \t', loss)
            print('loss_CNN: ', i, ' \tis \t', loss_CNN)
            Total_test_loss[count] = loss
            Total_test_loss_CNN[count] = loss_CNN
            count += 1
    saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), write_meta_graph=False)
# plotting
plt.figure(1, figsize=(15, 5))
plt.subplot(121)
# plt.scatter(x, y)
plt.ylabel('Compare Losses')
plt.plot(Total_test_loss, 'r-', lw=5)
plt.plot(Total_test_loss_CNN, 'b-', lw=5)
plt.text(-1, -1, 'Loss Chart')
plt.subplot(122)
# plt.scatter(x, y)
plt.ylabel('Compare Accuracy:')
plt.plot(Total_test_acc, 'r-', lw=5)
plt.plot(Total_test_acc_CNN, 'b-', lw=5)
plt.text(-1, -1, 'Accuracy Chart')
plt.show()