TensorFlow Handwritten Digit Recognition: Comparing a Simple Neural Network Classifier with a CNN Classifier
Using TensorFlow for deep learning and AI has the advantages of simple development, fast model building, and high accuracy. As an introduction to image recognition and classification, handwritten digit recognition is a good starting example.
The MNIST package contains 60,000 grayscale images of handwritten digits as a training set; each image is stored at 28*28 pixels, and a label set identifies each of the 60,000 training images. There is also a test set of 10,000 new handwritten-digit grayscale images, together with labels for those 10,000 images. Using the 60,000 training images and their labels we build a simple MNIST model and a CNN (convolutional neural network) model, then use the 10,000 test images and their labels to compare how well the two models perform.
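For readers who want to check the data layout before building anything, the short sketch below loads MNIST with the same input_data helper used in the full code in section D (TensorFlow 1.x) and prints the tensor shapes; with one_hot=True each label is already the one-row, ten-column vector described in step 1 below.
from tensorflow.examples.tutorials.mnist import input_data
# Download (if needed) and load MNIST; one_hot=True turns each label into a 10-element vector.
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
print(mnist.train.images.shape)   # training images, flattened to 784 columns per image
print(mnist.train.labels.shape)   # one-hot training labels, 10 columns per image
print(mnist.test.images.shape)    # 10000 test images
print(mnist.test.labels.shape)    # 10000 one-hot test labels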
A. Steps to build the simple neural network model:
1. Each image is 28*28 pixels, i.e. 28 rows by 28 columns of data. For the simple MNIST model this 2-D structure is more than we need; if the two-dimensional pixel layout is flattened into one dimension, building and training the model becomes much simpler. To serialize all pixels of an image, reshape it into one row of 784 columns (a 1*784 structure). For the model output, use a one-row, ten-column structure representing the probability the model assigns to each digit 0~9 after analyzing the handwritten image; the most likely digit gets 1 and the other nine get 0. With n input images, the input dataset can be represented as a 2-D tensor [n, 784] and the output as a 2-D tensor [n, 10]. In the program these are declared as placeholders, with the batch size n left as None and filled in by the number of images actually fed in.
#define place holder for inputs to network
xs = tf.placeholder(tf.float32, [None, 784]) #28*28
ys = tf.placeholder(tf.float32, [None, 10])
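As a purely illustrative sketch (made-up values, NumPy only, not part of the model), this is what the flattening and the one-hot label described in step 1 look like for a single image:
import numpy as np
image = np.random.rand(28, 28)   # stand-in for one grayscale digit image
flat = image.reshape(1, 784)     # 1 row, 784 columns, the shape fed to the xs placeholder
label = np.zeros((1, 10))
label[0, 7] = 1                  # one-hot label meaning "this image is the digit 7"
print(flat.shape, label)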
2. Add the hidden layer. The hidden layer can be defined as Y = XW + b, where X is the input dataset (a [n, 784] 2-D tensor); W is the weight tensor of shape [784, 10], so the matrix product XW is a [n, 10] tensor; b is the bias with shape [1, 10]; and Y is the prediction tensor. The prediction still needs to pass through an activation function, which spreads out the predicted probabilities of the digits and improves prediction accuracy; this program uses tf.nn.softmax.
def add_layer(inputs, in_size, out_size, activation_function=None):
    #add one more layer and return the output of this layer
    W = tf.Variable(tf.random_normal([in_size, out_size]))
    b = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wb = tf.matmul(inputs, W) + b
    if activation_function is None:
        outputs = Wb
    else:
        outputs = activation_function(Wb)
    return outputs
3. Build and define the network. First define the prediction tensor, whose value is the tensor returned by the hidden-layer function above. Then compute the cross entropy cross_entropy and use the gradient descent optimizer GradientDescentOptimizer to minimize it, which yields the training op train_step.
#add output layer
prediction = add_layer(xs, 784, 10, activation_function=tf.nn.softmax)
#the error between prediction and real data
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction), reduction_indices=[1])) #loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
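To see what the cross_entropy line computes, here is a tiny NumPy sketch with made-up numbers: for each image only the predicted probability at the true digit's position contributes, and the loss is the batch mean of -log(that probability).
import numpy as np
ys_batch = np.array([[0, 1, 0], [1, 0, 0]])              # two one-hot labels (3 classes for brevity)
pred     = np.array([[0.1, 0.8, 0.1], [0.3, 0.5, 0.2]])  # two softmax outputs
loss = np.mean(-np.sum(ys_batch * np.log(pred), axis=1)) # same formula as the TF line above
print(loss)  # -(log(0.8) + log(0.3)) / 2, about 0.71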
4. Train the network. First initialize all variables; then, on each step, draw a random batch of 100 samples from the training set and train on it, for a total of 1001 training steps to obtain the trained model.
with tf.Session() as sess:
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
5. Compute the model's accuracy. The algorithm is as follows: v_xs is the test image set and v_ys is the corresponding label set. The prediction set y_pre computed from v_xs is compared against the labels v_ys; a match counts as correct, otherwise as wrong, and the per-image results are stored in correct_prediction. That tensor is then cast to float32 and averaged to obtain the accuracy.
def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys})
    return result
B. Steps to build the CNN model:
1. For the CNN there is no need to flatten the image into a 1-D tensor; it is kept in its 28*28*1 form (the 1 is the number of channels: 1 for grayscale, 3 for color) for convolution. After the first convolution the image becomes a 28*28*32 tensor.
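Because the input placeholder xs is still the flat [None, 784] tensor defined in section A, the full code in section D first reshapes it back to image form before convolving; the -1 lets TensorFlow infer the batch size n.
x_image = tf.reshape(xs, [-1, 28, 28, 1])  # [n, 784] -> [n, 28, 28, 1]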
2. Define the convolution kernel. The kernel is a four-dimensional tensor [5,5,1,32]: the kernel is 5*5 in size, the input size (channels) is 1, and the output size is 32.
def kernel_variable(shape):
    initial = tf.truncated_normal(shape=shape, stddev=0.1)
    return tf.Variable(initial)
w_conv1 = kernel_variable([5,5,1,32])
3. Define the bias, with an output size of 32.
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
b_conv1 = bias_variable([32])
4. Build two convolutional layers. The output of each convolution is passed through the relu activation function and then pooled, becoming the input of the next layer. The first convolutional layer turns the n*28*28*1 image set into n*28*28*32, which pooling reduces to n*14*14*32. The second convolutional layer turns the first layer's output from n*14*14*32 into n*14*14*64, which pooling reduces to n*7*7*64.
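The convolution code below uses two helpers, conv2d and max_pool_2x2, which the step-by-step text does not show; they appear in the full listing in section D and are repeated here so this step reads on its own. SAME padding keeps the 28*28 spatial size through the convolution, and the 2x2 max pooling with stride 2 is what halves it to 14*14.
def conv2d(x, W):
    # stride [1, x_movement, y_movement, 1]; SAME padding keeps the spatial size
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    # 2x2 pooling window with stride 2 halves the width and height
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')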
# conv1 layer
w_conv1 = kernel_variable([5,5,1,32]) #kernel 5*5, in size 1, out size 32
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1) + b_conv1) #output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1) #output size 14*14*32
# conv2 layer
w_conv2 = kernel_variable([5,5,32,64]) #kernel 5*5, in size 32, out size 64
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2) + b_conv2) #output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2) #output size 7*7*64
5. Build two fully connected layers to produce the prediction. The first layer first reshapes the n*7*7*64 four-dimensional tensor from the second pooling into an n*3136 two-dimensional tensor (3136 comes from flattening the 7*7*64 data into one dimension); that n*3136 tensor is multiplied by the weight matrix (a [3136, 1024] tensor) to give an n*1024 tensor that feeds the second layer. To guard against overfitting, dropout deliberately discards network nodes with probability 0.5, improving the network's ability to generalize. The second layer's weight matrix is 1024*10; multiplying it by the first layer's output gives an n*10 result set. For a one-to-one output, sigmoid can be used; for one-of-many outputs such as this example, softmax is used.
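The dropout call in the code below takes keep_prob, the probability of keeping each node. It is a placeholder (defined in the full listing in section D) so it can be fed as 0.5 during training and as 1 during evaluation:
keep_prob = tf.placeholder(tf.float32)  # fed as 0.5 while training, 1.0 when testing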
# fc1 layer
w_fc1 = kernel_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1) + b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# fc2 layer
w_fc2 = kernel_variable([1024, 10])
b_fc2 = bias_variable([10])
prediction_CNN = tf.nn.softmax(tf.matmul(h_fc1_drop, w_fc2) + b_fc2)
6. Train the CNN. First initialize all variables; then, on each step, take 100 images and their labels from the training set and train the network, for a total of 1001 training steps.
cross_entropy_CNN = tf.reduce_mean(-tf.reduce_sum(ys * tf.log(prediction_CNN), reduction_indices=[1])) #loss
train_step_CNN = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy_CNN)
with tf.Session() as sess:
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
7. Compute the model's accuracy. As before, v_xs is the test image set and v_ys the corresponding label set. The prediction set y_pre computed from v_xs is compared against the labels v_ys; correct results are stored in correct_prediction, which is cast to float32 and averaged to obtain the accuracy. For evaluation, keep_prob is fed as 1 so that dropout is disabled.
def compute_accuracy_CNN(v_xs, v_ys):
    global prediction_CNN
    y_pre = sess.run(prediction_CNN, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result
8. Every 100 training steps, evaluate the current state of both networks on the test set and print the prediction accuracy.
for i in range(1001):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
    sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
    if i % 100 == 0:
        print('correctness: ', i, ' is ', compute_accuracy(mnist.test.images, mnist.test.labels))
        print('correctness_CNN: ', i, ' is ', compute_accuracy_CNN(mnist.test.images, mnist.test.labels))
C. Comparison of results: as the figures below show, the CNN's accuracy improves as training proceeds, finally reaching 0.9683 (1 would be completely correct), whereas the simple MNIST model overfit at around 800 training steps, its accuracy dropping from a peak of 0.8692 to 0.098. My computer is fairly old (an i5-2410M CPU with 8 GB of RAM); training takes about 15 minutes, demands a lot of CPU, and memory usage is high while the CNN is training.
In the figures, the red lines are the plain neural network and the blue lines are the CNN. The left panel shows that both losses decrease as training proceeds, but the CNN's loss gets much closer to 0, a clearly better result. Accuracy behaves similarly: the plain network reaches roughly 87% while the CNN reaches about 97%, a significant improvement. The results at each checkpoint are:
correctness: 0 is 0.147100001574
correctness_CNN: 0 is 0.12120000273
loss: 0 is 9.97904
loss_CNN: 0 is 5.7561
correctness: 100 is 0.73710000515
correctness_CNN: 100 is 0.888899981976
loss: 100 is 1.38197
loss_CNN: 100 is 0.353873
correctness: 200 is 0.805999994278
correctness_CNN: 200 is 0.930100023746
loss: 200 is 0.997057
loss_CNN: 200 is 0.235152
correctness: 300 is 0.825699985027
correctness_CNN: 300 is 0.940500020981
loss: 300 is 0.866042
loss_CNN: 300 is 0.196917
correctness: 400 is 0.847999989986
correctness_CNN: 400 is 0.951200008392
loss: 400 is 0.753898
loss_CNN: 400 is 0.165623
correctness: 500 is 0.853100001812
correctness_CNN: 500 is 0.954999983311
loss: 500 is 0.697782
loss_CNN: 500 is 0.147157
correctness: 600 is 0.860800027847
correctness_CNN: 600 is 0.960699975491
loss: 600 is 0.666501
loss_CNN: 600 is 0.137592
correctness: 700 is 0.866400003433
correctness_CNN: 700 is 0.963800013065
loss: 700 is 0.618222
loss_CNN: 700 is 0.119138
correctness: 800 is 0.868799984455
correctness_CNN: 800 is 0.967599987984
loss: 800 is 0.59465
loss_CNN: 800 is 0.108558
correctness: 900 is 0.875800013542
correctness_CNN: 900 is 0.969799995422
loss: 900 is 0.567654
loss_CNN: 900 is 0.101511
correctness: 1000 is 0.87349998951
correctness_CNN: 1000 is 0.971400022507
loss: 1000 is 0.564226
loss_CNN: 1000 is 0.0913478
D. Complete code:
from __future__ import print_function
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import os
import numpy as np
import matplotlib.pyplot as plt
MODEL_SAVE_PATH="my_net/"
MODEL_NAME="save_net.ckpt"
#digits 0 to 9 data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
def add_layer(inputs, in_size, out_size, activation_function=None):
    #add one more layer and return the output of this layer
    W = tf.Variable(tf.random_normal([in_size, out_size]))
    b = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wb = tf.matmul(inputs, W) + b
    if activation_function is None:
        outputs = Wb
    else:
        outputs = activation_function(Wb)
    return outputs
def compute_accuracy(v_xs, v_ys):
    global prediction
    y_pre = sess.run(prediction, feed_dict={xs: v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys})
    return result
def compute_accuracy_CNN(v_xs, v_ys):
    global prediction_CNN
    y_pre = sess.run(prediction_CNN, feed_dict={xs: v_xs, keep_prob: 1})
    correct_prediction = tf.equal(tf.argmax(y_pre, 1), tf.argmax(v_ys, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    result = sess.run(accuracy, feed_dict={xs: v_xs, ys: v_ys, keep_prob: 1})
    return result
def kernel_variable(shape):
    initial = tf.truncated_normal(shape=shape, stddev=0.1)
    return tf.Variable(initial)
def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)
def conv2d(x, W):
    #stride [1, x_movement, y_movement, 1]
    #stride[0] and stride[3] must be 1
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')
def max_pool_2x2(x):
    #stride [1, x_movement, y_movement, 1]
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
#define place holder for inputs to network
xs = tf.placeholder(tf.float32, [None, 784]) #28*28
ys = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)
x_image = tf.reshape(xs, [-1,28,28,1])
# conv1 layer
w_conv1 = kernel_variable([5,5,1,32]) #kernel 5*5, in size 1, out size 32
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, w_conv1)+b_conv1) #output size 28*28*32
h_pool1 = max_pool_2x2(h_conv1) #output size 14*14*32
# conv2 layer
w_conv2 = kernel_variable([5,5,32,64]) #kernel 5*5, in size 32, out size 64
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, w_conv2)+ b_conv2) #output size 14*14*64
h_pool2 = max_pool_2x2(h_conv2) #output size 7*7*64
# fc1 layer
w_fc1 = kernel_variable([7*7*64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1,7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, w_fc1)+b_fc1)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
# fc2 layer
w_fc2 = kernel_variable([1024,10])
b_fc2 = bias_variable([10])
prediction_CNN = tf.nn.softmax(tf.matmul(h_fc1_drop,w_fc2)+b_fc2)
#add output layer
prediction = add_layer(xs, 784, 10, activation_function= tf.nn.softmax)
#the error between prediction and real data
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys* tf.log(prediction), reduction_indices=[1])) #loss
cross_entropy_CNN = tf.reduce_mean(-tf.reduce_sum(ys* tf.log(prediction_CNN), reduction_indices=[1])) #loss
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
train_step_CNN = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy_CNN)
saver = tf.train.Saver() # define a saver for saving and restoring
Total_test_loss = np.zeros((int(1001/100)+1), float)
Total_test_loss_CNN = np.zeros((int(1001/100)+1), float)
Total_test_acc = np.zeros((int(1001/100)+1), float)
Total_test_acc_CNN = np.zeros((int(1001/100)+1), float)
count =0
with tf.Session() as sess:
    if int((tf.__version__).split('.')[1]) < 12 and int((tf.__version__).split('.')[0]) < 1:
        init = tf.initialize_all_variables()
    else:
        init = tf.global_variables_initializer()
    print(tf.__version__)
    sess.run(init)
    for i in range(1001):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={xs: batch_xs, ys: batch_ys})
        sess.run(train_step_CNN, feed_dict={xs: batch_xs, ys: batch_ys, keep_prob: 0.5})
        if i % 100 == 0:
            Total_test_acc[count] = compute_accuracy(mnist.test.images, mnist.test.labels)
            Total_test_acc_CNN[count] = compute_accuracy_CNN(mnist.test.images, mnist.test.labels)
            print('correctness: ', i, ' \tis \t', Total_test_acc[count])
            print('correctness_CNN: ', i, ' \tis \t', Total_test_acc_CNN[count])
            loss = sess.run(cross_entropy, feed_dict={xs: mnist.test.images, ys: mnist.test.labels, keep_prob: 1.0})
            loss_CNN = sess.run(cross_entropy_CNN,
                                feed_dict={xs: mnist.test.images, ys: mnist.test.labels, keep_prob: 1.0})
            print('loss: ', i, ' \tis \t', loss)
            print('loss_CNN: ', i, ' \tis \t', loss_CNN)
            Total_test_loss[count] = loss
            Total_test_loss_CNN[count] = loss_CNN
            count += 1
    saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), write_meta_graph=False)
# plotting
plt.figure(1, figsize=(15, 5))
plt.subplot(121)
# plt.scatter(x, y)
plt.ylabel('Compare Losses')
plt.plot(Total_test_loss, 'r-', lw=5)
plt.plot(Total_test_loss_CNN, 'b-', lw=5)
plt.text(-1, -1, 'Loss Chart')
plt.subplot(122)
# plt.scatter(x, y)
plt.ylabel('Compare Accuracy:')
plt.plot(Total_test_acc, 'r-', lw=5)
plt.plot(Total_test_acc_CNN, 'b-', lw=5)
plt.text(-1, -1, 'Accuracy Chart')
plt.show()