TensorFlow in Practice 001: MNIST

阿新 • Published: 2019-01-23

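TensorFlow is a deep learning framework open-sourced by Google Brain. This article walks through training a deep learning model with TensorFlow.
MNIST is a dataset of handwritten digits (0 to 9) with 60,000 training images and 10,000 test images, each with a corresponding label. Every MNIST image is a 28*28 grayscale image. See the MNIST website for details.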

Environment

python 3.5
tensorflow 1.4.0
system: Deepin
NVIDIA 1060 GPU (optional; the experiment also runs without one)

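File structure

mnist
- data --> stores the MNIST dataset
- download_mnist.py --> downloads the MNIST dataset; if the dataset already exists, it is not downloaded again
- model.py --> the MNIST model
- train_mnist.py --> trains and tests the model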

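Downloading the dataset

You can download the following 4 files from the MNIST website and save them in the data directory:

train-images-idx3-ubyte.gz: training images
train-labels-idx1-ubyte.gz: training labels
t10k-images-idx3-ubyte.gz: test images
t10k-labels-idx1-ubyte.gz: test labels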
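
If you download the files by hand, a quick look at the file header can confirm that they are intact. The sketch below is not part of the original project: `read_idx_header` is a hypothetical helper name, and it assumes the IDX layout described on the MNIST website (a big-endian magic number whose last byte is the number of dimensions, followed by one 32-bit size per dimension).

```python
import gzip
import struct


def read_idx_header(path):
    """Return (magic, dims) read from the header of a gzipped IDX file."""
    with gzip.open(path, 'rb') as f:
        # First 4 bytes: two zero bytes, a data-type code, the number of dimensions
        magic, = struct.unpack('>I', f.read(4))
        ndim = magic & 0xFF
        # Each dimension size is a big-endian unsigned 32-bit integer
        dims = struct.unpack('>' + 'I' * ndim, f.read(4 * ndim))
    return magic, dims
```

Calling `read_idx_header('data/train-images-idx3-ubyte.gz')` should report magic number 2051 and dimensions (60000, 28, 28); the label files use magic number 2049 and have a single dimension.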

Alternatively, you can download them with the API provided by TensorFlow.

The download_mnist.py file is explained in detail below:

#!/usr/bin/env python3
# coding=utf-8

import os
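# Import the MNIST helper library bundled with TensorFlow
from tensorflow.examples.tutorials.mnist import input_data
# Create the data directory if it does not exist
if not os.path.exists('data'):
    os.mkdir('data')
# Load MNIST from data/ (missing files are downloaded automatically);
# one_hot=True returns the labels as one-hot vectors
mnist = input_data.read_data_sets('data/', one_hot=True)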

# Quick check that MNIST was loaded successfully
if __name__ == '__main__':
    print("Training images shape:", mnist.train.images.shape)
    print("Training labels shape:", mnist.train.labels.shape)
    print("Validation images shape:", mnist.validation.images.shape)
    print("Validation labels shape:", mnist.validation.labels.shape)
    print("Test images shape:", mnist.test.images.shape)
    print("Test labels shape:", mnist.test.labels.shape)
    print("First training label:", mnist.train.labels[0, :])

The output is as follows:

Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
Training images shape: (55000, 784)
Training labels shape: (55000, 10)
Validation images shape: (5000, 784)
Validation labels shape: (5000, 10)
Test images shape: (10000, 784)
Test labels shape: (10000, 10)
First training label: [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

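One thing to note: although the downloaded files contain only a training set and a test set, TensorFlow further splits the training set, setting aside 5,000 of the 60,000 training images as a validation set. That is why the training set above has 55,000 images. Another point is that, with one_hot=True, the labels are not represented as digits 0 to 9 but as 1*10 vectors: the digit 3 is represented by a 1 at index 3 (indices start at 0) and 0 everywhere else. This encoding is called one-hot.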
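
To make the one-hot encoding concrete, here is a minimal sketch in plain Python. The helper names `one_hot` and `decode` are made up for illustration; TensorFlow does the equivalent conversion internally when `one_hot=True` is passed.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot list of floats."""
    if not 0 <= label < num_classes:
        raise ValueError('label out of range')
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def decode(vec):
    """Recover the integer label: the index of the largest entry (argmax)."""
    return max(range(len(vec)), key=lambda i: vec[i])


print(one_hot(3))  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(decode([0., 0., 0., 0., 0., 0., 0., 1., 0., 0.]))  # 7
```

The second call decodes the label printed in the output above, so the first training image is a 7.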

Training the MNIST model

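This section uses a fair amount of deep learning terminology. I may write dedicated articles on these terms later; for now, please look up anything unfamiliar.
Since MNIST is generally regarded as the Hello World of deep learning, I chose a very simple 4-layer model (easy to train and understand, and it still performs well): 2 convolutional layers + 2 fully connected layers (the first of which uses dropout).
As mentioned above, each MNIST image has size 28*28*1=784 and comes with a label. Deep learning usually feeds the dataset into the model in batches, which (1) reduces memory pressure, (2) lets the parameters be corrected quickly, since they are updated after every batch, and (3) can speed up convergence when the batch size is increased appropriately. The number of images per batch is called batch_size, so the data fed into the model has shape [batch_size, 784].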

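The first convolutional layer convolves over the pixels of the image, so the input shape [batch_size, 784] has to be reshaped to [batch_size, 28, 28, 1].
In the fully connected layers, each element is treated as a single feature, and the connection weights map the convolutional features to the digits 0 to 9. The 3-dimensional convolutional features (4-dimensional including batch_size) therefore have to be flattened into 1-dimensional features (2-dimensional including batch_size).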
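
The two reshapes can be illustrated for a single image in plain Python. This is only a sketch of the index arithmetic, assuming the row-major layout that reshaping uses; `to_image` and `to_flat` are hypothetical helper names, and the model itself uses tf.reshape.

```python
HEIGHT, WIDTH, CHANNELS = 28, 28, 1


def to_image(flat):
    """Turn a flat list of 784 values into nested [28][28][1] lists (row-major)."""
    assert len(flat) == HEIGHT * WIDTH * CHANNELS
    return [[[flat[(r * WIDTH + c) * CHANNELS + ch] for ch in range(CHANNELS)]
             for c in range(WIDTH)]
            for r in range(HEIGHT)]


def to_flat(image):
    """Inverse operation: flatten [H][W][C] nested lists back into one list."""
    return [value for row in image for pixel in row for value in pixel]


flat = list(range(784))
image = to_image(flat)
print(len(image), len(image[0]), len(image[0][0]))  # 28 28 1
print(image[1][2][0])          # 30, i.e. row 1, column 2 -> 1 * 28 + 2
print(to_flat(image) == flat)  # True
```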
The full model, model.py, is explained in the code below:

#!/usr/bin/env python3
# coding=utf-8

import tensorflow as tf


def _convolution_layer(layer_name, input, neuron_num):
    input_shape = input.get_shape().as_list()   ## Shape of the input tensor

    with tf.variable_scope(layer_name) as _:
        with tf.variable_scope('conv') as scope:
            ## Convolution weights: 5*5 kernel, truncated normal initializer with stddev 0.1
            weight = tf.get_variable(name='weight', shape=[5, 5, input_shape[-1], neuron_num],
                                     initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32),
                                     dtype=tf.float32)
            biases = tf.get_variable(name='biases', shape=[neuron_num],
                                     initializer=tf.constant_initializer(0.1),
                                     dtype=tf.float32)

            conv = tf.nn.conv2d(input, weight, [1, 1, 1, 1], padding='SAME')
            conv = tf.nn.relu(tf.nn.bias_add(conv, biases), name=scope.name)
        with tf.variable_scope('pool') as scope:
            ## Pooling is done with max_pool
            out = tf.nn.max_pool(conv, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                                 padding='SAME', name=scope.name)
    return out


def _fc_layer(layer_name, input, neuron_num, keep_prob):
    input_shape = input.get_shape().as_list()

    with tf.variable_scope(layer_name) as scope:
        weight = tf.get_variable(name='weight', shape=[input_shape[-1], neuron_num],
                                 initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32),
                                 dtype=tf.float32)
        biases = tf.get_variable(name='biases', shape=[neuron_num],
                                 initializer=tf.constant_initializer(0.01),
                                 dtype=tf.float32)

        ## Add dropout to prevent overfitting
        out = tf.nn.dropout(tf.nn.relu(tf.nn.bias_add(tf.matmul(input, weight), biases)),
                            keep_prob=keep_prob,
                            name=scope.name)

        tf.summary.histogram(scope.name, out)
    return out


def _mnist_model1(input, image_shape, keep_prob):
    ## 2 convolutional layers + 2 fully connected layers (the first with dropout); softmax has to be added by the caller

    input = tf.reshape(input, shape=[-1, image_shape[0], image_shape[1], image_shape[2]])

    ## First convolutional layer
    convolution_layer1_out = _convolution_layer(layer_name='convolution_layer01',
                                                input=input, neuron_num=32)
    ## Second convolutional layer
    convolution_layer2_out = _convolution_layer(layer_name='convolution_layer02',
                                                input=convolution_layer1_out,
                                                neuron_num=64)

    fc1_input_shape = convolution_layer2_out.get_shape().as_list()
    fc1_input = tf.reshape(convolution_layer2_out,
                           shape=[-1, fc1_input_shape[1] * fc1_input_shape[2] * fc1_input_shape[3]])
    ## Fully connected layer 1
    fc_layer1_out = _fc_layer(layer_name='fc_layer01',
                              input=fc1_input,
                              neuron_num=512,
                              keep_prob=keep_prob)
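    ## Fully connected layer 2 (no dropout). Note that _fc_layer applies ReLU here as well,
    ## so the outputs passed to the loss are non-negative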
    fc_layer2_out = _fc_layer(layer_name='fc_layer02',
                              input=fc_layer1_out,
                              neuron_num=10,
                              keep_prob=1.0)
    out = fc_layer2_out
    return out


def mnist_optimizer(logit, y_):
    ## Use the cross-entropy loss
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logit))
    ## Use the gradient descent optimizer
    return tf.train.GradientDescentOptimizer(0.003).minimize(cross_entropy)


mnist_model = _mnist_model1

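The model uses two convolutional layers to extract image features, then feeds them into the first fully connected layer, which outputs 512 features and uses dropout with a keep probability of 0.5 during training to prevent overfitting. The second fully connected layer outputs 10 values, one score (logit) per class; the softmax that turns these scores into class probabilities is applied inside the loss function.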
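
It can help to trace the tensor shapes through the two convolutional blocks. The sketch below only does the arithmetic and does not run TensorFlow; it assumes the settings in model.py (stride-1 convolutions and stride-2 pooling, both with padding='SAME', where the spatial output size is ceil(size / stride)). The function names are made up for illustration.

```python
import math


def same_out(size, stride):
    """Spatial output size for padding='SAME': ceil(size / stride)."""
    return int(math.ceil(size / float(stride)))


def trace_shapes(height=28, width=28, channels=1, conv_filters=(32, 64)):
    """Return the (H, W, C) shape after each conv+pool block and the flattened size."""
    shapes = [(height, width, channels)]
    for filters in conv_filters:
        # 5*5 convolution, stride 1, SAME padding: spatial size unchanged
        height, width = same_out(height, 1), same_out(width, 1)
        # 2*2 max pooling, stride 2, SAME padding: spatial size halved (rounded up)
        height, width = same_out(height, 2), same_out(width, 2)
        channels = filters
        shapes.append((height, width, channels))
    return shapes, height * width * channels


shapes, flat_size = trace_shapes()
print(shapes)     # [(28, 28, 1), (14, 14, 32), (7, 7, 64)]
print(flat_size)  # 3136
```

So the first fully connected layer receives 7*7*64 = 3136 features per image and maps them to 512.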
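A brief note on how the loss is computed: for each input image the model produces a 1*10 output (which may well be wrong, especially early in training). We measure the distance between this output and the true label, here with the softmax cross-entropy. The smaller the distance, the closer the prediction is to the label, so the optimizer's job is to reduce it.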
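
For a single image, the softmax cross-entropy can be written out in a few lines of plain Python. This is a sketch of the formula only, not TensorFlow's implementation, and the example logits are made up.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(one_hot_label, logits):
    """Negative log-probability that the model assigns to the true class."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs) if y > 0)


label = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]               # the digit 3
undecided = [0.0] * 10                                # every class equally likely
confident = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

print(round(cross_entropy(label, undecided), 4))      # 2.3026, i.e. ln(10)
print(cross_entropy(label, confident) < cross_entropy(label, undecided))  # True
```

An undecided model pays a loss of ln(10), and the loss shrinks toward 0 as the score of the correct class grows relative to the others.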
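As for gradient descent: imagine standing on a mountain. The fastest way down is to walk opposite to the direction of steepest ascent. Gradient descent does the same thing: it computes the gradient of the loss with respect to the parameters on the current batch and moves the parameters a small step in the direction of the negative gradient.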
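
The update rule is easiest to see on a one-parameter toy problem. The sketch below minimizes f(x) = (x - 3)^2 in plain Python; the function and the learning rate are chosen purely for illustration and have nothing to do with the MNIST model's actual loss.

```python
def gradient_descent(grad, x0, learning_rate, steps):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0),
                         x0=0.0, learning_rate=0.1, steps=100)
print(round(x_min, 4))  # 3.0
```

tf.train.GradientDescentOptimizer applies the same kind of step to every weight and bias of the model, using the gradient of the cross-entropy computed on the current batch.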
With the model designed, we need to set up its inputs and have TensorFlow run it. This is done in train_mnist.py:

#!/usr/bin/env python3
# coding=utf-8

from download_mnist import mnist
import model
import tensorflow as tf

# Compute accuracy from the model outputs and the labels
def accuracy(y, y_):
    return tf.reduce_mean(tf.cast(
        tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)),
        tf.float32))

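if __name__ == '__main__':
    # Placeholders for the flattened images, the one-hot labels and the dropout keep probability
    images = tf.placeholder(tf.float32, shape=[None, 784])
    labels = tf.placeholder(tf.float32, shape=[None, 10])
    keep_prob = tf.placeholder(tf.float32)
    # Build the model and the optimizer defined in model.py
    logit = model.mnist_model(input=images, image_shape=[28, 28, 1], keep_prob=keep_prob)
    optimizer = model.mnist_optimizer(logit, labels)
    # Build the accuracy op once, so that no new ops are added to the graph inside the loop
    accuracy_op = accuracy(logit, labels)
    # Initialize variables
    init = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())

    with tf.Session() as sess:
        sess.run(init)
        # Train for 50000 batches of 64 images each
        for step in range(50000):
            batch = mnist.train.next_batch(64)
            # Every 1000 batches, print the accuracy on one batch of 64 validation images
            if step % 1000 == 0:
                val = mnist.validation.next_batch(64)
                val_accuracy = sess.run(accuracy_op, feed_dict={
                    images: val[0],
                    labels: val[1],
                    keep_prob: 1.0
                })
                print('Step: %05d, validation accuracy: %.05f%%' % (step, val_accuracy * 100))
            # Run the optimizer: it is the op that actually updates the model's trainable parameters
            sess.run(optimizer, feed_dict={
                images: batch[0],
                labels: batch[1],
                keep_prob: 0.5
            })
        # Finally, report the accuracy on the test set
        test_accuracy = sess.run(accuracy_op, feed_dict={
            images: mnist.test.images,
            labels: mnist.test.labels,
            keep_prob: 1.0
        })
        # Multiply the number, not the formatted string, by 100
        print('Test accuracy: %.06f%%' % (test_accuracy * 100))

After 50,000 iterations, the model reaches over 97% accuracy on the test set. The log below lists validation checks only up to step 14,000 before the test-set line, so it appears to be an excerpt or to come from a shorter run.

Step: 00000, validation accuracy: 12.50000%
Step: 01000, validation accuracy: 78.12500%
Step: 02000, validation accuracy: 92.18750%
Step: 03000, validation accuracy: 90.62500%
Step: 04000, validation accuracy: 92.18750%
Step: 05000, validation accuracy: 95.31250%
Step: 06000, validation accuracy: 95.31250%
Step: 07000, validation accuracy: 96.87500%
Step: 08000, validation accuracy: 95.31250%
Step: 09000, validation accuracy: 96.87500%
Step: 10000, validation accuracy: 100.00000%
Step: 11000, validation accuracy: 96.87500%
Step: 12000, validation accuracy: 95.31250%
Step: 13000, validation accuracy: 96.87500%
Step: 14000, validation accuracy: 95.31250%
Test accuracy: 97.899997%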
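
Every validation figure in the log is a multiple of 1/64 (1.5625%), because each check scores a single batch of 64 images, which also explains the jump to 100% at step 10000. A steadier number comes from scoring the whole validation set batch by batch. The sketch below shows only the bookkeeping in plain Python: `batched_accuracy` and its `predict` argument are hypothetical and stand in for a call that runs the model and returns one predicted digit per image.

```python
def batched_accuracy(predict, images, labels, batch_size=64):
    """Accuracy over a whole dataset, evaluated one batch at a time.

    predict: hypothetical callable mapping a list of images to a list of
             predicted class indices (one per image).
    labels:  integer class indices, one per image.
    """
    correct = 0
    for start in range(0, len(images), batch_size):
        batch_images = images[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        predictions = predict(batch_images)
        correct += sum(int(p == t) for p, t in zip(predictions, batch_labels))
    return correct / float(len(images))


# Toy data: 100 fake "images" whose labels cycle through 0..9,
# scored against a dummy predictor that always answers 0
toy_images = list(range(100))
toy_labels = [i % 10 for i in toy_images]
print(batched_accuracy(lambda batch: [0] * len(batch), toy_images, toy_labels))  # 0.1
```

Counting correct predictions and dividing once at the end gives the right answer even when the last batch is smaller than the others, which averaging per-batch accuracies would not.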

In the next article I will show how to train on the CIFAR-10 dataset with TensorFlow and how to use the trained model to classify a single image. See you next time.
