Deep Learning for Beginners: TensorFlow (4), a CIFAR-10 Example
1. Reading the data
This was already covered in an earlier post (http://blog.csdn.net/margretwg/article/details/70168256), so it is not repeated here.
2. Building the model
Global parameters
import os
import re
import sys
import tarfile
import tensorflow as tf
import CIFAR10.CIFAR_input as input

FLAGS = tf.app.flags.FLAGS

# Model parameters
tf.app.flags.DEFINE_integer('batch_size', 128, """Number of images to process in a batch.""")
tf.app.flags.DEFINE_string('data_dir', 'E:/Python/tensorflow/CIFAR10', """Path to the CIFAR-10 data directory.""")
tf.app.flags.DEFINE_boolean('use_fp16', False, """Train the model using fp16.""")

# Global variables
IMAGE_SIZE = input.IMAGE_SIZE
NUM_CLASSES = input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

# Constants used during training
MOVING_AVERAGE_DECAY = 0.9999
NUM_EPOCH_PER_DECAY = 350.0          # epochs after which the learning rate decays
LEARNING_RATE_DECAY_FACTOR = 0.1     # learning rate decay factor
INITIAL_LEARNING_RATE = 0.1
2.1 Model prediction: inference()
The layer pipeline is: conv1 --> pool1 --> norm1 --> conv2 --> norm2 --> pool2 --> local3 --> local4 --> softmax_linear
This module returns a (128, 10) tensor, i.e. the logits for a batch of 128 images over the 10 classes.
def inference(images):
    """
    Build the CIFAR-10 model.
    :param images: images from distorted_inputs() or inputs()
    :return: logits
    """
    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64], stddev=5e-2, wd=0.0)
        conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')  # convolution
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
        pre_activation = tf.nn.bias_add(conv, biases)  # WX + b
        conv1 = tf.nn.relu(pre_activation, name=scope.name)
        _activation_summary(conv1)

    # pool1
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool1')

    # norm1
    norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm1')

    # conv2
    with tf.variable_scope('conv2') as scope:
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64], stddev=5e-2, wd=0.0)
        conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(pre_activation, name=scope.name)
        _activation_summary(conv2)

    # norm2
    norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm2')

    # pool2
    pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool2')

    # local3
    with tf.variable_scope('local3') as scope:
        # Move everything into depth so we can perform a single matrix multiply
        reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
        dim = reshape.get_shape()[1].value
        weights = _variable_with_weight_decay('weights', shape=[dim, 384], stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
        local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
        _activation_summary(local3)

    # local4
    with tf.variable_scope('local4') as scope:
        weights = _variable_with_weight_decay('weights', shape=[384, 192], stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
        local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
        _activation_summary(local4)

    # softmax_linear (a linear layer; the softmax itself is applied inside the loss)
    with tf.variable_scope('softmax_linear') as scope:
        weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES], stddev=1 / 192.0, wd=0.0)
        biases = _variable_on_cpu('biases', [NUM_CLASSES], tf.constant_initializer(0.0))
        softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
        _activation_summary(softmax_linear)

    return softmax_linear
Here, _variable_with_weight_decay() initializes the weights with a truncated normal distribution; it also takes a decay coefficient wd, which is used to build a weight-decay loss that is added to a collection so that total_loss can be assembled at the end.
_variable_on_cpu() creates a variable with the given name and shape on the CPU and initializes it.

def _variable_with_weight_decay(name, shape, stddev, wd):
    """
    Helper to create an initialized Variable with weight decay.
    The variable is initialized with a truncated normal distribution.
    :param stddev: standard deviation of the truncated normal
    :param wd: add L2 loss weight decay multiplied by this float. If None, weight decay is not added for this Variable.
    :return: Variable tensor
    """
    dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
    var = _variable_on_cpu(name, shape, tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
    if wd is not None:
        weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return var
def _variable_on_cpu(name, shape, initializer):
    """
    Helper to create a Variable stored on CPU memory.
    :param name: name of the variable
    :param shape: list of ints
    :param initializer: initializer for the variable
    :return:
        Variable tensor
    """
    with tf.device('/cpu:0'):
        dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
        var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype)
    return var
[Note 1: collections]
TensorFlow collections provide a global storage mechanism that is not affected by variable name scopes: store something once and it can be retrieved anywhere.
(1) tf.Graph.add_to_collection(name, value) stores a value in the collection.
A collection is not a set, so many values can be stored under a single name; tf.add_to_collection(name, value) is the shortcut that works on the default graph.
(2) tf.Graph.get_collection(name, scope=None)
Returns the list of values stored under name in the collection. When scope is not None, the resulting list is filtered to include only items whose name attribute matches scope using re.match; items without a name attribute are never returned. This example does not pass scope; the small sketch below shows both calls.
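A minimal sketch of the collection mechanism. The variable name and the 0.004 factor are just illustrative values, not taken from the model above:

import tensorflow as tf

with tf.variable_scope('conv1'):
    w = tf.get_variable('weights', shape=[5, 5, 3, 64],
                        initializer=tf.truncated_normal_initializer(stddev=5e-2))
    # store this layer's weight-decay term under the global name 'losses'
    tf.add_to_collection('losses', tf.multiply(tf.nn.l2_loss(w), 0.004, name='weight_loss'))

all_losses = tf.get_collection('losses')                   # every value stored under 'losses'
conv1_losses = tf.get_collection('losses', scope='conv1')  # only items whose name matches 'conv1' (re.match)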
2.2 Computing the loss
Weight decay is applied to all trainable variables, so the model's objective function is the sum of the cross-entropy loss and all of the weight-decay terms.
def loss(logits, labels):
    """
    Add L2 loss to all the trainable variables.
    Add summary for "loss" and "loss/avg".
    :param logits: logits from inference()
    :param labels: labels from distorted_inputs() or inputs(), a 1-D tensor of shape [batch_size]
    :return: loss tensor of type float
    """
    # Calculate the average cross-entropy loss across the batch
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)

    # The total loss is the cross-entropy loss plus all of the weight-decay terms (L2 losses).
    # The L2 norms of the weights were added to the 'losses' collection above, so tf.add_n()
    # simply sums the cross-entropy loss and those weight-decay values.
    return tf.add_n(tf.get_collection('losses'), name='total_loss')
[Note 2] tf.nn.sparse_softmax_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None) computes the sparse softmax cross entropy between labels and logits. It is intended for tasks where each sample belongs to exactly one discrete class, as in CIFAR-10; in other words soft classes are not allowed, and the labels vector must provide a single concrete class index for each row (sample) of logits. For soft softmax classification, use tf.nn.softmax_cross_entropy_with_logits() instead.
It returns a tensor with the same shape as labels, containing the loss of each sample; a small sketch of the two variants follows.
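A minimal sketch of the sparse versus soft variants, using made-up toy logits and labels:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])                  # shape (batch=2, num_classes=3)

# sparse version: labels are class indices, shape (batch,)
sparse_labels = tf.constant([0, 1], dtype=tf.int64)
loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)                 # shape (2,): one loss per sample

# soft version: labels are full probability distributions, shape (batch, num_classes)
soft_labels = tf.constant([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]])
loss_soft = tf.nn.softmax_cross_entropy_with_logits(
    labels=soft_labels, logits=logits)                   # same values as loss_sparse here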
==================================================================================================
[Note 3]
tf.add_n(inputs,name=None)
Adds all input tensors element-wise.
It returns a tensor with the same shape as the elements of inputs.
Here it sums the list of values stored in the 'losses' collection, i.e. the average cross-entropy loss just computed plus the L2 norms of the weights of the different layers, giving total_loss.
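A minimal sketch of tf.add_n, with made-up scalar values standing in for the entries of the 'losses' collection:

import tensorflow as tf

a = tf.constant(1.2)    # e.g. cross_entropy_mean
b = tf.constant(0.3)    # e.g. the weight-decay term of local3
c = tf.constant(0.1)    # e.g. the weight-decay term of local4
total_loss = tf.add_n([a, b, c], name='total_loss')   # element-wise sum: 1.2 + 0.3 + 0.1 = 1.6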
2.3 Updating the parameters / train_op
This adds the operations that minimize the objective function, including computing the gradients and updating the trainable variables. It finally returns a single op (train_op) that runs all the computation needed to train on one batch of images and update the model.
def train(total_loss, global_step):
    """
    Train the CIFAR-10 model.
    Create an optimizer and apply a moving average to all trainable variables.
    :param total_loss: total loss from loss()
    :param global_step: integer Variable counting the number of training steps processed
    :return: train_op: op for training
    """
    # Variables that affect the learning rate
    num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
    decay_steps = int(num_batches_per_epoch * NUM_EPOCH_PER_DECAY)

    # Decay the learning rate exponentially based on the number of steps
    lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE, global_step, decay_steps,
                                    LEARNING_RATE_DECAY_FACTOR, staircase=True)
    tf.summary.scalar('learning_rate', lr)

    # Generate moving averages of all losses and associated summaries
    loss_averages_op = _add_loss_summaries(total_loss)

    # Compute the gradients
    with tf.control_dependencies([loss_averages_op]):
        opt = tf.train.GradientDescentOptimizer(lr)
        grads = opt.compute_gradients(total_loss)

    # Apply the gradients
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
    # This is the second part of `minimize()`. It returns an `Operation` that applies gradients.

    # Add histograms for the gradients
    for grad, var in grads:
        if grad is not None:
            tf.summary.histogram(var.op.name + '/gradients', grad)

    # Track the moving averages of all trainable variables
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

    with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
        train_op = tf.no_op(name='train')

    return train_op
The learning rate is set up first; here it decays as training progresses.
[Note 4] tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
Applies exponential decay to the learning rate, following: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
Arguments:
learning_rate: the initial learning rate, a float
global_step: must not be negative; used for the decay computation. Here it is the integer Variable counting the number of training steps already executed.
decay_steps: must be positive; here it is the number of batches per epoch times the number of epochs between decays.
staircase: if True, global_step / decay_steps is an integer division, so the learning rate decays in discrete jumps rather than continuously.
It returns the decayed learning rate, which is then recorded with tf.summary.scalar() as the scalar 'learning_rate' so it can be monitored; a small numeric sketch follows.
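A minimal sketch with the constants used in this model, assuming the standard 50,000 CIFAR-10 training images:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
num_batches_per_epoch = 50000 / 128                    # ~390.6 batches per epoch
decay_steps = int(num_batches_per_epoch * 350.0)       # ~136,718 steps between decays
lr = tf.train.exponential_decay(0.1,                   # INITIAL_LEARNING_RATE
                                global_step,
                                decay_steps,
                                0.1,                   # LEARNING_RATE_DECAY_FACTOR
                                staircase=True)
# staircase=True: lr stays at 0.1 for the first ~136,718 steps, then drops to 0.01, then 0.001, ...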
============================================================================================
_add_loss_summaries()
Computes moving averages of all the losses stored in the 'losses' collection (together with the total loss) and attaches each of them to the summary as a scalar.
It returns an op that updates the moving averages of the losses.
def _add_loss_summaries(total_loss):
    """
    Add summaries for the losses in the CIFAR-10 model.
    Generates a moving average for all losses and associated summaries for visualizing the performance of the network.
    :param total_loss: total loss from loss()
    :return:
        loss_averages_op: op for generating moving averages of losses
    """
    # Compute the moving average of all individual losses and of the total loss.
    # A (simple) moving average works like this: for a given sequence, pick a window size k, then
    # average items 1..k, items 2..k+1, items 3..k+2, and so on. The exponential moving average
    # used here weights recent values more heavily instead of using a fixed window.
    loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
    losses = tf.get_collection('losses')
    loss_averages_op = loss_averages.apply(losses + [total_loss])

    # Attach a scalar summary to each individual loss and to the total loss;
    # do the same for the averaged version of the losses.
    for l in losses + [total_loss]:
        tf.summary.scalar(l.op.name + '(raw)', l)
        tf.summary.scalar(l.op.name, loss_averages.average(l))

    return loss_averages_op
[Note 6] loss_averages = tf.train.ExponentialMovingAverage()
This creates an ExponentialMovingAverage object.
When training a model, it pays to also maintain moving averages of the trained parameters; evaluating with the averaged values often gives better results. Here mainly the apply() method is used, so that is the method described.
- __init__(self, decay, num_updates=None, zero_debias=False, name='ExponentialMovingAverage')
- apply(self, var_list=None)
Maintains moving averages of variables. apply() adds a shadow copy of each trained variable, plus ops that keep the variable's moving average in its shadow copy; these ops are usually run after each training step.
It returns a single op that updates all of the moving averages. Note that apply() can be called multiple times, each time with a different list of variables; a small sketch follows.
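A minimal sketch of the apply()/average() pair on a single made-up variable:

import tensorflow as tf

v = tf.Variable(0.0)
ema = tf.train.ExponentialMovingAverage(decay=0.9999)
maintain_averages_op = ema.apply([v])   # creates a shadow copy of v and returns the update op
shadow_v = ema.average(v)               # tensor holding the current moving average of v
# each run of maintain_averages_op does: shadow_v = decay * shadow_v + (1 - decay) * v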
[Note 7] with tf.control_dependencies(control_inputs):
control_inputs is a list of ops or tensor objects that must be executed or computed before any op defined inside the context runs; this establishes the dependency.
In this example, once the moving-average op (loss_averages_op) has been obtained, the gradient computation is made to depend on it: the moving averages of the losses are updated first, and only then is gradient descent run with that loss as the objective.
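A minimal sketch of the dependency pattern, with a no-op standing in for the real loss_averages_op:

import tensorflow as tf

counter = tf.Variable(0, trainable=False)
loss_averages_op = tf.no_op(name='pretend_loss_averages_op')   # stand-in for the real op
with tf.control_dependencies([loss_averages_op]):
    # this op will only run after loss_averages_op has run
    increment = tf.assign_add(counter, 1)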
[Note 8] tf.train.GradientDescentOptimizer()
Gradient descent likewise goes through a class object, GradientDescentOptimizer; the methods used are listed below, with a small sketch after the list:
- __init__(self, learning_rate, use_locking=False, name='GradientDescent')
- compute_gradients(self, loss, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)
- apply_gradients(self, grads_and_vars, global_step=None, name=None)
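A minimal sketch showing that compute_gradients() followed by apply_gradients() is the split-apart form of minimize(); the split is what lets the training code above insert gradient histograms in between (toy loss, not the CIFAR-10 one):

import tensorflow as tf

x = tf.Variable(3.0)
loss = tf.square(x)
global_step = tf.Variable(0, trainable=False)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss)        # list of (gradient, variable) pairs
for grad, var in grads_and_vars:
    if grad is not None:
        tf.summary.histogram(var.op.name + '/gradients', grad)
train_step = opt.apply_gradients(grads_and_vars, global_step=global_step)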
3. Training the model
The training script defines its own global parameters first:

from datetime import datetime
import time

import tensorflow as tf

from CIFAR10 import model_build

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('train_dir', 'E:/Python/tensorflow/CIFAR10',
                           """Directory where to write event logs and checkpoint.""")
tf.app.flags.DEFINE_integer('max_steps', 100000, """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")
tf.app.flags.DEFINE_integer('log_frequency', 10,
                            """How often to log results to the console.""")
The train function:
def train1():
    with tf.Graph().as_default():
        # Use the default graph of the process inside this context
        global_step = tf.contrib.framework.get_or_create_global_step()
        # global_step = tf.Variable(0, name='global_step', trainable=False)

        # Get the images and labels
        images, labels = model_build.distorted_inputs()

        # Build a graph that computes the logits (the forward pass)
        logits = model_build.inference(images)

        # Compute the loss
        loss = model_build.loss(logits, labels)

        # Build a graph that trains the model on one batch of examples and updates the parameters
        train_op = model_build.train(loss, global_step)

        # Define a _LoggerHook class, registered with the mon_sess session below
        class _LoggerHook(tf.train.SessionRunHook):
            """Logs loss and runtime."""

            def begin(self):
                self._step = -1
                self._start_time = time.time()

            def before_run(self, run_context):
                # Called before each call to run().
                # Returning a 'SessionRunArgs' object asks for extra ops/tensors to be added to the
                # upcoming run(); they are executed together with whatever was originally passed to run().
                # SessionRunArgs can also carry feeds.
                # run_context carries information about the upcoming run(): the original ops and tensors.
                # Once this function has run, the graph is finalized and no more ops can be added.
                self._step += 1
                return tf.train.SessionRunArgs(loss)  # ask for the loss value

            def after_run(self, run_context, run_values):
                # Called after each call to run().
                # 'run_values' contains the results of the ops/tensors requested in before_run().
                # 'run_context' is the same object that was passed to before_run().
                # 'run_context.request_stop()' can be called to stop the iteration.
                if self._step % FLAGS.log_frequency == 0:  # every FLAGS.log_frequency batches
                    current_time = time.time()
                    duration = current_time - self._start_time
                    self._start_time = current_time

                    loss_value = run_values.results
                    examples_per_sec = FLAGS.log_frequency * FLAGS.batch_size / duration
                    sec_per_batch = float(duration / FLAGS.log_frequency)

                    format_str = '%s:step %d,loss=%.2f (%.1f examples/sec; %.3fsec/batch'
                    print(format_str % (datetime.now(), self._step, loss_value,
                                        examples_per_sec, sec_per_batch))

        with tf.train.MonitoredTrainingSession(
                # Sets a proper session initializer/restorer; it also creates hooks related to
                # checkpoint and summary saving
                checkpoint_dir=FLAGS.train_dir,
                hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
                       tf.train.NanTensorHook(loss),
                       _LoggerHook()],
                config=tf.ConfigProto(
                    log_device_placement=FLAGS.log_device_placement)) as mon_sess:
            while not mon_sess.should_stop():
                # Keep running train_op (updating the model parameters) until a stop condition is reached
                mon_sess.run(train_op)


def main(argv=None):
    train1()


if __name__ == '__main__':
    tf.app.run(main=main)
with tf.Graph().as_default() is used first so that all operations are created in the default graph; wrapping everything in the with block means every op below is added to that graph. If you create a new thread and want its ops added to the same graph, you must also wrap them in "with g.as_default()".

The input function from model_build is then called, followed by the inference, loss and train functions, which finally yields train_op.

Then a _LoggerHook object is defined, inheriting from the tf.train.SessionRunHook class.

[Note 9] tf.train.SessionRunHook
My rough understanding is that this is a session hook object that waits for MonitoredSession.run() to invoke it. The class has the following methods:
- after_create_session(self,session,coord)
session: A TensorFlow Session that has been created.
coord: A Coordinator object which keeps track of all threads.
- after_run(self, run_context, run_values)
run_context: A `SessionRunContext` object.
run_values: A SessionRunValues object. In the if condition here, whenever the number of executed steps is a multiple of FLAGS.log_frequency (10), i.e. every 10 batches, the current time is recorded and the elapsed time computed. Because before_run() added loss to session.run(), run_values.results is the loss value, which is then printed.
- before_run(self, run_context)
- begin(self)
tf.train.MonitoredTrainingSession creates a MonitoredSession for training. For a chief, it sets up the proper session initializer/restorer and also creates the hooks related to checkpoint and summary saving. For a worker, it waits for the chief to initialize/restore the session. Args:
master: `String`, the TensorFlow master to use.
is_chief: If `True`, this session takes care of initializing/restoring the underlying session; if `False`, it waits for a chief to initialize or restore it.
checkpoint_dir: A string, the directory where checkpoints are saved.
scaffold: A `Scaffold` used for gathering or building supportive ops. If not specified, a default one is created. It's used to finalize the graph.
hooks: Optional list of `SessionRunHook` objects. Here it is set to [tf.train.StopAtStepHook(last_step=FLAGS.max_steps), tf.train.NanTensorHook(loss), _LoggerHook()]:
- tf.train.StopAtStepHook() is a hook that monitors the step count and requests a stop at the given step.
- tf.train.NanTensorHook() is a hook that monitors the loss and stops training if it becomes NaN.
- _LoggerHook() is the hook we defined ourselves, which fetches the loss, tracks timing, prints results and so on.
chief_only_hooks: list of `SessionRunHook` objects. Activate these hooks if `is_chief==True`, ignore otherwise.
save_checkpoint_secs: how often, in seconds, checkpoints are saved; if set to None, no checkpoints are saved.
save_summaries_steps: how often, in steps, summaries are written.
config: an instance of `tf.ConfigProto` proto used to configure the session. It's the `config` argument of constructor of `tf.Session`.
Returns:
A `MonitoredSession` object.
Finally, the output:
Filling queue with 20000 CIFAR images before starting to train.This will take a few minutes.
2017-04-16 20:04:10.826531:step 0,loss=6.39 (25.3 examples/sec; 5.056sec/batch
2017-04-16 20:04:36.614833:step 10,loss=6.22 (49.6 examples/sec; 2.579sec/batch
2017-04-16 20:05:01.745663:step 20,loss=6.10 (50.9 examples/sec; 2.513sec/batch
2017-04-16 20:05:27.068144:step 30,loss=6.01 (50.5 examples/sec; 2.532sec/batch
Because I trained on a CPU, the speed is much slower than the GPU numbers reported in the official tutorial.