
Deep Learning for Beginners: TensorFlow (4), a CIFAR-10 Example

1. Reading the Data

This was covered in an earlier post, see http://blog.csdn.net/margretwg/article/details/70168256, so it is not repeated here.

2. Building the Model

Global parameters

import os
import re
import sys
import tarfile
import tensorflow as tf
import CIFAR10.CIFAR_input as input
FLAGS=tf.app.flags.FLAGS

#Model parameters
tf.app.flags.DEFINE_integer('batch_size', 128,
                            """Number of images to process in a batch.""")
tf.app.flags.DEFINE_string('data_dir', 'E:/Python/tensorflow/CIFAR10',
                           """Path to the CIFAR-10 data directory.""")
tf.app.flags.DEFINE_boolean('use_fp16', False,
                            """Train the model using fp16.""")

#Global variables
IMAGE_SIZE=input.IMAGE_SIZE
NUM_CLASSES=input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN=input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL=input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

#Constants used during training
MOVING_AVERAGE_DECAY=0.9999
NUM_EPOCH_PER_DECAY=350.0 #epochs after which the learning rate decays
LEARNING_RATE_DECAY_FACTOR=0.1 #learning rate decay factor
INITIAL_LEARNING_RATE=0.1


2.1 Model prediction: inference()

The pipeline is: conv1-->pool1-->norm1-->conv2-->norm2-->pool2-->local3-->local4-->softmax_linear

This function returns a tensor of shape (128, 10), i.e. (batch_size, NUM_CLASSES).
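As a rough shape check, here is a small sketch assuming the cropped input is 24x24x3 (IMAGE_SIZE = 24 as in the official CIFAR-10 tutorial; the actual value comes from CIFAR_input). The norm layers are omitted since they do not change the shape:

import tensorflow as tf

images = tf.placeholder(tf.float32, [128, 24, 24, 3])
x = tf.nn.conv2d(images, tf.zeros([5, 5, 3, 64]), [1, 1, 1, 1], 'SAME')   # conv1 -> (128, 24, 24, 64)
x = tf.nn.max_pool(x, [1, 3, 3, 1], [1, 2, 2, 1], 'SAME')                 # pool1 -> (128, 12, 12, 64)
x = tf.nn.conv2d(x, tf.zeros([5, 5, 64, 64]), [1, 1, 1, 1], 'SAME')       # conv2 -> (128, 12, 12, 64)
x = tf.nn.max_pool(x, [1, 3, 3, 1], [1, 2, 2, 1], 'SAME')                 # pool2 -> (128, 6, 6, 64)
print(tf.reshape(x, [128, -1]).get_shape())                               # (128, 2304), so dim = 2304 in local3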

def inference(images):
    """
    Build the CIFAR-10 model
    :param images: Images from distorted_inputs() or inputs()
    :return:
    Logits
    """
    #conv1
    with tf.variable_scope('conv1') as scope:
        kernel=_variable_with_weight_decay('weights',shape=[5,5,3,64],stddev=5e-2,wd=0.0)
        conv=tf.nn.conv2d(images,kernel,[1,1,1,1],padding='SAME')# convolution
        biases=_variable_on_cpu('biases',[64],tf.constant_initializer(0.0))
        pre_activation=tf.nn.bias_add(conv,biases)# WX+b
        conv1=tf.nn.relu(pre_activation,name=scope.name)
        _activation_summary(conv1)

    #pool1
    pool1=tf.nn.max_pool(conv1,ksize=[1,3,3,1],strides=[1,2,2,1],padding='SAME',name='pool1')


    #norm1
    norm1=tf.nn.lrn(pool1,4,bias=1.0,alpha=0.001/9.0,beta=0.75,name='norm1')

    #conv2
    with tf.variable_scope('conv2') as scope:
        kernel=_variable_with_weight_decay('weights',shape=[5,5,64,64],stddev=5e-2,wd=0.0)
        conv=tf.nn.conv2d(norm1,kernel,[1,1,1,1],padding='SAME')
        biases=_variable_on_cpu('biases',[64],tf.constant_initializer(0.1))
        pre_activation=tf.nn.bias_add(conv,biases)
        conv2=tf.nn.relu(pre_activation,name=scope.name)
        _activation_summary(conv2)

    #norm2
    norm2=tf.nn.lrn(conv2,4,bias=1.0,alpha=0.001/9.0,beta=0.75,name='norm2')

    #pool2
    pool2=tf.nn.max_pool(norm2,ksize=[1,3,3,1],strides=[1,2,2,1],padding='SAME',name='pool2')

    #local3
    with tf.variable_scope('local3') as scope:
        #Move everything into depth so we can perform a single matrix multiply
        reshape=tf.reshape(pool2,[FLAGS.batch_size,-1])
        dim=reshape.get_shape()[1].value
        weights=_variable_with_weight_decay('weights',shape=[dim,384],stddev=0.04,wd=0.004)
        biases=_variable_on_cpu('biases',[384],tf.constant_initializer(0.1))
        local3=tf.nn.relu(tf.matmul(reshape,weights)+biases,name=scope.name)
        _activation_summary(local3)

    #local4
    with tf.variable_scope('local4') as scope:
        weights = _variable_with_weight_decay('weights', shape=[384, 192],
                                              stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
        local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
        _activation_summary(local4)

    #softmax_linear
    with tf.variable_scope('softmax_linear') as scope:
        weights=_variable_with_weight_decay('weights',[192,NUM_CLASSES],stddev=1/192.0,wd=0.0)
        biases=_variable_on_cpu('biases',[NUM_CLASSES],tf.constant_initializer(0.0))
        softmax_linear=tf.add(tf.matmul(local4,weights),biases,name=scope.name)
        _activation_summary(softmax_linear)

    return softmax_linear


Here, _variable_with_weight_decay() initializes the weights and takes a decay coefficient wd; when wd is given, it computes a weight-decay loss and adds it to the 'losses' collection, which makes it easy to compute total_loss later.

def _variable_with_weight_decay(name,shape,stddev,wd):
    """
    Helper to create an initialized Variable with weight decay

    The variable is initialized from a truncated normal distribution
    :param stddev: standard deviation
    :param wd: add L2 loss weight decay multiplied by this float. If None, weight decay is not added for this Variable
    :return:
    Variable tensor
    """

    dtype=tf.float16 if FLAGS.use_fp16 else tf.float32
    var=_variable_on_cpu(name,shape,tf.truncated_normal_initializer(stddev=stddev,dtype=dtype))
    if wd is not None:
      weight_decay=tf.multiply(tf.nn.l2_loss(var),wd,name='weight_loss')
      tf.add_to_collection('losses',weight_decay)

    return var

_variable_on_cpu() creates an initialized variable with the given name and shape on the CPU:

def _variable_on_cpu(name,shape,initializer):
    """
    Helper to create a Variable stored on CPU memory
    :param name: name of the variable
    :param shape: list of ints
    :param initializer: initializer for the variable
    :return:
    Variable Tensor
    """
    with tf.device('/cpu:0'):
        dtype=tf.float16 if FLAGS.use_fp16 else tf.float32
        var=tf.get_variable(name,shape,initializer=initializer,dtype=dtype)
        return var


[Note 1: collections]

TensorFlow's collection mechanism provides global storage that is not affected by variable scopes: store something once and read it anywhere.

(1) tf.Graph.add_to_collection(name, value) stores a value in a collection.

A collection is not a set, so many values can be stored under the same 'name'; tf.add_to_collection(name, value) is the shortcut that works on the default graph.

(2) tf.Graph.get_collection(name, scope=None)

Returns the list of values stored in the collection under name. When scope is not None, the resulting list is filtered to include only items whose name attribute matches scope using re.match; items without a name attribute are never returned. This example does not use the scope argument, so I am not entirely sure what it is for; I will add more when I run into it.
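A minimal sketch of the store-and-retrieve pattern (toy constants standing in for the model's real loss terms):

import tensorflow as tf

a = tf.constant(1.0, name='loss_a')
b = tf.constant(2.0, name='loss_b')
tf.add_to_collection('losses', a)   # the same key can hold many values
tf.add_to_collection('losses', b)

total = tf.add_n(tf.get_collection('losses'))   # retrieve the list anywhere and sum it
with tf.Session() as sess:
    print(sess.run(total))   # 3.0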

2.2 Computing the loss

Weight decay is applied to all learnable variables: the model's objective is the sum of the cross-entropy loss and all the weight-decay terms.

def loss(logits,labels):
    """
    Add L2 loss to all the trainable variables
    Add summary for "loss" and "loss/avg"
    :param logits: logits from inference()
    :param labels: labels from distorted_inputs() or inputs(), 1-D tensor of shape [batch_size]

    :return: loss tensor of type float
    """

    #Average cross-entropy loss across the batch
    labels=tf.cast(labels,tf.int64)
    cross_entropy=tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,logits=logits,name="cross_entropy_per_example")
    cross_entropy_mean=tf.reduce_mean(cross_entropy,name='cross_entropy')
    tf.add_to_collection('losses',cross_entropy_mean)

    #The total loss is the cross-entropy loss plus the weight-decay terms (L2 loss).
    #The L2 norms of the weights were added to the 'losses' collection above, so tf.add_n()
    #sums the cross-entropy loss and those weight-decay values.
    return tf.add_n(tf.get_collection('losses'),name='total_loss')

[Note 2] tf.nn.sparse_softmax_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None)

Computes sparse softmax cross entropy between labels and logits. It is for tasks where each sample belongs to exactly one discrete class, such as CIFAR-10; soft classes are not allowed here, and the label vector must give a single concrete class index for each row (sample) of logits. For soft softmax classification, use tf.nn.softmax_cross_entropy_with_logits().

It returns a tensor of the same size as 'labels', containing the loss of each sample.
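A toy sketch of the call (hard integer labels, one per row of logits; the values are made up):

import tensorflow as tf

logits = tf.constant([[2.0, 0.5, 0.1],
                      [0.2, 0.3, 3.0]])        # shape (2, 3): 2 samples, 3 classes
labels = tf.constant([0, 2], dtype=tf.int64)   # one class index per sample
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
with tf.Session() as sess:
    print(sess.run(loss))   # per-example losses, shape (2,)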

==================================================================================================

[Note 3]

tf.add_n(inputs, name=None)

Adds all input tensors element-wise.

Returns a tensor of the same shape as the elements of inputs.

Here it sums everything in the 'losses' collection, i.e. the average cross-entropy loss just computed plus the L2 norms of the weights of the different layers, giving total_loss.
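A tiny sketch of the element-wise behaviour (toy tensors):

import tensorflow as tf

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])
z = tf.constant([5.0, 6.0])
s = tf.add_n([x, y, z])   # element-wise sum, same shape as each input
with tf.Session() as sess:
    print(sess.run(s))    # [ 9. 12.]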

2.3 Updating the parameters / train_op

This adds the operations that minimize the objective, including computing the gradients and updating the learnable variables. The function returns a single op (train_op) that runs all of these computations for one batch of images, so that running it trains and updates the model.

def train(total_loss,global_step):
    """
    Train CIFAR-10 model
    Set up the optimizer and apply a moving average to all trainable variables
    :param total_loss: Total loss from loss()
    :param global_step: integer Variable counting the number of training steps processed
    :return: train_op: op for training
    """
    #Variables that affect learning rate
    num_batches_per_epoch=NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN/FLAGS.batch_size
    decay_steps=int(num_batches_per_epoch* NUM_EPOCH_PER_DECAY)

    #decay the learning rate exponentially based on the number of steps
    lr=tf.train.exponential_decay(INITIAL_LEARNING_RATE,global_step,decay_steps,LEARNING_RATE_DECAY_FACTOR,staircase=True)
    tf.summary.scalar('learning_rate',lr)

    #moving averages of all losses and associated summaries
    loss_averages_op=_add_loss_summaries(total_loss)

    #compute gradients
    with tf.control_dependencies([loss_averages_op]):
        opt=tf.train.GradientDescentOptimizer(lr)
        grads=opt.compute_gradients(total_loss)

    #apply gradients
    apply_gradient_op=opt.apply_gradients(grads,global_step=global_step)
    #This is the second part of `minimize()`. It returns an `Operation` that applies gradients.

    #add histograms for gradients
    for grad,var in grads:
        if grad is not None:
            tf.summary.histogram(var.op.name+'/gradients',grad)

    # Track the moving averages of all trainable variables.
    variable_averages = tf.train.ExponentialMovingAverage(
        MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

    with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
        train_op = tf.no_op(name='train')

    return train_op


The learning rate is set up first; here it decays as training progresses.

[Note 4] tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)

Applies exponential decay to the learning rate, following: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

Arguments:

learning_rate: the initial learning rate, a float

global_step: must not be negative; used for the decay computation. Here it is an integer Variable counting the number of training steps that have been run.

decay_steps: must be positive; here it is the number of batches per epoch times the number of epochs between decays

staircase: if True, global_step / decay_steps is an integer division, so the learning rate decays in discrete steps

It returns the decayed learning rate, which is then recorded with tf.summary.scalar() under the tag 'learning_rate' so it can be monitored.
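A quick numeric check with this model's constants (a sketch; it assumes NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN is 50000, as in the official tutorial, and a fixed step count instead of a Variable):

import tensorflow as tf

decay_steps = int((50000 / 128) * 350.0)   # batches per epoch * epochs per decay, roughly 136718
lr = tf.train.exponential_decay(0.1, 200000, decay_steps, 0.1, staircase=True)
with tf.Session() as sess:
    print(sess.run(lr))   # 0.1 * 0.1 ** (200000 // 136718) = 0.01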

============================================================================================

_add_loss_summaries()

Computes moving averages of all the losses stored in the 'losses' collection (plus the total loss) and attaches a scalar summary to each of them.

It returns an op that updates the moving averages of the losses.

def _add_loss_summaries(total_loss):
    """
    Add summaries for losses in CIFAR-10 model
    Generates moving averages for all losses and associated summaries for visualizing the performance of the network
    :param total_loss: Total loss from loss()
    :return:
    loss_averages_op: op for generating moving averages of losses
    """
    #Compute the moving average of all individual losses and the total loss.
    #A moving average smooths a sequence: for a window of size k, it averages items 1..k,
    #then 2..k+1, then 3..k+2, and so on (here an exponential moving average is used).
    loss_averages=tf.train.ExponentialMovingAverage(0.9,name='avg')
    losses=tf.get_collection('losses')
    loss_averages_op=loss_averages.apply(losses+[total_loss])

    #Attach a scalar summary to each individual loss and to the total loss; do the same
    #for the averaged version of the losses
    for l in losses+[total_loss]:
        tf.summary.scalar(l.op.name+'(raw)',l)
        tf.summary.scalar(l.op.name,loss_averages.average(l))

    return loss_averages_op


[Note 6] loss_averages = tf.train.ExponentialMovingAverage()

This creates an ExponentialMovingAverage object.

When training a model, it often helps to keep moving averages of the trained parameters; evaluating with the averaged values can give better results. This code mainly uses the apply() method, so that is the one described here.

  • __init__(self, decay, num_updates=None, zero_debias=False, name='ExponentialMovingAverage')
  • apply(self, var_list=None)

Maintains moving averages of variables: apply() adds a shadow copy of each trained variable and adds ops that keep the moving average of the variable in its shadow copy; this op is typically run after each training step.

It returns an op. Note that apply() can be called multiple times, each time with a different list of variables.
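A minimal sketch of apply() and average() with a single toy variable (decay 0.9, as in _add_loss_summaries):

import tensorflow as tf

v = tf.Variable(0.0)
ema = tf.train.ExponentialMovingAverage(0.9)
maintain_op = ema.apply([v])          # creates a shadow copy of v and the op that updates it

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(v.assign(1.0))
    sess.run(maintain_op)             # shadow = 0.9 * shadow + 0.1 * v
    print(sess.run(ema.average(v)))   # ~0.1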

[Note 7] with tf.control_dependencies(control_inputs):

control_inputs is a list of ops or tensors that must be executed or computed before the ops defined inside the context, forming a dependency.

In this example, once the moving-average op (loss_averages_op) is obtained, the gradient computation is made to depend on it: the moving averages of the losses are updated first, and only then is gradient descent run on that loss.
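A minimal sketch of the dependency pattern (toy ops standing in for loss_averages_op and the gradient computation):

import tensorflow as tf

counter = tf.Variable(0)
update_op = counter.assign_add(1)        # this op must run first

with tf.control_dependencies([update_op]):
    read_after = tf.identity(counter)    # only runs after update_op has run

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(read_after))   # 1: the assign_add ran before the read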

[Note 8] tf.train.GradientDescentOptimizer()

Gradient descent here also uses a GradientDescentOptimizer object; the methods used are:

  • __init__(self, learning_rate, use_locking=False, name='GradientDescent')
Creates a new gradient descent optimizer.
  • compute_gradients(self, loss, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)
Computes the gradients of loss with respect to the variables in 'var_list'; by default these are all trainable variables in the graph. Note that a gradient can be a tensor, or None if there is no gradient for a given variable. Returns a list of (gradient, variable) pairs; the variable is always present, the gradient can be None.
  • apply_gradients(self, grads_and_vars, global_step=None, name=None)
Applies gradients to variables, i.e. performs the parameter update, and returns an op that applies the gradients.

After that, for each (gradient, variable) pair in grads a histogram is added to the summary, and a moving-average op over all trainable variables is created. Finally an overall train_op is created with a dependency on both the gradient-update op and the variable moving-average op, so those two ops must run before train_op; running train_op in the session is therefore enough to run both of them. With that, the model construction is complete.

3. Training

Global parameters
from datetime import datetime
import time
import tensorflow as tf
from CIFAR10 import model_build
FLAGS=tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('train_dir','E:/Python/tensorflow/CIFAR10',"""Directory
where to write event logs and checkpoint""")
tf.app.flags.DEFINE_integer('max_steps',100000,"""Number of batches to run.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")
tf.app.flags.DEFINE_integer('log_frequency', 10,
                            """How often to log results to the console.""")

The train function:

def train1():
    with tf.Graph().as_default():
        global_step=tf.contrib.framework.get_or_create_global_step()
        #use the default graph of the process inside this context
        #global_step=tf.Variable(0,name='global_step',trainable=False)
        #Get images and labels
        images,labels=model_build.distorted_inputs()

        #Build a graph that computes the logits (forward pass)
        logits=model_build.inference(images)

        #Compute the loss
        loss=model_build.loss(logits,labels)

        #Build a graph that trains the model on one batch of examples and then updates the parameters
        train_op=model_build.train(loss,global_step)

        #Define a _LoggerHook class, registered with the monitored session mon_sess below
        class _LoggerHook(tf.train.SessionRunHook):
            """
            Logs loss and runtime.
            """
            def begin(self):
                self._step=-1
                self._start_time=time.time()

            def before_run(self,run_context):
                #Called before each call to run()
                #Returning a 'SessionRunArgs' object asks for ops or tensors to be added to the
                #upcoming run(); they are run together with the ops already passed to run().
                #The SessionRunArgs can also carry feeds for run().

                #The run_context argument carries information about the upcoming run() call:
                #the original ops and tensors. Once this function returns, the graph is finalized
                #and no more ops can be added.
                self._step+=1
                return tf.train.SessionRunArgs(loss) #Asks for loss value
            def after_run(self,run_context,run_values):
                #Called after each call to run()
                #The 'run_values' argument contains results of the ops/tensors requested by 'before_run'
                #The 'run_context' argument is the same one passed to before_run
                #'run_context.request_stop()' can be called to stop the iteration
                if self._step % FLAGS.log_frequency==0:#every FLAGS.log_frequency batches
                    current_time=time.time()
                    duration=current_time-self._start_time
                    self._start_time=current_time

                    loss_value=run_values.results
                    examples_per_sec=FLAGS.log_frequency* FLAGS.batch_size/duration
                    sec_per_batch=float(duration/FLAGS.log_frequency)
                    format_str=('%s:step %d,loss=%.2f (%.1f examples/sec; %.3f' 'sec/batch')
                    print(format_str %(datetime.now(),self._step,loss_value,examples_per_sec,sec_per_batch))

        with tf.train.MonitoredTrainingSession(
            #sets up a proper session initializer/restorer; it also creates hooks related to
            #checkpoint and summary saving
            checkpoint_dir=FLAGS.train_dir,
            hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),tf.train.NanTensorHook(loss),
                   _LoggerHook()],
            config=tf.ConfigProto(
                log_device_placement=FLAGS.log_device_placement)) as mon_sess:
            while not mon_sess.should_stop():
                mon_sess.run(train_op)
                #run train_op in a loop, updating the model parameters, until a stop condition is reached

def main(argv=None):
    train1()

if __name__=='__main__':
    tf.app.run(main=main)


First, with tf.Graph().as_default() makes all operations run in the default graph; wrapping them in the with block means every op defined inside is added to this graph. If you start a new thread and want it to add ops to the same graph, you must wrap its code in "with g.as_default():" as well.

The input function, inference(), loss() and train() from model_build are then called, which finally produces train_op.

Next, a _LoggerHook object is defined, subclassing tf.train.SessionRunHook.

[Note 9] the tf.train.SessionRunHook class

My rough understanding is that this is a hook object that waits for MonitoredSession.run() to invoke it. The class has the following methods:
  • after_create_session(self, session, coord)
Called when a new TensorFlow session is created; at that point the graph is finalized and ops can no longer be added to it. This method is also called when a wrapped session is restored. Args:
       session: A TensorFlow Session that has been created.
        coord: A Coordinator object which keeps track of all threads.
  • after_run(self, run_context, run_values)
Called after each call to run(). The 'run_values' argument contains the results of the ops/tensors requested by before_run(); 'run_context' is the same object that was passed to before_run(). Args:
         run_context: A `SessionRunContext` object.
         run_values: A SessionRunValues object.
   Here, the if condition triggers whenever the step count is a multiple of FLAGS.log_frequency (10), i.e. every 10 batches: it records the current time and computes the elapsed time. Because before_run() added loss to session.run(), run_values.results holds the loss value, which is then printed.
  • before_run(self, run_context)
Called before each call to run(). You can return a 'SessionRunArgs' object from this function to specify ops and tensors to add to the upcoming run() call; the 'run_context' argument provides information about that call. Here the step counter is incremented and a SessionRunArgs object is returned, so that loss is added to session.run().
  • begin(self)
Called before the session is used. A hook can still modify the graph here by adding new ops; after begin(), the graph is finalized and later calls may not modify it. Here begin() sets the step counter to -1 and records a start time.

=======================================================================================

Finally, a tf.train.MonitoredTrainingSession is created.

[Note 10] tf.train.MonitoredTrainingSession(master='', is_chief=True, checkpoint_dir=None, scaffold=None, hooks=None, chief_only_hooks=None, save_checkpoint_secs=600, save_summaries_steps=100, config=None)
Creates a MonitoredSession for training. For a chief, it sets up a proper session initializer/restorer and also creates hooks related to checkpoint and summary saving. For a worker, it waits for the chief to initialize/restore the session. Args:

master: `String` the TensorFlow master to use.
is_chief: If `True`, it takes care of initializing and restoring the underlying session. If `False`, it waits for the chief to initialize or restore the session.
checkpoint_dir: A string. The path where checkpoints are saved.
scaffold: A `Scaffold` used for gathering or building supportive ops. If not specified, a default one is created. It's used to finalize the graph.

hooks: Optional list of `SessionRunHook` objects. 
Here it is set to [tf.train.StopAtStepHook(last_step=FLAGS.max_steps), tf.train.NanTensorHook(loss), _LoggerHook()]
  • tf.train.StopAtStepHook(): a hook that monitors the step count and requests a stop at a specified step.
It requests a stop either after a given number of steps or when a given last step is reached: __init__(self, num_steps=None, last_step=None). Here last_step is set to the maximum number of steps, meaning training stops once that step is reached.
  • tf.train.NanTensorHook(): a hook that monitors the loss and stops training when the loss is NaN.
  • _LoggerHook() is the hook we defined ourselves to fetch the loss, record timing, print progress, and so on.

chief_only_hooks: list of `SessionRunHook` objects. Activate these hooks if `is_chief==True`, ignore otherwise.
save_checkpoint_secs: how often checkpoints are saved, in seconds; if None, checkpoints are not saved
save_summaries_steps: how often summaries are written, in steps

config: an instance of `tf.ConfigProto` proto used to configure the session. It's the `config` argument of the constructor of `tf.Session`.
    Returns:
      A `MonitoredSession` object.
Final output:
Filling queue with 20000 CIFAR images before starting to train.This will take a few minutes.
2017-04-16 20:04:10.826531:step 0,loss=6.39 (25.3 examples/sec; 5.056sec/batch
2017-04-16 20:04:36.614833:step 10,loss=6.22 (49.6 examples/sec; 2.579sec/batch
2017-04-16 20:05:01.745663:step 20,loss=6.10 (50.9 examples/sec; 2.513sec/batch
2017-04-16 20:05:27.068144:step 30,loss=6.01 (50.5 examples/sec; 2.532sec/batch
Because I trained on a CPU, the speed is much slower than the GPU numbers given in the official tutorial.