Deep Learning for Beginners: TensorFlow (4), a CIFAR-10 Example
1. Reading the data
This was already covered in an earlier post (http://blog.csdn.net/margretwg/article/details/70168256), so it is not repeated here.
2. Building the model
Global parameters
import os
import re
import sys
import tarfile
import tensorflow as tf
import CIFAR10.CIFAR_input as input

FLAGS = tf.app.flags.FLAGS

# Model parameters
tf.app.flags.DEFINE_integer('batch_size', 128, """Number of images to process in a batch.""")
tf.app.flags.DEFINE_string('data_dir', 'E:/Python/tensorflow/CIFAR10', """Path to the CIFAR-10 data directory.""")
tf.app.flags.DEFINE_boolean('use_fp16', False, """Train the model using fp16.""")

# Global variables
IMAGE_SIZE = input.IMAGE_SIZE
NUM_CLASSES = input.NUM_CLASSES
NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL

# Constants used during training
MOVING_AVERAGE_DECAY = 0.9999
NUM_EPOCH_PER_DECAY = 350.0          # epochs after which the learning rate decays
LEARNING_RATE_DECAY_FACTOR = 0.1     # learning rate decay factor
INITIAL_LEARNING_RATE = 0.1
2.1 Model prediction: inference()
The layer pipeline is: conv1 --> pool1 --> norm1 --> conv2 --> norm2 --> pool2 --> local3 --> local4 --> softmax_linear
This module returns a (128, 10) tensor, i.e. the logits for a batch of 128 images over the 10 classes.
def inference(images):
    """
    Build the CIFAR-10 model.
    :param images: images from distorted_inputs() or inputs()
    :return: logits
    """
    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64], stddev=5e-2, wd=0.0)
        conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')  # convolution
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
        pre_activation = tf.nn.bias_add(conv, biases)  # WX + b
        conv1 = tf.nn.relu(pre_activation, name=scope.name)
        _activation_summary(conv1)

    # pool1
    pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool1')

    # norm1
    norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm1')

    # conv2
    with tf.variable_scope('conv2') as scope:
        kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64], stddev=5e-2, wd=0.0)
        conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
        pre_activation = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(pre_activation, name=scope.name)
        _activation_summary(conv2)

    # norm2
    norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm2')

    # pool2
    pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool2')

    # local3
    with tf.variable_scope('local3') as scope:
        # Move everything into depth so we can perform a single matrix multiply
        reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
        dim = reshape.get_shape()[1].value
        weights = _variable_with_weight_decay('weights', shape=[dim, 384], stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
        local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
        _activation_summary(local3)

    # local4
    with tf.variable_scope('local4') as scope:
        weights = _variable_with_weight_decay('weights', shape=[384, 192], stddev=0.04, wd=0.004)
        biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
        local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
        _activation_summary(local4)

    # softmax_linear (a linear layer; the softmax itself is applied inside the loss)
    with tf.variable_scope('softmax_linear') as scope:
        weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES], stddev=1 / 192.0, wd=0.0)
        biases = _variable_on_cpu('biases', [NUM_CLASSES], tf.constant_initializer(0.0))
        softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
        _activation_summary(softmax_linear)

    return softmax_linear
Here, _variable_with_weight_decay() initializes the weights with a truncated normal distribution; it also takes a decay coefficient wd, which is used to build a weight-decay loss that is added to a collection so that total_loss can be assembled at the end.
_variable_on_cpu() creates a variable with the given name and shape on the CPU and initializes it.

def _variable_with_weight_decay(name, shape, stddev, wd):
    """
    Helper to create an initialized Variable with weight decay.
    The variable is initialized with a truncated normal distribution.
    :param stddev: standard deviation of the truncated normal
    :param wd: add L2 loss weight decay multiplied by this float. If None, weight decay is not added for this Variable.
    :return: Variable tensor
    """
    dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
    var = _variable_on_cpu(name, shape, tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
    if wd is not None:
        weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return var
def _variable_on_cpu(name, shape, initializer):
    """
    Helper to create a Variable stored on CPU memory.
    :param name: name of the variable
    :param shape: list of ints
    :param initializer: initializer for the variable
    :return:
        Variable tensor
    """
    with tf.device('/cpu:0'):
        dtype = tf.float16 if FLAGS.use_fp16 else tf.float32
        var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype)
    return var
[Note 1: collections]
TensorFlow collections provide a global storage mechanism that is not affected by variable name scopes: store something once and it can be retrieved anywhere.
(1) tf.Graph.add_to_collection(name, value) stores a value in the collection.
A collection is not a set, so many values can be stored under a single name; tf.add_to_collection(name, value) is the shortcut that works on the default graph.
(2) tf.Graph.get_collection(name, scope=None)
Returns the list of values stored under name in the collection. When scope is not None, the resulting list is filtered to include only items whose name attribute matches scope using re.match; items without a name attribute are never returned. This example does not pass scope; the small sketch below shows both calls.
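A minimal sketch of the collection mechanism. The variable name and the 0.004 factor are just illustrative values, not taken from the model above:

import tensorflow as tf

with tf.variable_scope('conv1'):
    w = tf.get_variable('weights', shape=[5, 5, 3, 64],
                        initializer=tf.truncated_normal_initializer(stddev=5e-2))
    # store this layer's weight-decay term under the global name 'losses'
    tf.add_to_collection('losses', tf.multiply(tf.nn.l2_loss(w), 0.004, name='weight_loss'))

all_losses = tf.get_collection('losses')                   # every value stored under 'losses'
conv1_losses = tf.get_collection('losses', scope='conv1')  # only items whose name matches 'conv1' (re.match)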
2.2 Computing the loss
Weight decay is applied to all trainable variables, so the model's objective function is the sum of the cross-entropy loss and all of the weight-decay terms.
def loss(logits, labels):
    """
    Add L2 loss to all the trainable variables.
    Add summary for "loss" and "loss/avg".
    :param logits: logits from inference()
    :param labels: labels from distorted_inputs() or inputs(), a 1-D tensor of shape [batch_size]
    :return: loss tensor of type float
    """
    # Calculate the average cross-entropy loss across the batch
    labels = tf.cast(labels, tf.int64)
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits, name='cross_entropy_per_example')
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)

    # The total loss is the cross-entropy loss plus all of the weight-decay terms (L2 losses).
    # The L2 norms of the weights were added to the 'losses' collection above, so tf.add_n()
    # simply sums the cross-entropy loss and those weight-decay values.
    return tf.add_n(tf.get_collection('losses'), name='total_loss')
[Note 2] tf.nn.sparse_softmax_cross_entropy_with_logits(_sentinel=None, labels=None, logits=None, name=None) computes the sparse softmax cross entropy between labels and logits. It is intended for tasks where each sample belongs to exactly one discrete class, as in CIFAR-10; in other words soft classes are not allowed, and the labels vector must provide a single concrete class index for each row (sample) of logits. For soft softmax classification, use tf.nn.softmax_cross_entropy_with_logits() instead.
It returns a tensor with the same shape as labels, containing the loss of each sample; a small sketch of the two variants follows.
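A minimal sketch of the sparse versus soft variants, using made-up toy logits and labels:

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])                  # shape (batch=2, num_classes=3)

# sparse version: labels are class indices, shape (batch,)
sparse_labels = tf.constant([0, 1], dtype=tf.int64)
loss_sparse = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)                 # shape (2,): one loss per sample

# soft version: labels are full probability distributions, shape (batch, num_classes)
soft_labels = tf.constant([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0]])
loss_soft = tf.nn.softmax_cross_entropy_with_logits(
    labels=soft_labels, logits=logits)                   # same values as loss_sparse here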
==================================================================================================
[Note 3]
tf.add_n(inputs,name=None)
Adds all input tensors element-wise.
It returns a tensor with the same shape as the elements of inputs.
Here it sums the list of values stored in the 'losses' collection, i.e. the average cross-entropy loss just computed plus the L2 norms of the weights of the different layers, giving total_loss.
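A minimal sketch of tf.add_n, with made-up scalar values standing in for the entries of the 'losses' collection:

import tensorflow as tf

a = tf.constant(1.2)    # e.g. cross_entropy_mean
b = tf.constant(0.3)    # e.g. the weight-decay term of local3
c = tf.constant(0.1)    # e.g. the weight-decay term of local4
total_loss = tf.add_n([a, b, c], name='total_loss')   # element-wise sum: 1.2 + 0.3 + 0.1 = 1.6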
2.3 Updating the parameters / train_op
This adds the operations that minimize the objective function, including computing the gradients and updating the trainable variables. It finally returns a single op (train_op) that runs all the computation needed to train on one batch of images and update the model.
def train(total_loss, global_step):
    """
    Train the CIFAR-10 model.
    Create an optimizer and apply a moving average to all trainable variables.
    :param total_loss: total loss from loss()
    :param global_step: integer Variable counting the number of training steps processed
    :return: train_op: op for training
    """
    # Variables that affect the learning rate
    num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
    decay_steps = int(num_batches_per_epoch * NUM_EPOCH_PER_DECAY)

    # Decay the learning rate exponentially based on the number of steps
    lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE, global_step, decay_steps,
                                    LEARNING_RATE_DECAY_FACTOR, staircase=True)
    tf.summary.scalar('learning_rate', lr)

    # Generate moving averages of all losses and associated summaries
    loss_averages_op = _add_loss_summaries(total_loss)

    # Compute the gradients
    with tf.control_dependencies([loss_averages_op]):
        opt = tf.train.GradientDescentOptimizer(lr)
        grads = opt.compute_gradients(total_loss)

    # Apply the gradients
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
    # This is the second part of `minimize()`. It returns an `Operation` that applies gradients.

    # Add histograms for the gradients
    for grad, var in grads:
        if grad is not None:
            tf.summary.histogram(var.op.name + '/gradients', grad)

    # Track the moving averages of all trainable variables
    variable_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
    variables_averages_op = variable_averages.apply(tf.trainable_variables())

    with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
        train_op = tf.no_op(name='train')

    return train_op
The learning rate is set up first; here it decays as training progresses.
[Note 4] tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False, name=None)
Applies exponential decay to the learning rate, following: decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
Arguments:
learning_rate: the initial learning rate, a float
global_step: must not be negative; used for the decay computation. Here it is the integer Variable counting the number of training steps already executed.
decay_steps: must be positive; here it is the number of batches per epoch times the number of epochs between decays.
staircase: if True, global_step / decay_steps is an integer division, so the learning rate decays in discrete jumps rather than continuously.
It returns the decayed learning rate, which is then recorded with tf.summary.scalar() as the scalar 'learning_rate' so it can be monitored; a small numeric sketch follows.
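A minimal sketch with the constants used in this model, assuming the standard 50,000 CIFAR-10 training images:

import tensorflow as tf

global_step = tf.Variable(0, trainable=False)
num_batches_per_epoch = 50000 / 128                    # ~390.6 batches per epoch
decay_steps = int(num_batches_per_epoch * 350.0)       # ~136,718 steps between decays
lr = tf.train.exponential_decay(0.1,                   # INITIAL_LEARNING_RATE
                                global_step,
                                decay_steps,
                                0.1,                   # LEARNING_RATE_DECAY_FACTOR
                                staircase=True)
# staircase=True: lr stays at 0.1 for the first ~136,718 steps, then drops to 0.01, then 0.001, ...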
============================================================================================
_add_loss_summaries()
Computes moving averages of all the losses stored in the 'losses' collection (together with the total loss) and attaches each of them to the summary as a scalar.
It returns an op that updates the moving averages of the losses.
def _add_loss_summaries(total_loss):
    """
    Add summaries for the losses in the CIFAR-10 model.
    Generates a moving average for all losses and associated summaries for visualizing the performance of the network.
    :param total_loss: total loss from loss()
    :return:
        loss_averages_op: op for generating moving averages of losses
    """
    # Compute the moving average of all individual losses and of the total loss.
    # A (simple) moving average works like this: for a given sequence, pick a window size k, then
    # average items 1..k, items 2..k+1, items 3..k+2, and so on. The exponential moving average
    # used here weights recent values more heavily instead of using a fixed window.
    loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
    losses = tf.get_collection('losses')
    loss_averages_op = loss_averages.apply(losses + [total_loss])

    # Attach a scalar summary to each individual loss and to the total loss;
    # do the same for the averaged version of the losses.
    for l in losses + [total_loss]:
        tf.summary.scalar(l.op.name + '(raw)', l)
        tf.summary.scalar(l.op.name, loss_averages.average(l))

    return loss_averages_op
[Note 6] loss_averages = tf.train.ExponentialMovingAverage()
This creates an ExponentialMovingAverage object.
When training a model, it pays to also maintain moving averages of the trained parameters; evaluating with the averaged values often gives better results. Here mainly the apply() method is used, so that is the method described.
- __init__(self, decay, num_updates=None, zero_debias=False, name='ExponentialMovingAverage')
- apply(self, var_list=None)
Maintains moving averages of variables. apply() adds a shadow copy of each trained variable, plus ops that keep the variable's moving average in its shadow copy; these ops are usually run after each training step.
It returns a single op that updates all of the moving averages. Note that apply() can be called multiple times, each time with a different list of variables; a small sketch follows.
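A minimal sketch of the apply()/average() pair on a single made-up variable:

import tensorflow as tf

v = tf.Variable(0.0)
ema = tf.train.ExponentialMovingAverage(decay=0.9999)
maintain_averages_op = ema.apply([v])   # creates a shadow copy of v and returns the update op
shadow_v = ema.average(v)               # tensor holding the current moving average of v
# each run of maintain_averages_op does: shadow_v = decay * shadow_v + (1 - decay) * v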
[Note 7] with tf.control_dependencies(control_inputs):
control_inputs is a list of ops or tensor objects that must be executed or computed before any op defined inside the context runs; this establishes the dependency.
In this example, once the moving-average op (loss_averages_op) has been obtained, the gradient computation is made to depend on it: the moving averages of the losses are updated first, and only then is gradient descent run with that loss as the objective.
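A minimal sketch of the dependency pattern, with a no-op standing in for the real loss_averages_op:

import tensorflow as tf

counter = tf.Variable(0, trainable=False)
loss_averages_op = tf.no_op(name='pretend_loss_averages_op')   # stand-in for the real op
with tf.control_dependencies([loss_averages_op]):
    # this op will only run after loss_averages_op has run
    increment = tf.assign_add(counter, 1)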
[Note 8] tf.train.GradientDescentOptimizer()
Gradient descent likewise goes through a class object, GradientDescentOptimizer; the methods used are listed below, with a small sketch after the list:
- __init__(self, learning_rate, use_locking=False, name='GradientDescent')
- compute_gradients(self, loss, var_list=None, gate_gradients=1, aggregation_method=None, colocate_gradients_with_ops=False, grad_loss=None)
- apply_gradients(self, grads_and_vars, global_step=None, name=None)
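A minimal sketch showing that compute_gradients() followed by apply_gradients() is the split-apart form of minimize(); the split is what lets the training code above insert gradient histograms in between (toy loss, not the CIFAR-10 one):

import tensorflow as tf

x = tf.Variable(3.0)
loss = tf.square(x)
global_step = tf.Variable(0, trainable=False)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss)        # list of (gradient, variable) pairs
for grad, var in grads_and_vars:
    if grad is not None:
        tf.summary.histogram(var.op.name + '/gradients', grad)
train_step = opt.apply_gradients(grads_and_vars, global_step=global_step)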
3. Training the model
The training script defines its own global parameters first:

from datetime import datetime
import time

import tensorflow as tf

from CIFAR10 import model_build

FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('train_dir', 'E:/Python/tensorflow/CIFAR10',
                           """Directory where to write event logs and checkpoint.""")
tf.app.flags.DEFINE_integer('max_steps', 100000, """Number of batches to run.""")
tf.app.flags.DEFINE_boolean('log_device_placement', False,
                            """Whether to log device placement.""")
tf.app.flags.DEFINE_integer('log_frequency', 10,
                            """How often to log results to the console.""")
The train function:
def train1():
    with tf.Graph().as_default():
        # Use the default graph of the process inside this context
        global_step = tf.contrib.framework.get_or_create_global_step()
        # global_step = tf.Variable(0, name='global_step', trainable=False)

        # Get the images and labels
        images, labels = model_build.distorted_inputs()

        # Build a graph that computes the logits (the forward pass)
        logits = model_build.inference(images)

        # Compute the loss
        loss = model_build.loss(logits, labels)

        # Build a graph that trains the model on one batch of examples and updates the parameters
        train_op = model_build.train(loss, global_step)

        # Define a _LoggerHook class, registered with the mon_sess session below
        class _LoggerHook(tf.train.SessionRunHook):
            """Logs loss and runtime."""

            def begin(self):
                self._step = -1
                self._start_time = time.time()

            def before_run(self, run_context):
                # Called before each call to run().
                # Returning a 'SessionRunArgs' object asks for extra ops/tensors to be added to the
                # upcoming run(); they are executed together with whatever was originally passed to run().
                # SessionRunArgs can also carry feeds.
                # run_context carries information about the upcoming run(): the original ops and tensors.
                # Once this function has run, the graph is finalized and no more ops can be added.
                self._step += 1
                return tf.train.SessionRunArgs(loss)  # ask for the loss value

            def after_run(self, run_context, run_values):
                # Called after each call to run().
                # 'run_values' contains the results of the ops/tensors requested in before_run().
                # 'run_context' is the same object that was passed to before_run().
                # 'run_context.request_stop()' can be called to stop the iteration.
                if self._step % FLAGS.log_frequency == 0:  # every FLAGS.log_frequency batches
                    current_time = time.time()
                    duration = current_time - self._start_time
                    self._start_time = current_time

                    loss_value = run_values.results
                    examples_per_sec = FLAGS.log_frequency * FLAGS.batch_size / duration
                    sec_per_batch = float(duration / FLAGS.log_frequency)

                    format_str = '%s:step %d,loss=%.2f (%.1f examples/sec; %.3fsec/batch'
                    print(format_str % (datetime.now(), self._step, loss_value,
                                        examples_per_sec, sec_per_batch))

        with tf.train.MonitoredTrainingSession(
                # Sets a proper session initializer/restorer; it also creates hooks related to
                # checkpoint and summary saving
                checkpoint_dir=FLAGS.train_dir,
                hooks=[tf.train.StopAtStepHook(last_step=FLAGS.max_steps),
                       tf.train.NanTensorHook(loss),
                       _LoggerHook()],
                config=tf.ConfigProto(
                    log_device_placement=FLAGS.log_device_placement)) as mon_sess:
            while not mon_sess.should_stop():
                # Keep running train_op (updating the model parameters) until a stop condition is reached
                mon_sess.run(train_op)


def main(argv=None):
    train1()


if __name__ == '__main__':
    tf.app.run(main=main)
with tf.Graph().as_default() is used first so that all operations are created in the default graph; wrapping everything in the with block means every op below is added to that graph. If you create a new thread and want its ops added to the same graph, you must also wrap them in "with g.as_default()".

The input function from model_build is then called, followed by the inference, loss and train functions, which finally yields train_op.

Then a _LoggerHook object is defined, inheriting from the tf.train.SessionRunHook class.

[Note 9] tf.train.SessionRunHook
My rough understanding is that this is a session hook object that waits for MonitoredSession.run() to invoke it. The class has the following methods:
- after_create_session(self,session,coord)
session: A TensorFlow Session that has been created.
coord: A Coordinator object which keeps track of all threads.
- after_run(self, run_context, run_values)
run_context: A `SessionRunContext` object.
run_values: A SessionRunValues object. In the if condition here, whenever the number of executed steps is a multiple of FLAGS.log_frequency (10), i.e. every 10 batches, the current time is recorded and the elapsed time computed. Because before_run() added loss to session.run(), run_values.results is the loss value, which is then printed.
- before_run(self, run_context)
- begin(self)
tf.train.MonitoredTrainingSession creates a MonitoredSession for training. For a chief, it sets up the proper session initializer/restorer and also creates the hooks related to checkpoint and summary saving. For a worker, it waits for the chief to initialize/restore the session. Args:
master: `String`, the TensorFlow master to use.
is_chief: If `True`, this session takes care of initializing/restoring the underlying session; if `False`, it waits for a chief to initialize or restore it.
checkpoint_dir: A string, the directory where checkpoints are saved.
scaffold: A `Scaffold` used for gathering or building supportive ops. If not specified, a default one is created. It's used to finalize the graph.
hooks: Optional list of `SessionRunHook` objects. Here it is set to [tf.train.StopAtStepHook(last_step=FLAGS.max_steps), tf.train.NanTensorHook(loss), _LoggerHook()]:
- tf.train.StopAtStepHook() is a hook that monitors the step count and requests a stop at the given step.
- tf.train.NanTensorHook() is a hook that monitors the loss and stops training if it becomes NaN.
- _LoggerHook() is the hook we defined ourselves, which fetches the loss, tracks timing, prints results and so on.
chief_only_hooks: list of `SessionRunHook` objects. Activate these hooks if `is_chief==True`, ignore otherwise.
save_checkpoint_secs: how often, in seconds, checkpoints are saved; if set to None, no checkpoints are saved.
save_summaries_steps: how often, in steps, summaries are written.
config: an instance of `tf.ConfigProto` proto used to configure the session. It's the `config` argument of constructor of `tf.Session`.
Returns:
A `MonitoredSession` object.
Finally, the output:
Filling queue with 20000 CIFAR images before starting to train.This will take a few minutes.
2017-04-16 20:04:10.826531:step 0,loss=6.39 (25.3 examples/sec; 5.056sec/batch
2017-04-16 20:04:36.614833:step 10,loss=6.22 (49.6 examples/sec; 2.579sec/batch
2017-04-16 20:05:01.745663:step 20,loss=6.10 (50.9 examples/sec; 2.513sec/batch
2017-04-16 20:05:27.068144:step 30,loss=6.01 (50.5 examples/sec; 2.532sec/batch
Because I trained on a CPU, the speed is much slower than the GPU numbers reported in the official tutorial.