TensorFlow series: tf.contrib.layers.batch_norm
tf.contrib.layers.batch_norm(
    inputs,
    decay=0.999,
    center=True,
    scale=False,
    epsilon=0.001,
    activation_fn=None,
    param_initializers=None,
    param_regularizers=None,
    updates_collections=tf.GraphKeys.UPDATE_OPS,
    is_training=True,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    batch_weights=None,
    fused=None,
    data_format=DATA_FORMAT_NHWC,
    zero_debias_moving_mean=False,
    scope=None,
    renorm=False,
    renorm_clipping=None,
    renorm_decay=0.99,
    adjustment=None
)
"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift"
Sergey Ioffe, Christian Szegedy
Batch Normalization speeds up the training of deep networks by reducing internal covariate shift.
It can be used as the normalization function for conv2d and fully_connected, as sketched below.
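For example, batch_norm can be plugged into these layers through their normalizer_fn argument. A minimal sketch, assuming an NHWC image placeholder (the shapes and hyperparameters here are illustrative, not from the original post):
import tensorflow as tf

is_training = tf.placeholder(tf.bool, name='is_training')
images = tf.placeholder(tf.float32, (None, 28, 28, 1))

# batch_norm used as the normalization function of conv2d;
# its options are passed through normalizer_params
net = tf.contrib.layers.conv2d(
    images, num_outputs=32, kernel_size=3,
    normalizer_fn=tf.contrib.layers.batch_norm,
    normalizer_params={'is_training': is_training, 'center': True, 'scale': True})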
Parameters:
1 inputs: the input tensor.
2 decay: decay for the moving averages. Reasonable values are close to 1.0, typically in the "many nines" range: 0.999, 0.99, 0.9. If the model performs well on the training set but poorly on the validation/test set, choose a smaller decay (0.9 is recommended). For better stability, set zero_debias_moving_mean=True.
3 center: if True, add the beta offset; if False, beta is ignored.
4 scale: if True, multiply by gamma; if False, gamma is not used. When the next layer is linear (e.g. nn.relu), this can be disabled, since the scaling can be done by the next layer.
5 epsilon: small value added to the variance to avoid dividing by zero.
6 activation_fn: activation function; the default is None, i.e. a linear activation.
7 param_initializers: optional initializers for beta, gamma, moving_mean and moving_variance.
8 param_regularizers: optional regularizers for beta and gamma.
9 updates_collections: collections to collect the computed update ops. The update ops need to be executed together with the train_op. If None, a control dependency is added to make sure the updates are computed in place.
10 is_training: whether the layer is in training mode. In training mode it accumulates the batch statistics into moving_mean and moving_variance using an exponential moving average with the given decay. When it is not in training mode, it uses the stored values of moving_mean and moving_variance (see the sketch after this list).
11 scope: optional scope for variable_scope.
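A minimal sketch of the decay / is_training behaviour described above; decay=0.9 and zero_debias_moving_mean=True are simply the options the notes above recommend when the moving averages are unstable, not values taken from the official docs:
import tensorflow as tf

x = tf.placeholder(tf.float32, (None, 64))
is_training = tf.placeholder(tf.bool, name='is_training')

# With is_training=True the layer normalizes with batch statistics and
# updates moving_mean / moving_variance using the given decay; with
# is_training=False it normalizes with the stored moving averages.
h = tf.contrib.layers.batch_norm(x,
                                 decay=0.9,
                                 center=True, scale=True,
                                 is_training=is_training,
                                 zero_debias_moving_mean=True,
                                 scope='bn')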
Note: during training, moving_mean and moving_variance need to be updated. By default the update ops are placed in tf.GraphKeys.UPDATE_OPS, so they must be added as a dependency of the train_op. For example:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
One can set updates_collections=None to force the updates in place, but this can incur a speed penalty, especially in distributed settings.
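A minimal sketch of that variant, with an illustrative placeholder input:
import tensorflow as tf

x = tf.placeholder(tf.float32, (None, 100))
is_training = tf.placeholder(tf.bool)

# updates_collections=None forces the moving_mean / moving_variance
# updates in place, so no tf.control_dependencies block is needed
# around the optimizer.
h = tf.contrib.layers.batch_norm(x, is_training=is_training,
                                 updates_collections=None,
                                 center=True, scale=True)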
Returns: the output of the operation.
API: https://tensorflow.google.cn/api_docs/python/tf/contrib/layers/batch_norm
MNIST example:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# define our typical fully-connected + batch normalization + nonlinearity set-up
def dense(x, size, scope):
    return tf.contrib.layers.fully_connected(x, size,
                                             activation_fn=None,
                                             scope=scope)

def dense_batch_relu(x, phase, scope):
    with tf.variable_scope(scope):
        h1 = tf.contrib.layers.fully_connected(x, 100,
                                               activation_fn=None,
                                               scope='dense')
        h2 = tf.contrib.layers.batch_norm(h1,
                                          center=True, scale=True,
                                          is_training=phase,
                                          scope='bn')
        return tf.nn.relu(h2, 'relu')

tf.reset_default_graph()
x = tf.placeholder('float32', (None, 784), name='x')
y = tf.placeholder('float32', (None, 10), name='y')
phase = tf.placeholder(tf.bool, name='phase')

h1 = dense_batch_relu(x, phase, 'layer1')
h2 = dense_batch_relu(h1, phase, 'layer2')
logits = dense(h2, 10, 'logits')

with tf.name_scope('accuracy'):
    accuracy = tf.reduce_mean(tf.cast(
        tf.equal(tf.argmax(y, 1), tf.argmax(logits, 1)),
        'float32'))

with tf.name_scope('loss'):
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))

def train(mnist):
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        # Ensures that we execute the update_ops before performing the train_step
        train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    history = []
    iterep = 500
    for i in range(iterep * 30):
        x_train, y_train = mnist.train.next_batch(100)
        sess.run(train_step,
                 feed_dict={'x:0': x_train,
                            'y:0': y_train,
                            'phase:0': True})
        if (i + 1) % iterep == 0:
            epoch = (i + 1) / iterep
            tr = sess.run([loss, accuracy],
                          feed_dict={'x:0': mnist.train.images,
                                     'y:0': mnist.train.labels,
                                     'phase:0': True})
            t = sess.run([loss, accuracy],
                         feed_dict={'x:0': mnist.test.images,
                                    'y:0': mnist.test.labels,
                                    'phase:0': False})
            history += [[epoch] + tr + t]
            print(history[-1])
    return history

def main(argv=None):
    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
    train(mnist)

if __name__ == '__main__':
    tf.app.run()
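After training, one can check that the update ops have actually maintained the moving statistics. A small sketch, assuming the graph and scope names ('layer1/bn', 'layer2/bn') from the example above:
# List the moving averages maintained by the UPDATE_OPS that batch_norm added
for v in tf.global_variables():
    if 'moving_mean' in v.name or 'moving_variance' in v.name:
        print(v.name, v.shape)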