Batch Normalization: Theory and TensorFlow Implementation
Batch Normalization Theory
Batch normalization essentially normalizes a layer's output feature maps. It was first proposed in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, to mitigate the notorious vanishing- and exploding-gradient problems in deep learning. Batch normalization is usually placed after a convolutional layer and before the activation layer.
The algorithm is as follows (reconstructed from the symbol definitions below):

μ_B = (1/m_B) Σᵢ x(i)
σ_B² = (1/m_B) Σᵢ (x(i) − μ_B)²
x̂(i) = (x(i) − μ_B) / √(σ_B² + ε)
z(i) = γ x̂(i) + β

The symbols in these steps are:
• μ_B is the empirical mean, evaluated over the whole mini-batch B.
• σ_B is the empirical standard deviation, also evaluated over the whole mini-batch.
• m_B is the number of instances in the mini-batch.
• x̂(i) is the zero-centered and normalized input.
• γ is the scaling parameter for the layer.
• β is the shifting parameter (offset) for the layer.
• ε is a tiny number to avoid division by zero (typically 10⁻³). This is called a smoothing term.
• z(i) is the output of the BN operation: it is a scaled and shifted version of the inputs.[^1]
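To make the steps above concrete, here is a minimal NumPy sketch of the BN forward pass (the function name and shapes are illustrative, not from the original post):

```python
import numpy as np

def batch_norm_forward(X, gamma, beta, eps=1e-3):
    """Forward pass of batch normalization over one mini-batch.

    X: array of shape (m_B, features); gamma, beta: per-feature parameters.
    """
    mu_B = X.mean(axis=0)                       # empirical mean over the mini-batch
    var_B = X.var(axis=0)                       # empirical variance over the mini-batch
    X_hat = (X - mu_B) / np.sqrt(var_B + eps)   # zero-centered, normalized input x̂(i)
    z = gamma * X_hat + beta                    # scaled and shifted output z(i)
    return z

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
z = batch_norm_forward(X, gamma=np.ones(2), beta=np.zeros(2))
# With gamma=1 and beta=0, each output column has (approximately)
# zero mean and unit variance.
```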
Implementing Batch Normalization in TensorFlow
TensorFlow's batch normalization function has the following signature:
tf.layers.batch_normalization(inputs, axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True, beta_initializer=tf.zeros_initializer(), gamma_initializer=tf.ones_initializer(), moving_mean_initializer=tf.zeros_initializer(), moving_variance_initializer=tf.ones_initializer(), beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, training=False, trainable=True, name=None, reuse=None, renorm=False, renorm_clipping=None, renorm_momentum=0.99, fused=None, virtual_batch_size=None, adjustment=None)
inputs: the input tensor; required.
training: whether the layer is in training mode; in practice you should always pass True or False explicitly.
beta_initializer: initializer for the offset β described above; only takes effect when center=True.
gamma_initializer: initializer for the scale γ described above; only takes effect when scale=True.
moving_mean_initializer: initializer for the moving mean mentioned above.
moving_variance_initializer: initializer for the moving variance mentioned above.
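During training, TensorFlow updates these moving statistics with an exponential moving average controlled by momentum (0.99 by default). A hedged NumPy sketch of that update rule, using the default initial values:

```python
import numpy as np

# Illustrative sketch of the exponential-moving-average update that the
# layer applies to its moving statistics at each training step.
momentum = 0.99
moving_mean = np.zeros(2)   # default: moving_mean_initializer = zeros
moving_var = np.ones(2)     # default: moving_variance_initializer = ones

batch = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
batch_mean = batch.mean(axis=0)
batch_var = batch.var(axis=0)

# EMA update: new = momentum * old + (1 - momentum) * batch statistic
moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
moving_var = momentum * moving_var + (1 - momentum) * batch_var
```

With momentum close to 1, the moving statistics change slowly, so they approximate statistics over the whole training set rather than any single batch.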
A batch normalization wrapper function
import tensorflow as tf

def batch_normalization(inputs, is_training):
    # Initialize the moving variance to 0.01 instead of the default 1.0
    moving_var = tf.constant_initializer(0.01)
    output = tf.layers.batch_normalization(inputs, moving_variance_initializer=moving_var, training=is_training)
    return output
Caveats when training with batch normalization
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
update_weight = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
These two calls commonly appear in training code; see the previous post for a detailed comparison. In short, the first line collects the operations that must be run during training, such as the updates to batch normalization's moving mean and variance; the second line collects the trainable variables, such as weights and biases. Therefore, once you use batch normalization, your training op must run under the dependency established by the first line. The second line is generally used when you want to train only certain layers, i.e. to freeze part of the network; if you do not restrict it, all variables are trained.
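Why the update ops matter: at training time BN normalizes with the current batch's statistics, while at inference it uses the moving statistics that the update ops accumulate. A NumPy sketch (function and variable names are illustrative) of the two modes:

```python
import numpy as np

def batch_norm(X, moving_mean, moving_var, is_training, eps=1e-3):
    """Normalize with batch statistics when training, moving statistics otherwise."""
    if is_training:
        mean, var = X.mean(axis=0), X.var(axis=0)   # this batch's statistics
    else:
        mean, var = moving_mean, moving_var          # accumulated statistics
    return (X - mean) / np.sqrt(var + eps)

X = np.array([[10.0], [20.0], [30.0]])
train_out = batch_norm(X, np.zeros(1), np.ones(1), is_training=True)
infer_out = batch_norm(X, np.zeros(1), np.ones(1), is_training=False)
# If the moving statistics are never updated (stale zeros/ones here), the two
# modes produce very different outputs -- exactly the bug you get by
# forgetting to run the UPDATE_OPS collection during training.
```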
The training code looks like this:

# Simply run train_op to train
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.minimize(loss)
References
[^1]: Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media, 2017.