Coursera Andrew Ng Deep Learning, Course 2: Improving Deep Neural Networks, Week 1 Programming Assignment Code: Initialization
2 - Zero initialization
# GRADED FUNCTION: initialize_parameters_zeros

def initialize_parameters_zeros(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """

    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###
    return parameters
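A quick sanity check (hypothetical layer sizes [3, 2, 1]; assumes numpy is imported as np and the function above is defined): every weight and bias comes out as zero, so all units in a layer compute the same value and the network cannot break symmetry.

import numpy as np

params = initialize_parameters_zeros([3, 2, 1])
print(params["W1"])        # [[0. 0. 0.]
                           #  [0. 0. 0.]]
print(params["W1"].shape)  # (2, 3)
print(params["b1"].shape)  # (2, 1)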
3 - Random initialization

# GRADED FUNCTION: initialize_parameters_random

def initialize_parameters_random(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """

    np.random.seed(3)               # This seed makes sure your "random" numbers will be the same as ours
    parameters = {}
    L = len(layers_dims)            # integer representing the number of layers

    for l in range(1, L):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10   # note the number of parentheses
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###
    return parameters
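A brief usage sketch (hypothetical layer sizes [3, 2, 1], again assuming np and the function above): because the weights are scaled by 10, their magnitudes are large, which is what the assignment uses to illustrate poor convergence with overly large initial weights.

params = initialize_parameters_random([3, 2, 1])
print(params["W1"])   # entries on the order of +/- 10 (seeded with np.random.seed(3))
print(params["b1"])   # biases are still initialized to zero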
4 - He initialization

The basic idea of Xavier initialization is to keep the variance of a layer's inputs and outputs consistent, which prevents all of the output values from collapsing toward 0. The idea behind He initialization is that in a ReLU network roughly half of the neurons in each layer are active and the other half output 0, so to keep the variance unchanged you only need to divide by an extra factor of 2 on top of Xavier.

# GRADED FUNCTION: initialize_parameters_he

def initialize_parameters_he(layers_dims):
    """
    Arguments:
    layer_dims -- python array (list) containing the size of each layer.

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    W1 -- weight matrix of shape (layers_dims[1], layers_dims[0])
                    b1 -- bias vector of shape (layers_dims[1], 1)
                    ...
                    WL -- weight matrix of shape (layers_dims[L], layers_dims[L-1])
                    bL -- bias vector of shape (layers_dims[L], 1)
    """

    np.random.seed(3)
    parameters = {}
    L = len(layers_dims) - 1        # integer representing the number of layers

    for l in range(1, L + 1):
        ### START CODE HERE ### (≈ 2 lines of code)
        parameters['W' + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layers_dims[l], 1))
        ### END CODE HERE ###
    return parameters
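A small check (hypothetical layer sizes [100, 50, 10], assuming np and the function above): the entries of each weight matrix should have a standard deviation close to sqrt(2 / fan_in), where fan_in is the size of the previous layer.

params = initialize_parameters_he([100, 50, 10])
print(params["W1"].std())     # empirical std over the 50*100 entries, roughly 0.14
print(np.sqrt(2.0 / 100))     # target scale sqrt(2 / fan_in) = 0.1414...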
Question:

If you have heard of "Xavier initialization", this is similar except Xavier initialization uses a scaling factor for the weights W[l] of sqrt(1./layers_dims[l-1]).

The Xavier initialization described in the assignment draws from a normal distribution and scales by sqrt(1 / number of nodes in the previous layer). In the paper [1], however, Xavier initialization draws from a uniform distribution, e.g. in TensorFlow:

def xavier_init(fan_in, fan_out, constant=1):
    low = -constant * np.sqrt(6.0 / (fan_in + fan_out))
    high = constant * np.sqrt(6.0 / (fan_in + fan_out))
    return tf.random_uniform((fan_in, fan_out), minval=low, maxval=high, dtype=tf.float32)

A small NumPy sketch comparing the two conventions is given after the reference.

[1] Xavier Glorot et al., Understanding the Difficulty of Training Deep Feedforward Neural Networks.
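To make the difference concrete, here is a NumPy sketch (not part of the assignment; the shape (fan_out, fan_in) follows the assignment's (current layer, previous layer) convention). The uniform variant on [-a, a] with a = sqrt(6 / (fan_in + fan_out)) has variance a**2 / 3 = 2 / (fan_in + fan_out), while the assignment's normal variant has variance 1 / fan_in; the two coincide when fan_in == fan_out.

import numpy as np

def xavier_normal(fan_in, fan_out):
    # assignment-style: zero-mean normal with variance 1 / fan_in
    return np.random.randn(fan_out, fan_in) * np.sqrt(1.0 / fan_in)

def xavier_uniform(fan_in, fan_out):
    # paper-style: uniform on [-limit, limit] with variance 2 / (fan_in + fan_out)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_out, fan_in))

print(xavier_normal(100, 100).std())   # ~0.1
print(xavier_uniform(100, 100).std())  # ~0.1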