The Effect of Parameter Initialization on the Training Accuracy of Deep Neural Networks
This article is based on the exercise from Lesson 1 of Week 2 of Andrew Ng's Deep Learning course; its purpose is to explore how parameter initialization affects model accuracy.
The helper code used in this article can be found here.
1. Data Processing
The third-party libraries used in this article are listed below; init_utils is the helper module that contains the functions for building the neural network.
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
from init_utils import *
Through the load_dataset function in init_utils, we can take a direct look at the data used in this article:
plt.rcParams['figure.figsize'] = (7.0, 4.0)
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
train_X, train_Y, test_X, test_Y = load_dataset()
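(In the course version of init_utils, load_dataset itself draws a scatter plot of the points. If your copy does not, here is a minimal sketch, assuming train_X has shape (2, m) and train_Y has shape (1, m) as in the course code:)
plt.scatter(train_X[0, :], train_X[1, :], c=train_Y.ravel(), s=40, cmap=plt.cm.Spectral)  # colour each point by its class label
plt.show()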
2. Model Construction
The neural-network model built in this article is identical to the one in the previous post, so its construction is not repeated here.
def model(X, Y, learning_rate=0.01, num_iterations=15000, print_cost=True, initialization="he"):
    grads = {}
    costs = []
    m = X.shape[1]
    layers_dims = [X.shape[0], 10, 5, 1]

    # Choose the initialization scheme under test
    if initialization == "zeros":
        parameters = initialize_parameters_zeros(layers_dims)
    elif initialization == "random":
        parameters = initialize_parameters_random(layers_dims)
    elif initialization == "he":
        parameters = initialize_parameters_he(layers_dims)

    # Standard gradient-descent loop using the helpers from init_utils
    for i in range(0, num_iterations):
        a3, cache = forward_propagation(X, parameters)
        cost = compute_loss(a3, Y)
        grads = backward_propagation(X, Y, cache)
        parameters = update_parameters(parameters, grads, learning_rate)
        if print_cost and i % 1000 == 0:  # log every 1000 iterations, matching the outputs below
            print("cost after iterations {}:{}".format(i, cost))
            costs.append(cost)

    plt.plot(costs)
    plt.xlabel("iterations (per thousands)")
    plt.ylabel("cost")
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    return parameters
3. Zero Initialization
def initialize_parameters_zeros(layers_dims):
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        parameters["W" + str(l)] = np.zeros((layers_dims[l], layers_dims[l-1]))
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
If we use zero matrices (np.zeros()) to initialize W, then every W[l] obtained throughout the subsequent computations stays at 0: all units in a layer compute the same value, so their gradients are identical and the symmetry is never broken. Let's test what kind of training result this yields.
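Before running the full model, here is a small standalone check of the symmetry argument (independent of init_utils; X_demo is hypothetical demo data): with zero weights, every hidden unit receives the same zero pre-activation, so no unit can ever become different from its neighbours.
W1 = np.zeros((10, 2))                 # zero-initialized weights for a 2->10 layer
b1 = np.zeros((10, 1))
X_demo = np.random.randn(2, 5)         # five random 2-D samples (hypothetical demo data)
Z1 = np.dot(W1, X_demo) + b1           # every entry of Z1 is 0, regardless of the input
A1 = np.maximum(0, Z1)                 # ReLU keeps them at 0
print(np.all(A1 == 0))                 # True: all hidden units are indistinguishable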
parameters = model(train_X, train_Y, initialization = "zeros")
print("on the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("on the test set:")
predictions_test = predict(test_X, test_Y, parameters)
During the run we notice that the cost is computed very quickly, and that its value never changes; both are consequences of the zero initialization.
cost after iterations 0:0.6931471805599453
cost after iterations 1000:0.6931471805599453
cost after iterations 2000:0.6931471805599453
cost after iterations 3000:0.6931471805599453
cost after iterations 4000:0.6931471805599453
cost after iterations 5000:0.6931471805599453
cost after iterations 6000:0.6931471805599453
cost after iterations 7000:0.6931471805599453
cost after iterations 8000:0.6931471805599453
cost after iterations 9000:0.6931471805599453
cost after iterations 10000:0.6931471805599455
cost after iterations 11000:0.6931471805599453
cost after iterations 12000:0.6931471805599453
cost after iterations 13000:0.6931471805599453
cost after iterations 14000:0.6931471805599453
on the train set:
Accuracy: 0.5
on the test set:
Accuracy: 0.5
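Incidentally, the constant cost is exactly ln 2 ≈ 0.6931: with all-zero weights the final sigmoid outputs a3 = 0.5 for every sample, so the cross-entropy loss is -[y*log(0.5) + (1-y)*log(0.5)] = log 2, independent of the labels.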
Using this model to classify the samples gives the result shown below: the entire dataset is predicted as 0, i.e. lumped into a single class.
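The decision boundary can be drawn with the same helper calls used later in this article (plot_decision_boundary and predict_dec from init_utils):
plt.title("Model with zeros initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)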
From this we can conclude: (1) b[l] may safely be initialized to 0; (2) W[l] must be initialized randomly.
4. Random Initialization
def initialize_parameters_random(layers_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        parameters["W" + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * 10
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
Here we initialize with the np.random.randn() function, and in this test we give W[l] a deliberately large scaling factor, e.g. 10. Let's look at the result.
parameters = model(train_X, train_Y, initialization = "random")
cost after iterations 0:inf
cost after iterations 1000:0.6239567039908781
cost after iterations 2000:0.5978043872838292
cost after iterations 3000:0.563595830364618
cost after iterations 4000:0.5500816882570866
cost after iterations 5000:0.5443417928662615
cost after iterations 6000:0.5373553777823036
cost after iterations 7000:0.4700141958024487
cost after iterations 8000:0.3976617665785177
cost after iterations 9000:0.39344405717719166
cost after iterations 10000:0.39201765232720626
cost after iterations 11000:0.38910685278803786
cost after iterations 12000:0.38612995897697244
cost after iterations 13000:0.3849735792031832
cost after iterations 14000:0.38275100578285265
on the train set:
Accuracy: 0.83
on the test set:
Accuracy: 0.86
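A note on the inf at iteration 0: with weights scaled by 10, the pre-activation feeding the final sigmoid easily reaches magnitudes of several tens, at which point float64 saturates the sigmoid to exactly 1.0 and the cross-entropy evaluates log(0). A minimal illustration:
z = 40.                        # a large pre-activation, typical with *10 weights
a = 1 / (1 + np.exp(-z))       # float64 sigmoid saturates to exactly 1.0
print(a == 1.0)                # True
print(-np.log(1 - a))          # inf (numpy emits a divide-by-zero warning here)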
The training accuracy is clearly much higher; the classification result is as follows:
plt.title("Model with random initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x:predict_dec(parameters, x.T), train_X, train_Y)
From these results we can see that random initialization with large weights works better than zero initialization, but large regions are still misclassified. Next, let's experiment with small weights instead.
parameters["W" + str(l)] = np.random.randn(layers_dims[l],layers_dims[l-1]) * 0.01
cost after iterations 0:0.6931473549048267
cost after iterations 1000:0.6931473504885134
cost after iterations 2000:0.6931473468317759
cost after iterations 3000:0.6931473432446151
cost after iterations 4000:0.6931473396813665
cost after iterations 5000:0.6931473361903396
cost after iterations 6000:0.6931473327310922
cost after iterations 7000:0.6931473293491259
cost after iterations 8000:0.6931473260187053
cost after iterations 9000:0.6931473227372426
cost after iterations 10000:0.6931473195009528
cost after iterations 11000:0.6931473163278133
cost after iterations 12000:0.693147312959552
cost after iterations 13000:0.6931473097541097
cost after iterations 14000:0.6931473065831708
on the train set:
Accuracy: 0.4633333333333333
on the test set:
Accuracy: 0.48
It is easy to see that the result is even worse: with weights scaled by 0.01, the activations and gradients shrink layer by layer, so after 15,000 iterations the cost has barely moved from ln 2. So exactly what scale of initial weights yields a good training result?
5. He Initialization
The idea of this method is to combat vanishing and exploding gradients by scaling W at initialization with the factor np.sqrt(2./layers_dims[l-1]).
Note: for the ReLU activation, use the factor np.sqrt(2./layers_dims[l-1]); for the tanh activation, use np.sqrt(1./layers_dims[l-1]) (Xavier initialization) instead.
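As a quick standalone sanity check (not part of the assignment code; the layer sizes here are arbitrary), we can verify empirically that He scaling keeps the magnitude of ReLU activations stable through a layer, while a large fixed scale such as *10 blows it up:
np.random.seed(0)
n_prev, n, m = 500, 500, 1000
A_prev = np.random.randn(n_prev, m)                        # incoming activations, mean square ~1
W_he = np.random.randn(n, n_prev) * np.sqrt(2. / n_prev)
A_he = np.maximum(0, np.dot(W_he, A_prev))                 # ReLU layer with He scaling
print((A_he ** 2).mean())                                  # ~1: the signal scale is preserved
W_big = np.random.randn(n, n_prev) * 10
A_big = np.maximum(0, np.dot(W_big, A_prev))               # the same layer with *10 weights
print((A_big ** 2).mean())                                 # enormous: the signal explodes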
def initialize_parameters_he(layers_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims)
    for l in range(1, L):
        # Scale each weight matrix by sqrt(2 / n_prev), the He factor for ReLU
        parameters["W" + str(l)] = np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])
        parameters["b" + str(l)] = np.zeros((layers_dims[l], 1))
    return parameters
Let's look at the result.
parameters = model(train_X, train_Y, initialization = "he")
print("on the train set:")
predictions_train = predict(train_X, train_Y, parameters)
print("on the test set:")
predictions_test = predict(test_X, test_Y, parameters)
cost after iterations 0:0.8830537463419761
cost after iterations 1000:0.6879825919728063
cost after iterations 2000:0.6751286264523371
cost after iterations 3000:0.6526117768893807
cost after iterations 4000:0.6082958970572938
cost after iterations 5000:0.5304944491717495
cost after iterations 6000:0.4138645817071794
cost after iterations 7000:0.3117803464844441
cost after iterations 8000:0.23696215330322562
cost after iterations 9000:0.18597287209206836
cost after iterations 10000:0.15015556280371817
cost after iterations 11000:0.12325079292273552
cost after iterations 12000:0.09917746546525932
cost after iterations 13000:0.08457055954024274
cost after iterations 14000:0.07357895962677362
on the train set:
Accuracy: 0.9933333333333333
on the test set:
Accuracy: 0.96
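The decision boundary can be visualized the same way as before:
plt.title("Model with He initialization")
axes = plt.gca()
axes.set_xlim([-1.5, 1.5])
axes.set_ylim([-1.5, 1.5])
plot_decision_boundary(lambda x: predict_dec(parameters, x.T), train_X, train_Y)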
From the training results we can see that the cost decreases quickly and the prediction accuracy is very high.
6. Summary
1. Different initialization schemes lead to completely different training and test results (compare the table after this list).
2. Initializing with weights that are too large or too small both gives poor results.
3. Choose the He scaling factor appropriate to the activation function used.
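For reference, the four runs side by side (numbers taken from the outputs above):
initialization    train accuracy    test accuracy
zeros             0.50              0.50
random (*10)      0.83              0.86
random (*0.01)    0.46              0.48
he                0.99              0.96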