Using Learning Rate Decay During Training
The performance of the stochastic gradient descent algorithm is directly tied to the learning rate, because the learning rate determines how quickly the parameters move toward the optimum. If the learning rate is too large, updates are likely to overshoot the optimum; if it is too small, optimization becomes inefficient and convergence takes an extremely long time. A good solution is learning rate decay: the learning rate is gradually reduced as training progresses.
At the start of training, a larger learning rate is used so the model makes rapid progress; as training proceeds, the learning rate is lowered gradually, which helps the optimizer settle into a good solution.
Two popular learning rate decay schedules are: (1) linear decay and (2) exponential decay.
(1) Linear learning rate decay:
The learning rate is lowered step by step as training progresses.
In Keras, this is implemented through the SGD optimizer class for stochastic gradient descent, which has a decay argument.
When decay=0 the learning rate is left untouched; when it is non-zero, the learning rate decays after every update.
The formula is:

lr = lr0 * 1 / (1 + decay * iterations)

where lr0 is the initial learning rate and iterations counts the parameter updates performed so far (one per batch), so the rate shrinks slightly after every batch.
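As a quick numeric illustration (a minimal sketch, not part of the original example; the helper name decayed_lr is illustrative), this is how the schedule plays out with the values used below, lr0 = 0.1 and decay = 0.005:

# Keras-style time-based decay: lr = lr0 / (1 + decay * t),
# where t counts parameter updates (batches), not epochs.
def decayed_lr(lr0, decay, t):
    return lr0 * 1.0 / (1.0 + decay * t)

# With batch_size=5 on 150 samples, one epoch is 30 updates,
# so 200 epochs correspond to t = 6000.
for t in (0, 30, 300, 3000, 6000):
    print(t, round(decayed_lr(0.1, 0.005, t), 5))
# 0 0.1
# 30 0.08696
# 300 0.04
# 3000 0.00625
# 6000 0.00323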
In the code below, the initial learning rate is set to 0.1, a fairly high value, and decay is set to 0.005.
""" 學習率線性衰減 """ from sklearn import datasets import numpy as np from keras.models import Sequential from keras.layers import Dense from keras.wrappers.scikit_learn import KerasClassifier from keras.optimizers import SGD #匯入資料 dataset = datasets.load_iris() x=dataset.data Y=dataset.target #隨機種子 seed=7 np.random.seed(seed) #構建模型函式 def create_model(init='glorot_uniform'): #構建模型 model = Sequential() model.add(Dense(units=4, activation='relu', input_dim=4, kernel_initializer=init)) model.add(Dense(units=6, activation='relu', kernel_initializer=init)) model.add(Dense(units=3, activation='softmax', kernel_initializer=init)) #模型優化 learningrate = 0.1 momentum = 0.9 dacay_rate = 0.005 #定義學習率衰減 sgd = SGD(lr=learningrate, momentum=momentum, decay=dacay_rate, nesterov=False) #編譯模型 model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy']) return model epochs = 200 model = KerasClassifier(build_fn=create_model, epochs=epochs, batch_size=5, verbose=1) model.fit(x, Y)
The output is:
Epoch 1/200
2018-11-05 15:05:48.177490: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-11-05 15:05:48.179412: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.
  5/150 [>.............................] - ETA: 7s - loss: 1.0809 - acc: 0.4000
150/150 [==============================] - 0s 2ms/step - loss: 0.7938 - acc: 0.5800
Epoch 2/200
  5/150 [>.............................] - ETA: 0s - loss: 0.2944 - acc: 0.8000
150/150 [==============================] - 0s 203us/step - loss: 0.4864 - acc: 0.6667
Epoch 3/200
  5/150 [>.............................] - ETA: 0s - loss: 0.2835 - acc: 0.8000
150/150 [==============================] - 0s 213us/step - loss: 0.4922 - acc: 0.6533
Epoch 4/200
  5/150 [>.............................] - ETA: 0s - loss: 0.5734 - acc: 0.6000
150/150 [==============================] - 0s 210us/step - loss: 0.4693 - acc: 0.7000
...
Epoch 199/200
  5/150 [>.............................] - ETA: 0s - loss: 0.4183 - acc: 0.6000
150/150 [==============================] - 0s 200us/step - loss: 0.4632 - acc: 0.6400
Epoch 200/200
  5/150 [>.............................] - ETA: 0s - loss: 0.5556 - acc: 0.6000
150/150 [==============================] - 0s 247us/step - loss: 0.4639 - acc: 0.6333
(2) Exponential learning rate decay:
This schedule is usually implemented by cutting the learning rate in half after a fixed number of epochs.
In Keras, it is implemented with the LearningRateScheduler callback. The callback wraps a function that takes the epoch number as an argument and returns the learning rate for the stochastic gradient descent optimizer to use.
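Before the full example, it may help to see the schedule on its own. The sketch below is a standalone version of the step_decay function used in the listing that follows (the default arguments are chosen to match it), printing the rate for a few sample epochs:

from math import pow, floor

def step_decay(epoch, init_lrate=0.1, drop=0.5, epochs_drop=10):
    # Halve the learning rate once every epochs_drop epochs.
    return init_lrate * pow(drop, floor((1 + epoch) / epochs_drop))

print([round(step_decay(e), 4) for e in (0, 8, 9, 19, 29)])
# [0.1, 0.1, 0.05, 0.025, 0.0125]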
"""
學習率指數級衰減
"""
from sklearn import datasets
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler
from math import pow, floor
#匯入資料
dataset = datasets.load_iris()
x=dataset.data
Y=dataset.target
#隨機種子
seed=7
np.random.seed(seed)
#計算學習率
def step_decay(epoch):
init_lrate = 0.1#初始學習率定為0.1(較高)
drop = 0.5#學習率降低50%
epochs_drop = 10#沒10個epochs降低一次
lrate = init_lrate * pow(drop, floor(1 + epoch) / epochs_drop)
return lrate
#構建模型函式
def create_model(init='glorot_uniform'):
#構建模型
model = Sequential()
model.add(Dense(units=4, activation='relu', input_dim=4, kernel_initializer=init))
model.add(Dense(units=6, activation='relu', kernel_initializer=init))
model.add(Dense(units=3, activation='softmax', kernel_initializer=init))
# 模型優化
learningrate = 0.1
momentum = 0.9
dacay_rate = 0.0
# 定義學習率衰減
sgd = SGD(lr=learningrate, momentum=momentum, decay=dacay_rate, nesterov=False)
# 編譯模型
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
return model
#學習率指數衰減回撥
lrate = LearningRateScheduler(step_decay)
epochs = 200
model = KerasClassifier(build_fn=create_model, epochs=epochs, batch_size=5, verbose=1, callbacks=[lrate])
model.fit(x,Y)
The output is:
...
Epoch 197/200
  5/150 [>.............................] - ETA: 0s - loss: 1.0988 - acc: 0.0000e+00
150/150 [==============================] - 0s 207us/step - loss: 1.0986 - acc: 0.3333
Epoch 198/200
  5/150 [>.............................] - ETA: 0s - loss: 1.0985 - acc: 0.4000
150/150 [==============================] - 0s 203us/step - loss: 1.0986 - acc: 0.3333
Epoch 199/200
  5/150 [>.............................] - ETA: 0s - loss: 1.0986 - acc: 0.2000
150/150 [==============================] - 0s 200us/step - loss: 1.0986 - acc: 0.3333
Epoch 200/200
  5/150 [>.............................] - ETA: 0s - loss: 1.0986 - acc: 0.4000
150/150 [==============================] - 0s 203us/step - loss: 1.0986 - acc: 0.3333