
Using Learning Rate Decay During Training

The performance of the stochastic gradient descent algorithm is directly tied to the learning rate, because the learning rate determines how quickly the parameters move toward the optimum. If the rate is too large, the parameters can overshoot the optimum; if it is too small, optimization becomes inefficient and convergence takes an extremely long time. A good solution is learning rate decay: gradually reducing the learning rate as training progresses.

At the start of training a larger learning rate is used so the model makes rapid progress; as training continues, the rate is gradually lowered, which helps the optimizer settle near the optimum.

The two most popular learning rate decay methods today are: (1) linear (time-based) decay, and (2) exponential (drop-based) decay.

(1) Linear (time-based) learning rate decay:

The learning rate is lowered step by step as the epochs progress.

In Keras this is built into the SGD stochastic gradient descent optimizer class, which takes a decay argument.

When decay=0 the learning rate is unaffected; with a non-zero value the rate decays over time. (Note that Keras actually applies this decay once per batch update, via the optimizer's iterations counter, rather than once per epoch; the formula below uses the epoch number in line with the common simplified presentation.)

The formula is:

LearningRate = LearningRate \times \frac{1}{1 + decay \times epoch}
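To make the schedule concrete, here is a quick sketch (not part of the original script) that tabulates the decayed rate for the lr = 0.1 and decay = 0.005 values used below, applying the formula per epoch:

#Tabulate the time-based decay schedule
initial_lr = 0.1
decay = 0.005
for epoch in [0, 1, 10, 50, 100, 200]:
    lr = initial_lr / (1.0 + decay * epoch)
    print('epoch %3d -> lr %.5f' % (epoch, lr))
#epoch 200 -> lr 0.05000, i.e. the rate has halved by the end of training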

In the code below the initial learning rate is set to 0.1, a fairly high value, and decay is set to 0.005.

"""
Linear (time-based) learning rate decay
"""
from sklearn import datasets
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD

#Load the dataset
dataset = datasets.load_iris()
x = dataset.data
Y = dataset.target
#Fix the random seed for reproducibility
seed = 7
np.random.seed(seed)

#Function to build the model
def create_model(init='glorot_uniform'):
    #Define the network
    model = Sequential()
    model.add(Dense(units=4, activation='relu', input_dim=4, kernel_initializer=init))
    model.add(Dense(units=6, activation='relu', kernel_initializer=init))
    model.add(Dense(units=3, activation='softmax', kernel_initializer=init))

    #Optimizer settings
    learningrate = 0.1
    momentum = 0.9
    decay_rate = 0.005
    #Configure SGD with time-based learning rate decay
    sgd = SGD(lr=learningrate, momentum=momentum, decay=decay_rate, nesterov=False)
    #Compile the model
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

epochs = 200
model = KerasClassifier(build_fn=create_model, epochs=epochs, batch_size=5, verbose=1)
model.fit(x, Y)

The output is:

 

Epoch 1/200
2018-11-05 15:05:48.177490: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2018-11-05 15:05:48.179412: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 8. Tune using inter_op_parallelism_threads for best performance.

  5/150 [>.............................] - ETA: 7s - loss: 1.0809 - acc: 0.4000
150/150 [==============================] - 0s 2ms/step - loss: 0.7938 - acc: 0.5800
Epoch 2/200

  5/150 [>.............................] - ETA: 0s - loss: 0.2944 - acc: 0.8000
150/150 [==============================] - 0s 203us/step - loss: 0.4864 - acc: 0.6667
Epoch 3/200

  5/150 [>.............................] - ETA: 0s - loss: 0.2835 - acc: 0.8000
150/150 [==============================] - 0s 213us/step - loss: 0.4922 - acc: 0.6533
Epoch 4/200

  5/150 [>.............................] - ETA: 0s - loss: 0.5734 - acc: 0.6000
150/150 [==============================] - 0s 210us/step - loss: 0.4693 - acc: 0.7000

...
Epoch 199/200

  5/150 [>.............................] - ETA: 0s - loss: 0.4183 - acc: 0.6000
150/150 [==============================] - 0s 200us/step - loss: 0.4632 - acc: 0.6400
Epoch 200/200

  5/150 [>.............................] - ETA: 0s - loss: 0.5556 - acc: 0.6000
150/150 [==============================] - 0s 247us/step - loss: 0.4639 - acc: 0.6333
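Because Keras computes the decayed rate on the fly from its iterations counter, optimizer.lr itself never changes during training. Below is a minimal sketch of a callback that reconstructs the effective rate at the end of each epoch, assuming the Keras 2.x backend API (the class name EffectiveLR is my own):

from keras import backend as K
from keras.callbacks import Callback

class EffectiveLR(Callback):
    #Recompute lr * 1/(1 + decay * iterations) from the optimizer's state
    def on_epoch_end(self, epoch, logs=None):
        lr = K.get_value(self.model.optimizer.lr)
        decay = K.get_value(self.model.optimizer.decay)
        iterations = K.get_value(self.model.optimizer.iterations)
        print('epoch %d: effective lr = %.6f' % (epoch, lr / (1.0 + decay * iterations)))

Passing callbacks=[EffectiveLR()] to fit would print the effective rate alongside the log above.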

(2) Exponential (drop-based) learning rate decay:

This method is typically implemented by dropping the learning rate by 50% every fixed number of epochs.

In Keras this is done with the LearningRateScheduler callback. The scheduling function takes the epoch number as its argument and returns the learning rate for the stochastic gradient descent algorithm to use.
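With the init_lrate = 0.1, drop = 0.5 and epochs_drop = 10 settings used in the code below, the schedule works out to:

LearningRate = InitialLearningRate \times Drop^{\lfloor (1 + epoch) / EpochsDrop \rfloor}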

"""
Exponential (drop-based) learning rate decay
"""
from sklearn import datasets
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.optimizers import SGD
from keras.callbacks import LearningRateScheduler
from math import pow, floor

#Load the dataset
dataset = datasets.load_iris()
x = dataset.data
Y = dataset.target
#Fix the random seed for reproducibility
seed = 7
np.random.seed(seed)

#Compute the learning rate for the current epoch
def step_decay(epoch):
    init_lrate = 0.1    #initial learning rate, deliberately high
    drop = 0.5          #drop the rate by 50% at each step
    epochs_drop = 10    #drop once every 10 epochs
    lrate = init_lrate * pow(drop, floor((1 + epoch) / epochs_drop))
    return lrate
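#With these settings the schedule steps down as follows:
#  epochs 0-8   -> 0.1
#  epochs 9-18  -> 0.05
#  epochs 19-28 -> 0.025, and so on, halving every 10 epochs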

#Function to build the model
def create_model(init='glorot_uniform'):
    #Define the network
    model = Sequential()
    model.add(Dense(units=4, activation='relu', input_dim=4, kernel_initializer=init))
    model.add(Dense(units=6, activation='relu', kernel_initializer=init))
    model.add(Dense(units=3, activation='softmax', kernel_initializer=init))
    # Optimizer settings; decay stays at 0 because the callback drives the schedule
    learningrate = 0.1
    momentum = 0.9
    decay_rate = 0.0
    # Plain SGD with momentum, no built-in decay
    sgd = SGD(lr=learningrate, momentum=momentum, decay=decay_rate, nesterov=False)
    # Compile the model
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    return model

#Callback implementing the drop-based learning rate schedule
lrate = LearningRateScheduler(step_decay)

epochs = 200
model = KerasClassifier(build_fn=create_model, epochs=epochs, batch_size=5, verbose=1, callbacks=[lrate])
model.fit(x,Y)

...

Epoch 197/200

  5/150 [>.............................] - ETA: 0s - loss: 1.0988 - acc: 0.0000e+00
150/150 [==============================] - 0s 207us/step - loss: 1.0986 - acc: 0.3333
Epoch 198/200

  5/150 [>.............................] - ETA: 0s - loss: 1.0985 - acc: 0.4000
150/150 [==============================] - 0s 203us/step - loss: 1.0986 - acc: 0.3333
Epoch 199/200

  5/150 [>.............................] - ETA: 0s - loss: 1.0986 - acc: 0.2000
150/150 [==============================] - 0s 200us/step - loss: 1.0986 - acc: 0.3333
Epoch 200/200

  5/150 [>.............................] - ETA: 0s - loss: 1.0986 - acc: 0.4000
150/150 [==============================] - 0s 203us/step - loss: 1.0986 - acc: 0.3333
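As a quick sanity check of the schedule itself, independent of training, the scheduling function can be evaluated directly; a minimal sketch reusing the step_decay function defined above:

#Print the learning rate around the drop points
for epoch in [0, 8, 9, 18, 19, 199]:
    print('epoch %3d -> lr %.8f' % (epoch, step_decay(epoch)))

By epoch 199 the rate has fallen to roughly 1e-7, so the weight updates become vanishingly small, which is consistent with the loss sitting flat at 1.0986 at the end of the log above.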