Implementing Gradient Descent with Backtracking in Python
Gradient-based Method
In the field of optimization, a gradient-based method is one that, at every iteration, uses the direction opposite to the gradient of the objective function as the descent direction. The update at each iteration can therefore be written as:
$x_{k+1} = x_k + s_k d_k, \quad d_k = -\nabla f(x_k)$
where $f$ is the objective function, $x_k$ is the current iterate, and $s_k > 0$ is the step length at iteration $k$.
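For reference, the fixed-step version of this update takes only a few lines in Python; the names grad_f and x0 and the constant step length below are placeholders chosen for this sketch:

import numpy as np

def fixed_step_gradient_descent(grad_f, x0, s=0.1, num_iters=100):
    # repeatedly step in the direction opposite to the gradient
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - s * grad_f(x)
    return x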
Steepest Descent with Exact Line Search
Steepest descent is the variant of the gradient-based method that achieves the fastest decrease per iteration. The idea is that, to determine the optimal step length at each iteration, the objective function is rewritten as a function of the step length $s$ alone and minimized, i.e. each iteration solves
$s_k = \arg\min_{s \ge 0} f(x_k + s d_k)$
This step, known in optimization as exact line search, can be used with different iterative optimization algorithms. Because each iteration requires solving this one-dimensional subproblem exactly, which is often expensive in practice, cheaper approximate line searches are commonly used instead.
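As an illustration, here is a minimal sketch of one steepest-descent step with exact line search, where the one-dimensional subproblem is solved numerically with scipy.optimize.minimize_scalar; the names f and grad_f and the search interval (0, 10) are assumptions made for this sketch:

import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search_step(f, grad_f, x_k):
    # descent direction: the negative gradient
    d_k = -grad_f(x_k)
    # solve min_{s in [0, 10]} f(x_k + s * d_k) numerically
    # (the upper bound 10 is an arbitrary assumption for this sketch)
    result = minimize_scalar(lambda s: f(x_k + s * d_k),
                             bounds=(0.0, 10.0), method='bounded')
    # step with the numerically optimal step length
    return x_k + result.x * d_k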
Backtracking Method
As a simple yet effective substitute for exact line search, the backtracking algorithm shrinks the step length $s$ at a fixed rate $\beta$ until the objective function satisfies a preset sufficient decrease condition. The algorithm is as follows:
At each iteration:
step 0: start from the preset initial step length $s = s_0$
step 1: while the sufficient decrease condition is not met:
    $s = \beta s$
step 2: use the current $s$ to compute the next iterate $x_{k+1} = x_k + s d_k$
The sufficient decrease condition is usually taken to be $f(x_k) - f(x_k + s d_k) \ge -\gamma s \nabla f(x_k)^T d_k$, where $\gamma \in (0, 1)$ is a preset parameter.
Although this method does not guarantee the optimal step length at each iteration, it does guarantee a sufficient decrease of the objective function at every iteration, and this is precisely what the sufficient decrease condition controls.
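Isolated from the full gradient-descent driver of the next section, the backtracking subroutine itself can be sketched as follows, with parameter names mirroring the implementation below:

import numpy as np

def backtracking_line_search(function, x_k, gradient_k, d_k, initial_s, gamma, beta):
    # step 0: start from the preset initial step length
    s = initial_s
    # step 1: shrink s at rate beta until the sufficient decrease
    # condition f(x_k) - f(x_k + s*d_k) >= -gamma*s*grad^T d_k is met
    while function(x_k) - function(x_k + s * d_k) < -gamma * s * np.vdot(gradient_k, d_k):
        s = beta * s
    # return the accepted step length (step 2 uses it to form the next iterate)
    return s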
Implementing Backtracking Gradient Descent in Python
The code below implements backtracking gradient descent in Python. Because rounding errors in practical computation mean the algorithm cannot be guaranteed to converge exactly to the optimal solution, the code stops either when the number of iterations reaches 1000 or when the current gradient is sufficiently small:
def backtracking(function, gradient, initial_s, gamma, beta, initial_x):
    """
    Description:
        Gradient-based method with backtracking to
        find the minimum of a multivariate function
    Parameters:
        function: objective function to be minimized
        gradient: gradient of the objective function
        initial_s: initial guess of the step length
        gamma: real number between 0 and 1 used to test
            whether the function has decreased sufficiently
        beta: real number between 0 and 1 used to shrink the
            step length when the current one does not give a
            sufficient decrease
        initial_x: initial point
    """
    # import library
    import numpy as np
    # set maximum number of iterations
    max_iter = 1000
    # set stop criterion
    stop_criteria = 1E-5
    # print initial point
    print('Initial point: {}.T'.format(initial_x.reshape(1, -1)))
    # set x to the initial point
    x_k = initial_x
    # create lists to store results
    d_k_list = []
    s_list = []
    x_k_list = []
    # iterate until the maximum number of iterations is reached
    for iter in range(max_iter):
        # calculate gradient value
        gradient_k = gradient(x_k)
        # calculate function value
        function_k = function(x_k)
        # reset s to the initial step length
        s = initial_s
        # check the stop criterion
        if (np.linalg.norm(gradient_k) / (1 + np.abs(function_k))) > stop_criteria:
            # descent direction
            d_k = -gradient_k
            # next point
            x_k_plus_one = x_k + s * d_k
            # shrink s until the sufficient decrease condition is met
            while (function_k - function(x_k + s * d_k)) < (-gamma * s * np.dot(gradient_k.T, d_k)):
                # update step length
                s = beta * s
                # re-calculate next point
                x_k_plus_one = x_k + s * d_k
            # update x
            x_k = x_k_plus_one
            # store results in lists
            d_k_list.append(d_k)
            s_list.append(s)
            x_k_list.append(x_k)
        # if the stop criterion is satisfied
        else:
            # print out the solution
            print('Solution found: {}.T'.format(x_k.reshape(1, -1)))
            # break the iteration
            break
    # print results
    # if fewer than 15 iterations were used, print every iteration
    if iter < 15:
        for i in range(iter):
            print('=====Iteration {}====='.format(i + 1))
            # print search direction
            print('Search direction: {}.T'.format(d_k_list[i].reshape(1, -1)))
            # print step length
            print('Step length: {}'.format(s_list[i]))
            # print new iterate
            print('New iterate: {}.T'.format(x_k_list[i].reshape(1, -1)))
    # if 15 or more iterations were used
    else:
        # print the first 10 iterations
        for i in range(10):
            print('=====Iteration {}====='.format(i + 1))
            # print search direction
            print('Search direction: {}.T'.format(d_k_list[i].reshape(1, -1)))
            # print step length
            print('Step length: {}'.format(s_list[i]))
            # print new iterate
            print('New iterate: {}.T'.format(x_k_list[i].reshape(1, -1)))
        print('''
        ..........
        ..........''')
        # print the last 5 iterations
        for i in range(5):
            print('=====Iteration {}====='.format(iter - 4 + i))
            # print search direction
            print('Search direction: {}.T'.format(d_k_list[iter - 5 + i].reshape(1, -1)))
            # print step length
            print('Step length: {}'.format(s_list[iter - 5 + i]))
            # print new iterate
            print('New iterate: {}.T'.format(x_k_list[iter - 5 + i].reshape(1, -1)))
    # print a warning if the maximum number of iterations was reached
    if iter == max_iter - 1:
        print('====================')
        print('Maximum iteration is reached !!!')
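As a usage sketch, the call below minimizes a simple quadratic; the objective, its gradient, and the parameter values here are illustrative assumptions, not the test problem of the next section:

import numpy as np

# illustrative objective f(x) = x1^2 + 2*x2^2 and its gradient,
# with x stored as a column vector of shape (2, 1)
def f(x):
    return x[0, 0]**2 + 2 * x[1, 0]**2

def grad_f(x):
    return np.array([[2 * x[0, 0]], [4 * x[1, 0]]])

backtracking(f, grad_f, initial_s=1, gamma=0.3, beta=0.5,
             initial_x=np.array([[1.0], [-1.0]]))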
Testing the Algorithm
The Python implementation of the backtracking algorithm is tested on a nonlinear function containing a parameter c, with c set to 1, 10, and 100 in turn to observe the convergence speed of the algorithm. For ease of comparison, the starting point is [1, -1] in every case, the initial step length is 1, the step-length shrink rate $\beta$ is 0.5, and $\gamma$ is set to 0.3.
With c = 1:
Solution found:
Iterations used: 10
With c = 10:
Solution found:
Iterations used: 19
With c = 100:
Solution found:
Iterations used: 210
For a gradient-based method, if the objective function is quadratic, $f(x) = \tfrac{1}{2} x^T Q x + b^T x$, the convergence speed is governed by the ratio of the largest eigenvalue of $Q$ to its smallest: the larger the ratio, the slower the convergence. If the objective function is of higher order, the speed is governed by the ratio of the largest to the smallest eigenvalue of the Hessian matrix of the objective function at the optimal solution (a stationary point).
The test function here is of higher order. At c = 1, c = 10, and c = 100, the ratio of the largest to the smallest eigenvalue of its Hessian matrix at the optimal solution is 2.4, 6.1, and 37.4, respectively. These results confirm that the Python implementation above behaves as the convergence theory of gradient-based methods predicts.
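As a quick way to check this kind of claim numerically, the eigenvalue ratio of a Hessian can be computed with numpy; the matrix below is a placeholder, not the Hessian of the test function above:

import numpy as np

# placeholder symmetric Hessian (not the test function's Hessian)
H = np.array([[2.0, 0.4],
              [0.4, 4.0]])

# eigenvalues of a symmetric matrix, returned in ascending order
eigenvalues = np.linalg.eigvalsh(H)
print('Largest/smallest eigenvalue ratio: {:.2f}'.format(eigenvalues[-1] / eigenvalues[0]))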