A Python-based alternative to fminunc
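Recently I had some free time and wanted to implement the exercises from Stanford's Machine Learning course on Coursera in Python, partly to strengthen my ML fundamentals and partly to get familiar with libraries such as numpy, matplotlib and scipy.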
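In EX2, theta is optimized with MATLAB's fminunc function, and I didn't know how to do the same in Python. After some searching, I found someone on Stack Overflow suggesting scipy's minimize function as a replacement. When I tried calling it directly with my costFunction and grad, the program threw errors saying that the dimensions (3,) and (100,1) did not match, that the gradient vector was wrong, and so on. After many attempts, I finally found the problem.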
First, let's look at the function's documentation with np.info(minimize). The parameters include:
```
fun : callable
    The objective function to be minimized.

        ``fun(x, *args) -> float``

    where x is an 1-D array with shape (n,) and `args`
    is a tuple of the fixed parameters needed to completely
    specify the function.
x0 : ndarray, shape (n,)
    Initial guess. Array of real elements of size (n,),
    where 'n' is the number of independent variables.
args : tuple, optional
    Extra arguments passed to the objective function and its
    derivatives (`fun`, `jac` and `hess` functions).
method : str or callable, optional
    Type of solver. Should be one of

        - 'Nelder-Mead'  :ref:`(see here) <optimize.minimize-neldermead>`
        - 'Powell'       :ref:`(see here) <optimize.minimize-powell>`
        - 'CG'           :ref:`(see here) <optimize.minimize-cg>`
        - 'BFGS'         :ref:`(see here) <optimize.minimize-bfgs>`
        - 'Newton-CG'    :ref:`(see here) <optimize.minimize-newtoncg>`
        - 'L-BFGS-B'     :ref:`(see here) <optimize.minimize-lbfgsb>`
        - 'TNC'          :ref:`(see here) <optimize.minimize-tnc>`
        - 'COBYLA'       :ref:`(see here) <optimize.minimize-cobyla>`
        - 'SLSQP'        :ref:`(see here) <optimize.minimize-slsqp>`
        - 'trust-constr' :ref:`(see here) <optimize.minimize-trustconstr>`
        - 'dogleg'       :ref:`(see here) <optimize.minimize-dogleg>`
        - 'trust-ncg'    :ref:`(see here) <optimize.minimize-trustncg>`
        - 'trust-exact'  :ref:`(see here) <optimize.minimize-trustexact>`
        - 'trust-krylov' :ref:`(see here) <optimize.minimize-trustkrylov>`
        - custom - a callable object (added in version 0.14.0),
          see below for description.

    If not given, chosen to be one of ``BFGS``, ``L-BFGS-B``, ``SLSQP``,
    depending if the problem has constraints or bounds.
jac : {callable, '2-point', '3-point', 'cs', bool}, optional
    Method for computing the gradient vector. Only for CG, BFGS,
    Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg, trust-krylov,
    trust-exact and trust-constr.
    If it is a callable, it should be a function that returns the
    gradient vector:

        ``jac(x, *args) -> array_like, shape (n,)``

    where x is an array with shape (n,) and `args` is a tuple with
    the fixed parameters. Alternatively, the keywords
    {'2-point', '3-point', 'cs'} select a finite difference scheme
    for numerical estimation of the gradient. Options '3-point' and
    'cs' are available only to 'trust-constr'.
    If `jac` is a Boolean and is True, `fun` is assumed to return the
    gradient along with the objective function. If False, the gradient
    will be estimated using '2-point' finite difference estimation.
```
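To make the fun/jac contract above concrete, here is a minimal sketch on a toy quadratic that is not from the original exercise: `fun` receives a 1-D `x` of shape (n,) followed by the extra `args`, returns a float, and the callable `jac` returns an array of the same shape (n,). The second call uses the `jac=True` form, where `fun` returns the value and the gradient together. The names `quad`, `quad_grad`, `quad_with_grad` and the target vector `c` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def quad(x, c):
    # Objective f(x) = sum((x - c)^2); x arrives as a 1-D array of shape (n,)
    return float(np.sum((x - c) ** 2))


def quad_grad(x, c):
    # Gradient 2 * (x - c), returned with the same shape (n,) as x
    return 2.0 * (x - c)


def quad_with_grad(x, c):
    # With jac=True, fun must return a (value, gradient) pair
    return quad(x, c), quad_grad(x, c)


c = np.array([1.0, -2.0, 3.0])
x0 = np.zeros(3)  # shape (3,), not (3, 1)

res_a = minimize(quad, x0, args=(c,), method="BFGS", jac=quad_grad)
res_b = minimize(quad_with_grad, x0, args=(c,), method="BFGS", jac=True)
print(res_a.x, res_b.x)  # both should end up close to c
```

Note that `args` must be a tuple, hence `(c,)` with the trailing comma.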
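Note that in the function passed as fun, the parameter being optimized, theta, must come first, with X and y after it. Also, theta must be passed in as a 1-D array of shape (n,); otherwise an error occurs.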
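If an existing cost function takes the data first and theta last, or expects theta as an (n, 1) column, a thin adapter can satisfy minimize's calling convention without rewriting it. A sketch, assuming a hypothetical legacy function `cost_data_first(X, y, theta_col)` written for column-vector theta and a tiny made-up dataset:

```python
import numpy as np
from scipy.optimize import minimize


def cost_data_first(X, y, theta_col):
    # Hypothetical legacy cost: data first, theta as an (n, 1) column vector
    h = 1.0 / (1.0 + np.exp(-(X @ theta_col)))  # shape (m, 1)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))


def adapted_cost(theta, X, y):
    # minimize passes theta first, as shape (n,); reshape before delegating
    return cost_data_first(X, y, theta.reshape(-1, 1))


# Toy data (hypothetical): bias column plus one feature, labels as an (m, 1) column
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[0.0], [1.0], [0.0], [1.0]])

theta_col = np.zeros((2, 1))  # column-vector initial guess, as legacy code might build it
x0 = theta_col.ravel()        # minimize wants shape (2,)
res = minimize(adapted_cost, x0, args=(X, y), method="BFGS")
print(res.x.shape)            # (2,)
```

No `jac` is given here, so BFGS falls back to finite differences; the point is only the argument order and the flattening of `x0`.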
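Next, jac is the gradient. There are two things to watch out for here: first, the theta passed in must again be a 1-D array of shape (n,); second, the returned gradient must also be a 1-D array of shape (n,).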
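The shapes matter so much because of NumPy broadcasting. With Y stored as an (m, 1) column, a theta of shape (n,) makes X.dot(theta) shape (m,), and subtracting the (m, 1) Y silently broadcasts to an (m, m) matrix instead of raising an error. The sketch below uses random data, with sizes chosen to mirror the (3,) and (100, 1) shapes from the error message. It shows the pitfall and two common fixes: reshape theta into a column and flatten the result, or flatten Y once up front.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 3
X = rng.normal(size=(m, n))
Y = (rng.random((m, 1)) > 0.5).astype(float)  # (m, 1) column labels
theta = np.zeros(n)                            # what minimize passes: shape (n,)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Pitfall: (m,) - (m, 1) broadcasts to (m, m) with no error, just wrong numbers
residual_bad = sigmoid(X.dot(theta)) - Y
print(residual_bad.shape)  # (100, 100)

# Fix 1: make theta a column, then flatten the gradient back to (n,)
grad_fix1 = (X.T @ (sigmoid(X @ theta.reshape(-1, 1)) - Y) / m).flatten()

# Fix 2: flatten Y once, so every vector involved is 1-D
y = Y.ravel()
grad_fix2 = X.T @ (sigmoid(X @ theta) - y) / m

print(grad_fix1.shape, np.allclose(grad_fix1, grad_fix2))  # (3,) True
```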
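In short, the key is that theta must be a 1-D array of shape (n,); otherwise it won't work. Earlier, for convenience, I had reshaped theta into an (n,1) column vector, which made minimize throw errors. So learning to read the documentation with help() really matters.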
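Besides reading the documentation, scipy.optimize.check_grad is a cheap way to catch shape or math errors in a gradient before handing it to minimize. It compares your `grad` with a finite-difference estimate of `fun`'s gradient and returns the 2-norm of the difference, so a correct gradient gives a value close to zero. A sketch on small synthetic data; the data and the function names `cost` and `grad` are assumptions, not the post's code:

```python
import numpy as np
from scipy.optimize import check_grad

rng = np.random.default_rng(1)
m = 50
X = np.column_stack((np.ones(m), rng.normal(size=(m, 2))))  # bias column + 2 features
y = (rng.random(m) > 0.5).astype(float)                     # 1-D labels, shape (m,)


def cost(theta, X, y):
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))


def grad(theta, X, y):
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (h - y) / len(y)  # shape (n,), same as theta


theta0 = np.array([0.1, -0.2, 0.3])
assert grad(theta0, X, y).shape == theta0.shape  # gradient has theta's 1-D shape
err = check_grad(cost, grad, theta0, X, y)       # extra args follow x0 positionally
print(err)  # expected to be tiny for a correct gradient
```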
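```python
import numpy as np
import pandas as pd
import scipy.optimize as op


def LoadData(filename):
    # Read the comma-separated data file (no header row) into a NumPy array
    data = pd.read_csv(filename, header=None)
    data = np.array(data)
    return data


def ReshapeData(data):
    # Columns 0-1 are the features; column 2 is the label, kept as an (m, 1) column
    m = np.size(data, 0)
    X = data[:, 0:2]
    Y = data[:, 2]
    Y = Y.reshape((m, 1))
    return X, Y


def InitData(X):
    # Prepend a column of ones (bias term); theta is 1-D with shape (n + 1,)
    m, n = X.shape
    initial_theta = np.zeros(n + 1)
    VecOnes = np.ones((m, 1))
    X = np.column_stack((VecOnes, X))
    return X, initial_theta


def sigmoid(x):
    z = 1 / (1 + np.exp(-x))
    return z


def costFunction(theta, X, Y):
    # theta arrives from minimize with shape (n,), so X.dot(theta) has shape (m,)
    m = X.shape[0]
    J = (-np.dot(Y.T, np.log(sigmoid(X.dot(theta)))) -
         np.dot((1 - Y).T, np.log(1 - sigmoid(X.dot(theta))))) / m
    return J  # shape (1,)


def gradient(theta, X, Y):
    # minimize calls jac with the same extra args as fun, so X must be a parameter
    m, n = X.shape
    theta = theta.reshape((n, 1))  # column vector, so the subtraction stays (m, 1)
    grad = np.dot(X.T, sigmoid(X.dot(theta)) - Y) / m
    return grad.flatten()          # jac must return shape (n,)


if __name__ == '__main__':
    data = LoadData('ex2data1csv.csv')
    X, Y = ReshapeData(data)
    X, initial_theta = InitData(X)
    result = op.minimize(fun=costFunction, x0=initial_theta, args=(X, Y),
                         method='TNC', jac=gradient)
    print(result)
```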
The final result is shown below. It agrees with the result of optimizing with fminunc in MATLAB (fminunc: cost 0.203, theta: -25.161, 0.206, 0.201).
```
     fun: array([0.2034977])
     jac: array([8.95038682e-09, 8.16149951e-08, 4.74505693e-07])
 message: 'Local minimum reached (|pg| ~= 0)'
    nfev: 36
     nit: 17
  status: 0
 success: True
       x: array([-25.16131858,   0.20623159,   0.20147149])
```
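As a quick sanity check of the fitted parameters, the values in x can be plugged back into the hypothesis h(x) = sigmoid(theta^T x). A small sketch using the theta printed above, for an illustrative input with feature values 45 and 85 (in ex2 the two features are exam scores); the first entry of theta multiplies the bias term 1:

```python
import numpy as np

theta = np.array([-25.16131858, 0.20623159, 0.20147149])  # x from the result above
x = np.array([1.0, 45.0, 85.0])                             # bias term + two features

z = theta @ x                    # about 1.244
prob = 1.0 / (1.0 + np.exp(-z))  # sigmoid(z)
print(round(prob, 3))            # 0.776
```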
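Also, since I knew the cost should end up around 0.203, I tried the most naive approach, plain gradient descent. Because progress became extremely slow toward the end, I used while J > 0.21 as the loop condition, and it ran for about 130,000 iterations. This shows how much a well-built optimization routine matters. One more thing: I used to think that with an unsuitable learning rate J would simply keep diverging, but yesterday's experiment showed that with some learning rates J diverges at first and still converges later.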
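For comparison, here is a minimal sketch of that experiment: first run TNC to learn roughly where the minimum cost is, then run plain batch gradient descent with a "while J > threshold" stop, plus an iteration cap so it cannot loop forever. It uses hypothetical synthetic data with standardized features rather than the ex2 data, and the learning rate, threshold margin and cap are assumptions. The cost is written with np.logaddexp, using log(sigmoid(z)) = -log(1 + e^-z) and log(1 - sigmoid(z)) = -log(1 + e^z); this is my own choice to avoid log(0) warnings, not the post's formulation.

```python
import numpy as np
import scipy.optimize as op


def cost(theta, X, y):
    # Mean of y*log(1 + e^-z) + (1 - y)*log(1 + e^z): the usual logistic cost
    z = X @ theta
    return float(np.mean(y * np.logaddexp(0, -z) + (1 - y) * np.logaddexp(0, z)))


def grad(theta, X, y):
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (h - y) / len(y)  # shape (n,)


def gradient_descent(X, y, alpha, target, max_iter=20_000):
    # Plain batch gradient descent: stop once J <= target, or after max_iter steps
    theta = np.zeros(X.shape[1])
    J = cost(theta, X, y)
    history = [J]
    while J > target and len(history) <= max_iter:
        theta -= alpha * grad(theta, X, y)
        J = cost(theta, X, y)
        history.append(J)
    return theta, J, history


# Hypothetical synthetic data with standardized features (not the ex2 data)
rng = np.random.default_rng(0)
m = 200
features = rng.normal(size=(m, 2))
features = (features - features.mean(axis=0)) / features.std(axis=0)
X = np.column_stack((np.ones(m), features))
p = 1.0 / (1.0 + np.exp(-(X @ np.array([0.2, 1.0, -0.8]))))
y = (rng.random(m) < p).astype(float)

# Built-in optimizer first, to learn roughly where the minimum cost is
result = op.minimize(cost, np.zeros(3), args=(X, y), method="TNC", jac=grad)

# Then gradient descent with a threshold slightly above that cost
theta_gd, J_gd, history = gradient_descent(X, y, alpha=0.5, target=result.fun + 0.005)
print("TNC function evaluations:", result.nfev)
print("gradient descent iterations:", len(history) - 1)
```

The features are standardized here; unscaled inputs such as raw exam scores make the problem much worse conditioned for a fixed learning rate, which is likely part of why the unscaled experiment needed so many iterations.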