
Logistic Regression: Theory and Practice

Logistic Regression

1. What Is Logistic Regression

Logistic regression is a commonly used classification method and belongs to the family of log-linear models. It fits a regression formula for the decision boundary from the available data and then uses that boundary to classify samples.

Regression: given a set of data points, the process of fitting a straight line (or curve) to those points is called regression.

2. Logistic Regression and the Sigmoid Function

The sigmoid function:

\[\sigma(z) = \frac{1}{1 + e^{-z}} \tag{1} \]

The sigmoid curve has an S shape: when z = 0 the sigmoid value is 0.5; as z increases the value approaches 1, and as z decreases it approaches 0.

To implement a logistic regression classifier, we multiply every feature by a regression coefficient, add up the results, and feed the sum into the sigmoid function, which yields a value between 0 and 1. If the value is greater than 0.5 the sample is assigned to class 1; otherwise it is assigned to class 0.

Logistic regression:

Consider a feature vector \(x = (x_0,x_1,x_2,\cdots,x_n)\) and a parameter vector \(w=(w_0,w_1,w_2,\cdots,w_n)\), where \(x_0 = 1\) plays the role of the bias term (as in the code below). The linear combination of the inputs is:

\[z = w^Tx = w_0x_0 + w_1x_1 + w_2x_2 + \cdots + w_nx_n \tag{2} \]

Substituting z into the sigmoid function yields a value between 0 and 1; if it is greater than 0.5 the sample is classified as class 1, otherwise as class 0. That is,

\[\sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-w^Tx}} = \sigma(w^Tx) \]
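For illustration, here is a minimal sketch of this decision rule (the helper names sigmoid and classify and the example numbers are mine, not from the original text):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, w):
    prob = sigmoid(np.dot(w, x))       # sigma(w^T x), a value in (0, 1)
    return 1 if prob > 0.5 else 0

x = np.array([1.0, 0.5, -1.2])         # x0 = 1 (bias), then x1, x2
w = np.array([0.1, 2.0, 1.5])
print(classify(x, w))                  # sigmoid(-0.7) ~ 0.33 < 0.5, so class 0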

The remaining question is: how do we determine the optimal parameters \(w\) so that the classification is as accurate as possible?

3. Optimizing the Logistic Regression Parameters

3.1.1 Gradient Ascent

Gradient ascent searches along the direction of the function's gradient to find its maximum. The gradient of a function f(x,y) is written

\[\nabla f(x,y) = \begin{bmatrix} \frac{\partial f(x,y)}{\partial x}\\ \frac{\partial f(x,y)}{\partial y} \end{bmatrix} \]

The gradient ascent iteration is:

\[w = w + \alpha \nabla_w f(w) \]

This update is repeated until some stopping condition is reached, for example the error falls within an acceptable tolerance or the number of iterations hits a preset limit.

Note: gradient ascent is used to find a maximum, whereas gradient descent is used to find a minimum.
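As a toy illustration of this update rule (the example function is mine, not from the original text): to maximize \(f(w) = -(w-3)^2\), whose gradient is \(-2(w-3)\), we repeatedly step along the gradient:

def grad(w):
    return -2.0 * (w - 3.0)        # gradient of f(w) = -(w - 3)^2

w = 0.0
alpha = 0.1                        # step size
for _ in range(100):
    w = w + alpha * grad(w)        # w = w + alpha * grad f(w)
print(w)                           # approaches the maximizer w = 3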

3.1.2 The Objective Function and Gradient Ascent

Objective function

Before applying gradient ascent we need to know what we are optimizing, that is, what the objective function is; gradient ascent is then applied to that function. Consider a binary classification problem with classes 1 and 0. The prediction function is:

\[f_w(x)= \sigma(w^Tx) = \frac{1}{1 + e^{-w^Tx}} \]

The value of \(f_w(x)\) represents the probability that \(y=1\), so the probabilities of class 1 and class 0 are:

\[P(y=1|x;w) = f_w(x)\\ \\ P(y=0|x;w) = 1 - f_w(x) \]

Combining the two cases into a single expression:

\[P(y|x;w) = (f_w(x))^y (1 - f_w(x))^{1-y} \]

Its likelihood over the training set is:

\[L(w) = \prod_{i=1}^{m}P(y^{(i)}|x^{(i)};w) = \prod_{i=1}^{m}(f_w(x^{(i)}))^{y^{(i)}} (1 - f_w(x^{(i)}))^{1-y^{(i)}} \]

where \(m\) is the number of training samples.

The log-likelihood is:

\[l(w) = \ln L(w) = \sum_{i=1}^{m}\left\{y^{(i)}\ln\big(f_w(x^{(i)})\big) + (1-y^{(i)})\ln\big(1 - f_w(x^{(i)})\big)\right\} \]

Maximum likelihood estimation looks for the \(w\) that maximizes \(l(w)\), so \(l(w)\) is our objective function.
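To make the objective concrete, here is a minimal sketch (the function name logLikelihood and the toy numbers are mine) that evaluates \(l(w)\) for a given weight vector; a better \(w\) yields a larger value:

import numpy as np

def logLikelihood(X, y, w):
    # l(w) = sum_i [ y_i ln(sigma(w^T x_i)) + (1 - y_i) ln(1 - sigma(w^T x_i)) ]
    p = 1.0 / (1.0 + np.exp(-np.dot(X, w)))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# two toy samples, each with a bias feature x0 = 1
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
print(logLikelihood(X, y, np.array([0.0, 1.0])))   # ~ -0.44 (no numerical safeguards; a sketch only)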

Gradient

Using \(f_w(x)=\sigma(w^Tx)\), the sigmoid derivative \(\sigma'(z) = \sigma(z)(1-\sigma(z))\), and \(\partial (w^Tx)/\partial w_j = x_j\), differentiate \(l(w)\) with respect to \(w_j\):

\[\begin{aligned} \frac{\partial l(w)}{\partial w_j} &= \sum_{i = 1}^{m}\left\{ y^{(i)}\frac{1}{f_w(x^{(i)})}\frac{\partial f_w(x^{(i)})}{\partial w_j} - (1-y^{(i)})\frac{1}{1 - f_w(x^{(i)})}\frac{\partial f_w(x^{(i)})}{\partial w_j} \right\} \\ &=\sum_{i = 1}^{m}\left\{ \frac{\partial \sigma(w^Tx^{(i)})}{\partial w_j}\left(y^{(i)}\frac{1}{\sigma(w^Tx^{(i)})} - (1-y^{(i)})\frac{1}{1 - \sigma(w^Tx^{(i)})}\right) \right\} \\ &=\sum_{i = 1}^{m}\left\{ \sigma(w^Tx^{(i)})\left(1 - \sigma(w^Tx^{(i)})\right)\frac{\partial w^Tx^{(i)}}{\partial w_j}\left(y^{(i)}\frac{1}{\sigma(w^Tx^{(i)})} - (1-y^{(i)})\frac{1}{1 - \sigma(w^Tx^{(i)})}\right) \right\} \\ &=\sum_{i = 1}^{m}\left\{ \left(y^{(i)}\left(1 - \sigma(w^Tx^{(i)})\right) - (1-y^{(i)})\sigma(w^Tx^{(i)})\right)x^{(i)}_j \right\} \\ &=\sum_{i = 1}^{m} \left(y^{(i)}- \sigma(w^Tx^{(i)})\right)x^{(i)}_j \\ &=\sum_{i = 1}^{m} \left(y^{(i)}- f_w(x^{(i)})\right)x^{(i)}_j \end{aligned} \]
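Stacking the training samples row-wise into a matrix \(X\) and the labels into a column vector \(y\), the gradient takes a compact matrix form; this is exactly the expression the batch implementation below computes with dataMat.transpose() * loss:

\[\nabla_w l(w) = X^T\big(y - \sigma(Xw)\big), \qquad w \leftarrow w + \alpha\, X^T\big(y - \sigma(Xw)\big) \]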

3.1.3 Gradient Ascent Implementation

import random

import numpy as np
import matplotlib.pyplot as plt

def loadDataSet():
    dataMat = []
    labelMat = []
    with open('testSet.txt', 'r') as file:           # testSet.txt is listed in the appendix
        for line in file:
            strLine = line.strip().split()
            # x0 = 1.0 is the bias term, followed by the two features x1 and x2
            dataMat.append([1.0, float(strLine[0]), float(strLine[1])])
            labelMat.append([float(strLine[2])])      # class label 0 or 1
    return dataMat, labelMat

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lossFunction(y, y_hat):                           # the (y - y_hat) factor of the gradient
    return y - y_hat

def gradAscent(data, labels):
    dataMat = np.mat(data, dtype='float64')           # convert to NumPy matrices
    labelMat = np.mat(labels, dtype='float64')
    m, n = dataMat.shape
    lr = 0.001                                        # learning rate (step size)
    epochs = 500                                      # number of iterations
    weights = np.ones((n, 1))
    for epoch in range(epochs):
        labelEst = sigmoid(dataMat * weights)         # predicted probabilities
        loss = lossFunction(labelMat, labelEst)       # error term (y - y_hat)
        weights = weights + lr * dataMat.transpose() * loss   # gradient ascent update
    return weights

def plotBestFit(weight):
    weightArray = weight.getA()
    dataMat, labelMat = loadDataSet()
    dataArr = np.array(dataMat)
    n = dataArr.shape[0]
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i][0]) == 1:
            xcord1.append(dataArr[i,1])
            ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1])
            ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1,ycord1,s=30,c='red',marker='s')
    ax.scatter(xcord2,ycord2,s=30,c='green')
    x = np.arange(-3.0,3.0,0.1)
    # decision boundary: w0 + w1*x1 + w2*x2 = 0  =>  x2 = (-w0 - w1*x1) / w2
    y = (-weightArray[0] - weightArray[1] * x) / weightArray[2]
    ax.plot(x,y)
    plt.xlabel('x1'); plt.ylabel('x2')
    plt.show()

data,labels = loadDataSet()
weights = gradAscent(data,labels)
print(weights)
plotBestFit(weights)

Visualization: running the script plots the two classes as scatter points together with the fitted decision boundary.
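For reference, the straight line drawn by plotBestFit is the decision boundary: the predicted class switches where the sigmoid crosses 0.5, i.e. where \(w^Tx = 0\). With \(x_0 = 1\) this gives

\[w_0 + w_1x_1 + w_2x_2 = 0 \quad\Longrightarrow\quad x_2 = \frac{-w_0 - w_1x_1}{w_2} \]

which is exactly the line computed in plotBestFit.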

3.1.4 Improved Stochastic Gradient Ascent

Unlike the batch version, stochastic gradient ascent updates the parameters using a single sample per step, which greatly speeds up training. A recommended article on the related gradient-descent variants: 批量梯度下降、隨機梯度下降和小批量梯度下降 (batch, stochastic, and mini-batch gradient descent).

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    dataMatrix = np.array(dataMatrix, dtype='float64')        # convert to NumPy arrays
    classLabels = np.array(classLabels, dtype='float64').flatten()
    m, n = np.shape(dataMatrix)
    weights = np.ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))                            # visit each sample once per pass
        for i in range(m):
            lr = 4 / (1.0 + j + i) + 0.01                     # learning rate decays but never reaches 0
            randIndex = int(random.uniform(0, len(dataIndex)))
            sampleIndex = dataIndex[randIndex]                # pick a random remaining sample
            h = sigmoid(sum(dataMatrix[sampleIndex] * weights))
            error = classLabels[sampleIndex] - h
            weights = weights + lr * error * dataMatrix[sampleIndex]
            del dataIndex[randIndex]
    return weights
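A usage sketch for the stochastic version (reusing the data loaded in section 3.1.3; the reshape to a column matrix is only needed because plotBestFit calls .getA() on a matrix):

stocWeights = stocGradAscent1(data, labels, numIter=150)
print(stocWeights)
plotBestFit(np.mat(stocWeights).transpose())      # plotBestFit expects an (n, 1) matrix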

4. Appendix

The testSet.txt file:

-0.017612	14.053064	0
-1.395634	4.662541	1
-0.752157	6.538620	0
-1.322371	7.152853	0
0.423363	11.054677	0
0.406704	7.067335	1
0.667394	12.741452	0
-2.460150	6.866805	1
0.569411	9.548755	0
-0.026632	10.427743	0
0.850433	6.920334	1
1.347183	13.175500	0
1.176813	3.167020	1
-1.781871	9.097953	0
-0.566606	5.749003	1
0.931635	1.589505	1
-0.024205	6.151823	1
-0.036453	2.690988	1
-0.196949	0.444165	1
1.014459	5.754399	1
1.985298	3.230619	1
-1.693453	-0.557540	1
-0.576525	11.778922	0
-0.346811	-1.678730	1
-2.124484	2.672471	1
1.217916	9.597015	0
-0.733928	9.098687	0
-3.642001	-1.618087	1
0.315985	3.523953	1
1.416614	9.619232	0
-0.386323	3.989286	1
0.556921	8.294984	1
1.224863	11.587360	0
-1.347803	-2.406051	1
1.196604	4.951851	1
0.275221	9.543647	0
0.470575	9.332488	0
-1.889567	9.542662	0
-1.527893	12.150579	0
-1.185247	11.309318	0
-0.445678	3.297303	1
1.042222	6.105155	1
-0.618787	10.320986	0
1.152083	0.548467	1
0.828534	2.676045	1
-1.237728	10.549033	0
-0.683565	-2.166125	1
0.229456	5.921938	1
-0.959885	11.555336	0
0.492911	10.993324	0
0.184992	8.721488	0
-0.355715	10.325976	0
-0.397822	8.058397	0
0.824839	13.730343	0
1.507278	5.027866	1
0.099671	6.835839	1
-0.344008	10.717485	0
1.785928	7.718645	1
-0.918801	11.560217	0
-0.364009	4.747300	1
-0.841722	4.119083	1
0.490426	1.960539	1
-0.007194	9.075792	0
0.356107	12.447863	0
0.342578	12.281162	0
-0.810823	-1.466018	1
2.530777	6.476801	1
1.296683	11.607559	0
0.475487	12.040035	0
-0.783277	11.009725	0
0.074798	11.023650	0
-1.337472	0.468339	1
-0.102781	13.763651	0
-0.147324	2.874846	1
0.518389	9.887035	0
1.015399	7.571882	0
-1.658086	-0.027255	1
1.319944	2.171228	1
2.056216	5.019981	1
-0.851633	4.375691	1
-1.510047	6.061992	0
-1.076637	-3.181888	1
1.821096	10.283990	0
3.010150	8.401766	1
-1.099458	1.688274	1
-0.834872	-1.733869	1
-0.846637	3.849075	1
1.400102	12.628781	0
1.752842	5.468166	1
0.078557	0.059736	1
0.089392	-0.715300	1
1.825662	12.693808	0
0.197445	9.744638	0
0.126117	0.922311	1
-0.679797	1.220530	1
0.677983	2.556666	1
0.761349	10.693862	0
-2.168791	0.143632	1
1.388610	9.341997	0
0.317029	14.739025	0