邏輯迴歸及其python實現

阿新 • • 發佈：2019-01-08

邏輯迴歸原理

sigmod函式

這裡寫圖片描述

下圖給出了sigmod 函式在不同座標尺度下的兩條曲線圖。當 x 為 0 日牝 Sigmoid 函式值為 0.5 。
隨著 1 的增大，對應的sigmod值將逼近於 1; 而隨著 x 的減小， Sigmoid 值將逼近於 0 。如果橫座標
刻度足夠大（下圖），sigmod 函式看起來很像一個階躍函式。

這裡寫圖片描述

原理

這裡寫圖片描述
上圖，將y作為正例的可能性，則1-y是反例的可能性

• 無需事先假設資料分佈

• 可得到“類別”的近似概率預測

• 可直接應用現有數值優化演算法求取最優解

如何確定最優係數

若將 y 看作類後驗概率估計這裡寫圖片描述
則：

寫為
：

顯然這裡寫圖片描述

於是，可使用“極大似然法”
給定資料集這裡寫圖片描述
最大化“對數似然”(log-likelihood)函式

這裡寫圖片描述

梯度上升法

原理

梯度上升法基於的思想是：要找到某函式的最大值，最好的方法是沿著該函式的梯度方向探尋。如果梯度記為 ^ , 則函式 f(x,y)的梯度由下式表示：
這裡寫圖片描述

梯度上升演算法沿梯度方向移動了一步。可以看到，梯度運算元總是指向函式值增長最快的方向。這裡所說的是移動方向，而未提到移動量的大小。該量值稱為步長，記做\alpha。用向量來表示的話，梯度演算法的迭代公式如下
這裡寫圖片描述
該公式將一直被迭代執行，直至達到某個停止條件為止，比如迭代次數達到某個指定值或算法達到某個可以允許的誤差範圍

這裡寫程式碼片

演算法

虛擬碼
這裡寫圖片描述

我們現在採用的是 100 個樣本的簡單資料集，它包含了兩個特徵X1 和 X2

from numpy import *

def loadDataSet():
    dataMat = []; labelMat = []
    fr = open('C:/Users/elenawang/Documents/machinelearninginaction/Ch05/testSet.txt')
    for line in fr.readlines():
        lineArr = line.strip().split()
        dataMat.append([1.0 
, float(lineArr[0]), float(lineArr[1])])
        labelMat.append(int(lineArr[2]))
    return dataMat,labelMat

def sigmoid(inX):
    return 1.0/(1+exp(-inX))

def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)             #convert to NumPy matrix
    labelMat = mat(classLabels).transpose() #convert to NumPy matrix
    m,n = shape(dataMatrix)
    alpha = 0.001
    maxCycles = 500
    weights = ones((n,1))
    for k in range(maxCycles):              #heavy on matrix operations
        h = sigmoid(dataMatrix*weights)     #matrix mult
        error = (labelMat - h)              #vector subtraction
        weights = weights + alpha * dataMatrix.transpose()* error #matrix mult
    return weights

看看效果

dataarr,labelmat=loadDataSet()
gradAscent(dataarr,labelmat)

matrix([[ 4.12414349],
        [ 0.48007329],
        [-0.6168482 ]])

畫出決策邊界

def plotBestFit(weights):
    import matplotlib.pyplot as plt
    dataMat,labelMat=loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0] 
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i])== 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0]-weights[1]*x)/weights[2]
    y = array(y)
    ax.plot(x, y[0])  #必須是y[0]
    plt.xlabel('X1'); plt.ylabel('X2');
    plt.show()

weights=gradAscent(dataarr,labelmat)
plotBestFit(weights)

這裡寫圖片描述

隨機梯度上升

原理
梯度上升演算法在每次更新迴歸係數時都需要遍歷整個資料集, 該方法在處理 100 個左右的資料集時尚可，但如果有數十億樣本和成千上萬的特徵，那麼該方法的計算複雜度就太高了。一種改進方法是一次僅用一個樣本點來更新迴歸係數，該方法稱為隨機梯度上升算法。由於可以在新樣本到來時對分類器進行增量式更新，因而隨機梯度上升演算法是一個線上學習演算法。與 “線上學習 ” 相對應，一次處理所有資料被稱作是 “ 批處理” 。

演算法

虛擬碼：
初始化迴歸係數為1

重複下面步驟直到收斂{

    對資料集中每個樣本

           計算該樣本的梯度

            使用alpha xgradient來更新迴歸係數

}

def stocGradAscent0(dataMatrix, classLabels):
    m,n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)   #initialize to all ones
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i]*weights))
        error = classLabels[i] - h
        weights = weights + dot(alpha * error ,dataMatrix[i])
    return weights

def plotBestFit(weights):
    import matplotlib.pyplot as plt
    dataMat,labelMat=loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0] 
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):
        if int(labelMat[i])== 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0]-weights[1]*x)/weights[2]
    y = array(y)
    ax.plot(x, y)  #改成y
    plt.xlabel('X1'); plt.ylabel('X2');
    plt.show()

weights=stocGradAscent0(dataarr,labelmat)
plotBestFit(weights)

這裡寫圖片描述

增加迭代次數改進上述演算法

邏輯迴歸及其python實現

邏輯迴歸原理

梯度上升法

隨機梯度上升

邏輯迴歸及其python實現

邏輯迴歸的python實現

NG機器學習總結-（四）邏輯迴歸以及python實現

機器學習演算法之邏輯迴歸以及python實現

二，機器學習演算法之邏輯迴歸（python實現）

機器學習筆記 -吳恩達（第七章：邏輯迴歸，python實現附原始碼）

利用python實現梯度下降和邏輯迴歸原理(Python詳細原始碼：預測學生是否被錄取)

【深度學習】線性迴歸（二）小批量隨機梯度下降及其python實現

機器學習：邏輯迴歸與Python程式碼實現

梯度下降和邏輯迴歸例子(Python程式碼實現)

Fuzzy C Means 算法及其 Python 實現——寫得很清楚，見原文

（轉）梯度下降法及其Python實現

Kmeans聚類算法及其 Python實現

機器學習算法整理（二）邏輯回歸 python實現

常用algorithm及其Python實現

Kmeans 聚類及其python實現

Logistic迴歸（Python實現）

梯度下降法實現最簡單線性迴歸問題python實現

（轉）二十三種設計模式及其python實現

Softmax&SVM loss&gradient公式圖及其python實現

邏輯迴歸及其python實現

邏輯迴歸原理

梯度上升法

隨機梯度上升

相關推薦