Machine Learning Algorithms (4): Logistic Regression

阿新 • Published: 2019-02-17

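Based on the idea of logistic regression, this post uses gradient ascent to find the regression coefficients, then trains on and makes predictions for the horse colic data.

The example comes from Machine Learning in Action by Peter Harrington.

Gradient ascent

Loading the dataset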

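The dataset contains 100 data points in two classes.

from numpy import *   # the functions in this module rely on NumPy names (mat, shape, ones, exp, array, arange, random)

def loadDataSet():
    """Load the dataset: each line holds two features and a class label."""
    dataMat = []; labelMat = []
    with open('testSet.txt') as fr:            # the file is closed automatically
        for line in fr:
            lineArr = line.strip().split()
            # prepend the constant 1.0 so that weights[0] acts as the intercept
            dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
            labelMat.append(int(lineArr[2]))
    return dataMat,labelMat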

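The sigmoid function

Its output always lies between 0 and 1, so it can be read as the probability of class 1 and used for classification.

def sigmoid(inX):
    """Sigmoid function; works element-wise on NumPy arrays and matrices."""
    return 1.0/(1+exp(-inX))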
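The properties that make the sigmoid useful for classification can be checked directly: it maps 0 to 0.5, it is symmetric around that point, and it saturates toward 0 and 1. The sketch below uses only the standard library. `stable_sigmoid` is a hypothetical helper that is not part of the original module; it is written so that `exp()` is never called with a large positive argument.

```python
import math

def stable_sigmoid(z):
    """Scalar sigmoid that avoids overflow in exp() for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)              # z < 0 here, so exp(z) cannot overflow
    return ez / (1.0 + ez)

for z in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(z, stable_sigmoid(z))
```

A value above 0.5 corresponds to a positive input and a value below 0.5 to a negative one, which is why 0.5 is the natural classification threshold.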

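Computing the weight vector with gradient ascent

def gradAscent(dataMatIn, classLabels):
    """Batch gradient ascent over the whole dataset."""
    dataMatrix = mat(dataMatIn)             # convert to a NumPy matrix
    labelMat = mat(classLabels).transpose() # convert to a NumPy matrix and transpose (row vector --> column vector)
    m,n = shape(dataMatrix)                 # matrix dimensions
    alpha = 0.001                           # step size
    maxCycles = 500                         # number of iterations
    weights = ones((n,1))                   # weight vector
    for k in range(maxCycles):
        h = sigmoid(dataMatrix*weights)     # predicted probabilities for all samples
        error = (labelMat - h)              # difference between labels and predictions
        weights = weights + alpha * dataMatrix.transpose()* error
    return weights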
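The update inside the loop can be read as one step of gradient ascent on the log-likelihood of the logistic model. The derivation below is the standard one and is not spelled out in the original post. Here X stands for dataMatrix, y for labelMat, h for the vector of sigmoid outputs, w for weights, and alpha for the step size.

```latex
\begin{aligned}
h_i &= \sigma(x_i^{\top} w) = \frac{1}{1 + e^{-x_i^{\top} w}} \\
\ell(w) &= \sum_{i=1}^{m} \bigl[\, y_i \log h_i + (1 - y_i) \log (1 - h_i) \,\bigr] \\
\nabla_w \ell(w) &= \sum_{i=1}^{m} (y_i - h_i)\, x_i = X^{\top} (y - h) \\
w &\leftarrow w + \alpha\, X^{\top} (y - h)
\end{aligned}
```

The last line matches the expression `weights + alpha * dataMatrix.transpose() * error` in the code.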

Test

import logRegres   # the functions above are saved in logRegres.py

def testGradAscent():
    dataArr,labelMat = logRegres.loadDataSet()
    weights=logRegres.gradAscent(dataArr,labelMat)
    print(weights)

Result

[[ 4.12414349]
 [ 0.48007329]
 [-0.6168482 ]]

Visualizing the result

def plotBestFit(weights):
    """Plot the data points and the fitted line."""
    import matplotlib.pyplot as plt
    dataMat,labelMat=loadDataSet()
    dataArr = array(dataMat)
    n = shape(dataArr)[0]      # number of data points
    xcord1 = []; ycord1 = []
    xcord2 = []; ycord2 = []
    for i in range(n):         # split the points by class label
        if int(labelMat[i])== 1:
            xcord1.append(dataArr[i,1]); ycord1.append(dataArr[i,2])
        else:
            xcord2.append(dataArr[i,1]); ycord2.append(dataArr[i,2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord2, ycord2, s=30, c='green')
    x = arange(-3.0, 3.0, 0.1)
    y = (-weights[0]-weights[1]*x)/weights[2]   # line where w0 + w1*x1 + w2*x2 = 0
    ax.plot(x, y)
    plt.xlabel('X1'); plt.ylabel('X2')
    plt.show()
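The plotted line is the set of points where the linear score w0 + w1*X1 + w2*X2 equals zero, that is, where the sigmoid output is exactly 0.5. The sketch below checks this with the weights printed in the gradient ascent result earlier in this post. `boundary_x2` is a hypothetical helper name introduced only for illustration.

```python
# Weights from the gradient ascent result printed earlier: [w0, w1, w2]
weights = [4.12414349, 0.48007329, -0.6168482]

def boundary_x2(x1, w):
    """Return the X2 value on the decision boundary for a given X1."""
    return (-w[0] - w[1] * x1) / w[2]

for x1 in (-3.0, 0.0, 3.0):
    x2 = boundary_x2(x1, weights)
    # On the boundary the linear score is zero (up to rounding)
    score = weights[0] + weights[1] * x1 + weights[2] * x2
    print(x1, round(x2, 3), abs(score) < 1e-9)
```

Points on one side of this line get a positive score and are assigned to class 1; points on the other side are assigned to class 0.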

Test

Note that the NumPy matrix must be converted to a NumPy array before plotting.

def testPlotBestFit():
    """Test plotting the fitted line."""
    dataArr,labelMat = logRegres.loadDataSet()
    weights =logRegres.gradAscent(dataArr,labelMat)
    logRegres.plotBestFit(weights.getA())    # getA() : matrix --> array

Result

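Stochastic gradient ascent

The gradient computation above uses the whole dataset in every iteration, which becomes expensive when the dataset is large. Stochastic gradient ascent instead updates the weights using only one data point per iteration.

Stochastic gradient ascent, version 0

def stocGradAscent0(dataMatrix, classLabels):
    """Stochastic gradient ascent, version 0: one pass, fixed step size."""
    m,n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)
    for i in range(m):
        h = sigmoid(sum(dataMatrix[i]*weights)) # use a single sample per update
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights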
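To make the single-sample update concrete, here is one update step written with plain Python lists. This is a minimal sketch: `sgd_step` is a hypothetical name, and the sample values are made up for illustration rather than taken from testSet.txt.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(weights, x, label, alpha=0.01):
    """One stochastic gradient ascent update using a single sample."""
    h = sigmoid(sum(xi * wi for xi, wi in zip(x, weights)))   # predicted probability
    error = label - h                                         # scalar, not a vector
    return [wi + alpha * error * xi for xi, wi in zip(x, weights)]

weights = [1.0, 1.0, 1.0]
x = [1.0, 0.5, 2.0]        # made-up sample: constant term plus two features
new_weights = sgd_step(weights, x, label=0)
print(new_weights)
```

Because the label is 0 and the prediction is above 0, the error is negative and every weight attached to a positive feature moves down slightly. Unlike the batch version, `h` and `error` are scalars here.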

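Test

def teststocGradAscent0():
    """Test stochastic gradient ascent, version 0."""
    dataArr,labelMat = logRegres.loadDataSet()
    weights =logRegres.stocGradAscent0(array(dataArr),labelMat)
    logRegres.plotBestFit(weights)

Result

Because the number of iterations is small (a single pass over the data), the separation is not very good.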

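Stochastic gradient ascent, version 1

Reasons for the poor result above:
1. The step size is fixed during iteration, so the weights oscillate periodically as they approach convergence.
2. The training points are not chosen at random, so the updates are affected by periodicity in the data.

To address these two points, the following changes are made:
1. The step size alpha decreases as the number of iterations grows.
2. Data points are selected at random for training.

def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    """Stochastic gradient ascent, version 1: decaying step size, random sampling."""
    m,n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))    # a range object does not support item deletion, so copy it into a list
        for i in range(m):
            alpha = 4/(1.0+j+i)+0.0001    # step size shrinks as iteration proceeds but never reaches 0, which damps oscillation without stalling
            randIndex = int(random.uniform(0,len(dataIndex)))  # random position among the samples not yet used in this pass
            sampleIndex = dataIndex[randIndex]                 # map the position back to a row index so each sample is used once per pass
            h = sigmoid(sum(dataMatrix[sampleIndex]*weights))
            error = classLabels[sampleIndex] - h
            weights = weights + alpha * error * dataMatrix[sampleIndex]
            del(dataIndex[randIndex])
    return weights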
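The two modifications can be looked at in isolation. The sketch below uses only the standard library; `step_size` and `one_pass_order` are hypothetical helper names. The first reproduces the step-size formula 4/(1.0+j+i)+0.0001 from the code above, and the second shows one way to visit every row index exactly once per pass in random order.

```python
import random

def step_size(j, i):
    """Step size at pass j, update i: 4/(1+j+i) + 0.0001."""
    return 4 / (1.0 + j + i) + 0.0001

def one_pass_order(m, rng=random):
    """Return the row indices 0..m-1 in random order, each exactly once."""
    remaining = list(range(m))
    order = []
    while remaining:
        pos = rng.randrange(len(remaining))   # random position in the remaining list
        order.append(remaining.pop(pos))      # take the row index stored at that position
    return order

for j, i in [(0, 0), (0, 99), (1, 0), (149, 99)]:
    print(j, i, step_size(j, i))
print(one_pass_order(5))
```

Note that the step size is not strictly decreasing: i restarts at 0 at the beginning of each pass, so alpha jumps back up to about 4/(1+j) before shrinking again. The constant 0.0001 keeps it from ever reaching zero.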

Test

def teststocGradAscent1():
    """Test stochastic gradient ascent, version 1."""
    dataArr,labelMat = logRegres.loadDataSet()
    weights =logRegres.stocGradAscent1(array(dataArr),labelMat)
    logRegres.plotBestFit(weights)

Result

As can be seen, the separation is much better this time.

Application

Using logistic regression to predict the mortality of horses with colic

Training

def classifyVector(inX, weights):
    """Compute the class from the regression coefficients and the feature vector."""
    prob = sigmoid(sum(inX*weights))
    if prob > 0.5: return 1.0
    else: return 0.0

def colicTest():
    """Load the data, train, and test."""
    # train the regression coefficients
    trainingSet = []; trainingLabels = []
    with open('horseColicTraining.txt') as frTrain:
        for line in frTrain:
            currLine = line.strip().split('\t')
            lineArr = [float(currLine[i]) for i in range(21)]   # the first 21 columns are features
            trainingSet.append(lineArr)
            trainingLabels.append(float(currLine[21]))           # the last column is the label
    trainWeights = stocGradAscent1(array(trainingSet), trainingLabels, 500)
    # evaluate the classifier on the test set
    errorCount = 0; numTestVec = 0.0
    with open('horseColicTest.txt') as frTest:
        for line in frTest:
            numTestVec += 1.0
            currLine = line.strip().split('\t')
            lineArr = [float(currLine[i]) for i in range(21)]
            # int(float(...)) also accepts labels written as "1.0"
            if int(classifyVector(array(lineArr), trainWeights)) != int(float(currLine[21])):
                errorCount += 1
    errorRate = (float(errorCount)/numTestVec)
    print ("the error rate of this test is: %f" % errorRate)
    return errorRate

Test

def multiTest():
    """Predict horse colic mortality: average the error rate over several runs."""
    numTests = 10; errorSum=0.0
    for k in range(numTests):
        errorSum += logRegres.colicTest()
    print ("after %d iterations the average error rate is: %f" % (numTests, errorSum/float(numTests)))

Result

the error rate of this test is: 0.432836
the error rate of this test is: 0.268657
the error rate of this test is: 0.417910
the error rate of this test is: 0.313433
the error rate of this test is: 0.298507
the error rate of this test is: 0.358209
the error rate of this test is: 0.298507
the error rate of this test is: 0.283582
the error rate of this test is: 0.388060
the error rate of this test is: 0.402985
after 10 iterations the average error rate is: 0.346269
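The ten error rates differ from run to run because the training samples are drawn at random. The sketch below summarizes the values listed above using only the standard library; the numbers are copied from that output, and nothing is retrained.

```python
import statistics

# Error rates copied from the ten runs printed above
error_rates = [0.432836, 0.268657, 0.417910, 0.313433, 0.298507,
               0.358209, 0.298507, 0.283582, 0.388060, 0.402985]

mean = statistics.mean(error_rates)
spread = statistics.stdev(error_rates)    # sample standard deviation
print("mean: %f" % mean)
print("stdev: %f" % spread)
print("min: %f, max: %f" % (min(error_rates), max(error_rates)))
```

The mean agrees with the average reported by multiTest, and the gap between the best and worst run shows how much a single run can vary.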
