大叔學ML第二：線性迴歸

阿新 • • 發佈：2018-12-15

線性迴歸非常直觀簡潔，是一種常用的迴歸模型，大叔總結如下：

基本形式

設有樣本\(X\)形如：
\[\begin{pmatrix} x_1^{(1)} & x_2^{(1)} & \cdots &x_n^{(1)}\\ x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)}\\ \vdots & \vdots & \vdots & \vdots\\ x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)}\\ \end{pmatrix} \]

對應的標記\(\vec{y}\)形如：
\[\begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \\ \end{pmatrix}\]

其中，矩陣\(X\)的每一行表示一個樣本，一共有m個樣本；每列表示樣本的一個屬性，共有n個屬性。設假設函式
\[h(x_1,x_2 \dots x_n)= \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n \tag{1}\]

設\(x_0=1\)，則(1)式重新寫為
\[h(x_1,x_2 \dots x_n)= \theta_0x_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n \tag{2}\]

定義代價函式(均方誤差)
\[j(\theta_0,\theta_1\dots \theta_n)=\frac{1}{2m}\sum_{k=1}^m (h(x_1^{(k)},x_2^{(k)} \dots x_n^{(k)}) - y^{(k)})^2 \]
即：
\[j(\theta_0,\theta_1\dots \theta_n)=\frac{1}{2m}\sum_{k=1}^m (\theta_0x_0^{(k)} + \theta_1 x_1^{(k)} + \theta_2 x_2^{(k)} + \dots + \theta_n x_n^{(k)} - y^{(k)})^2 \tag{3}\]

這裡的分母乘以2並沒有意義，只是為了求導後正好約掉。我們的目標是根據樣本資料確定引數\(\vec\theta\)

求解引數\(\vec\theta\)

梯度下降法

關於梯度下降法可參考大叔學ML第一：梯度下降

由於代價函式是一個凸函式，可以用梯度下降法找到最小值。由於用到梯度，首先對\(\theta_0\)、\(\theta_1\)、\(\theta_2\)直到\(\theta_n\)求偏導：

\(\frac{\partial}{\partial\theta_0}j(\theta_0,\theta_1\dots \theta_n) = \frac{1}{m}\sum_{k=1}^m(\theta_0x_0^{(k)} + \theta_1x_1^{(k)} + \dots+ \theta_nx_n^{(k)} - y^{(k)})x_0^{(k)}\)
\(\frac{\partial}{\partial\theta_1}j(\theta_0,\theta_1\dots \theta_n) = \frac{1}{m}\sum_{k=1}^m(\theta_0x_0^{(k)} + \theta_1x_1^{(k)} + \dots+ \theta_nx_n^{(k)}- y^{(k)})x_1^{(k)}\)
\(\dots\)
\(\frac{\partial}{\partial\theta_n}j(\theta_0,\theta_1\dots \theta_n) = \frac{1}{m}\sum_{k=1}^m(\theta_0x_0^{(k)} + \theta_1x_1^{(k)} + \dots+ \theta_nx_n^{(k)}- y^{(k)})x_n^{(k)} \tag{4}\)

萬事俱備，現在可以程式設計了：

import numpy as np

def gradient(X, Y, m, theta):
    ''' 求theta位置的梯度.

    Args:
        X: 樣本
        Y: 樣本標記
        m: 樣本數
        theta: 欲求梯度的位置
    
    Returns:
        gi: theta處函式的梯度值
    '''
    theta_size = np.size(theta)
    g = np.zeros(theta_size)

    for i in range(theta_size):   
        gi = 0 #第i個theta分量對應的偏導     
        for j in range(m):
            gi += ((np.dot(X[j], theta) - Y[j]) * X[j, i])
        gi = gi / m 
        g[i] = gi

    return g

def gradient_descent(X, Y, step = 0.02, threshold = 0.01):
    ''' 梯度下降法求使代價函式最小的 theta

    Args:
        X: 樣本
        Y: 樣本標記
        step:步長
        threshold:梯度模長閾值，低於此值時停止迭代
    Returns:
        theta: 使代價函式取最小值的theta
    '''
    theta = np.random.rand(4)    
    grad = gradient(X, Y, np.size(X, 0), theta)
    norm = np.linalg.norm(grad)
    
    while(norm > threshold):
        theta -= step * grad
        grad = gradient(X, Y, np.size(X, 0), theta)
        norm = np.linalg.norm(grad)
    return theta

''' 以下是測試資料 '''

# 測試用線性函式
def linear_function(x1, x2, x3):
    result = 1 + 2 * x1 + 3 * x2 + 4 * x3 
    result = result + np.random.rand() # 噪音
    return result

# 計算函式值
def calculate(X):    
    rowsnumber = np.size(X, axis = 0)    
    Y = [linear_function (X[i, 0], X[i, 1], X[i, 2]) for i in range(0, rowsnumber)]
    return Y

if __name__ == "__main__":
    row_count = 500
    X = np.random.randint(0, 10, (row_count, 3)) # 隨機產生row_count個樣本
    Y = calculate(X) # 計算標記

    X0 = np.ones((row_count, 1)) 
    X = np.hstack((X0, X)) # 補充一列1

    theta = gradient_descent(X, Y)
    print('theta is ', theta)

執行結果：theta is [1.41206515 2.00558441 3.0013728 4.00684577]

上面的迭代方法被稱為批量梯度下降法，參考式(4)，計算梯度時用到了所有的樣本。梯度下降法還有個簡化的版本，叫做隨機梯度下降法，每次計算梯度時只隨機使用一個樣本，而不是所有樣本，這樣可以加快計算速度。將式(4)修改為：
\[\frac{\partial}{\partial\theta_n}j(\theta_0,\theta_1\dots \theta_n) = \frac{1}{m}(\theta_0x_0^{(k)} + \theta_1x_1^{(k)} + \dots+ \theta_nx_n^{(k)}- y^{(k)})x_n^{(k)} \tag{5}\]
其中：\(1 \leq k \leq m\)

將上面Python程式碼中的方法gradient替換一下：

def gradient_sgd(X, Y, m, theta):
    ''' 求theta位置的梯度.

    Args:
        X: 樣本
        Y: 樣本標記
        m: 樣本數
        theta: 欲求梯度的位置
    
    Returns:
        gi: theta處函式的梯度值
    '''
    theta_size = np.size(theta)
    g = np.zeros(theta_size)

    for i in range(theta_size):   
        random_Index = np.random.randint(1, m + 1)
        gi = ((np.dot(X[random_Index], theta) - Y[random_Index]) * X[random_Index, i])
        gi = gi / m 
        g[i] = gi

    return g

執行結果：
theta is [1.43718942 2.00043557 3.00620849 4.00674728]

感覺像是飛起來。隨機梯度下降法肯定沒有批量梯度下降法準確，所有還有第三種下降法，叫做小批量梯度下降法，介於批量梯度下降法和隨機梯度下降法之間，每次計算梯度使用隨機的一小批樣本，此處不再code說明。

直接求導法

因為代價函式是個凸函式，那麼我們可以對代價函式求導，讓其導數等於0的點即為最小值點。

為方便計算，我們在前面增加了一個值恆等於1的\(x_0\)，這樣就把線性函式的偏置項去掉了，參考式(2)，重新定義矩陣\(X\)為：
\[\begin{pmatrix} x_0^{(1)} & x_1^{(1)} & x_2^{(1)} & \cdots &x_n^{(1)}\\ x_0^{(2)} &x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)}\\ \vdots & \vdots & \vdots & \vdots & \vdots\\ x_0^{(m)} & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)}\\ \end{pmatrix}\]
式(2)可簡化為：
\[h(\vec{x})= \vec\theta \cdot \vec{x} \tag{6}\]
代價函式式(3)可寫為：
\[j(\vec\theta)=\frac{1}{2m}\sum_{k=1}^m (h(\vec{x}^{(k)}) - y^{(k)})^2 \tag{7}\]
等價於：
\[J(\vec\theta)=\frac{1}{2m}||X\vec\theta - \vec{y}||^2 \tag{8}\]
化簡式(8):
\[\begin{align} J(\vec\theta)&=\frac{1}{2m}||X\vec\theta - \vec{y}||^2 \\ &=\frac{1}{2m}(X\vec\theta - \vec{y})^T(X\vec\theta - \vec{y}) \\ &=\frac{1}{2m}(\vec\theta^TX^T - \vec{y}^T)(X\vec\theta - \vec{y}) \\ &=\frac{1}{2m}(\vec\theta^TX^TX\vec\theta - \vec\theta^TX^T\vec{y}- \vec{y}^TX\vec\theta + \vec{y}^T\vec{y})\\ &=\frac{1}{2m}(\vec\theta^TX^TX\vec\theta - 2\vec{y}^TX\vec\theta + \vec{y}^T\vec{y})\\ \end{align}\]
對\(\vec\theta\)求導：
\[\frac{d}{d\vec\theta}J(\vec\theta)=\frac{1}{m}(X^TX\vec\theta-X^T\vec{y})\]
令其等於0，得：\[\vec\theta=(X^TX)^{-1}X^T\vec{y}\tag{9}\]

將上面的Python程式碼改為：

# 測試用線性函式
def linear_function(x1, x2, x3):
    result = 1 + 2 * x1 + 3 * x2 + 4 * x3 
    result = result + np.random.rand() # 噪音
    return result

# 計算函式值
def calculate(X):    
    rowsnumber = np.size(X, axis = 0)    
    Y = [linear_function (X[i, 0], X[i, 1], X[i, 2]) for i in range(0, rowsnumber)]
    return Y

if __name__ == "__main__":
    row_count = 500
    X = np.random.randint(0, 10, (row_count, 3)) # 隨機產生row_count個樣本
    Y = calculate(X) # 計算標記

    X0 = np.ones((row_count, 1)) 
    X = np.hstack((X0, X)) # 補充一列1

    theta = np.dot(np.dot(np.linalg.pinv(np.dot(X.T, X)), X.T), np.array(Y).T)
    print('theta is ', theta)

執行結果：theta is [1.49522638 1.99801209 2.99704438 4.00427252]

和梯度下降法比較，光速的感覺，那為什麼還要用梯度下降法呢？這是因為求矩陣的逆演算法複雜度較高，達爺的建議是：如果樣本的屬性超過一萬個，考慮使用梯度下降法。

呼叫函式庫

其實我們也可以直接呼叫類庫的，有很多類庫可以做迴歸演算法，比如：

import numpy as np
from sklearn import linear_model

# 測試用線性函式
def linear_function(x1, x2, x3):
    result = 1 + 2 * x1 + 3 * x2 + 4 * x3 
    result = result + np.random.rand() # 噪音
    return result

# 計算函式值
def calculate(X):    
    rowsnumber = np.size(X, axis = 0)    
    Y = [linear_function (X[i, 0], X[i, 1], X[i, 2]) for i in range(0, rowsnumber)]
    return Y

if __name__ == "__main__":
    row_count = 500
    X = np.random.randint(0, 10, (row_count, 3)) # 隨機產生row_count個樣本
    Y = calculate(X) # 計算標記

    regr = linear_model.LinearRegression()
    regr.fit(X, np.array(Y).T)

    a, b = regr.coef_, regr.intercept_
    print(a)
    print(b)

執行結果：
[2.00384674 2.99234723 3.99603084]
1.5344826581936104

和我們自己算的差不多吧。還有很多其他的類庫可以呼叫，大叔沒有一一去找。可能通常只要呼叫類庫就足夠了，不需要我們自己寫，不過還是知道原理比較好，遇到問題才好對症下藥。

大叔學ML第二：線性迴歸

基本形式

求解引數\(\vec\theta\)

梯度下降法

直接求導法

呼叫函式庫

大叔學ML第二：線性迴歸

ml課程：線性迴歸、邏輯迴歸入門（含程式碼實現）

大叔學ML第四：線性迴歸正則化

大叔學ML第三：多項式迴歸

大叔學ML第五：邏輯迴歸

ML模型2：線性迴歸模型

斯坦福CS229機器學習課程筆記一：線性迴歸與梯度下降演算法

pytorch入門——邊學邊練02線性迴歸

機器學習筆記第5課：線性迴歸演算法

機器學習：線性迴歸、嶺迴歸、Lasso迴歸

用python來實現機器學習（一）：線性迴歸（linear regression）

Pytorch（三）入門：線性迴歸

機器學習筆記（三）：線性迴歸大解剖（原理部分）

機器學習筆記（三）：線性迴歸大解剖（程式碼部分）

機器學習筆記《四》：線性迴歸，邏輯迴歸案例與重點細節問題分析

TensorFlow入門教程：11：線性迴歸

機器學習實戰教程（十二）：線性迴歸提高篇之樂高玩具套件二手價預測

機器學習：線性迴歸和嶺迴歸入門程式碼

MATLAB第十一課：線性迴歸方程式與線性系統

Python3《機器學習實戰》學習筆記（十二）：線性迴歸提高篇之樂高玩具套件二手價預測

大叔學ML第二：線性迴歸

基本形式

求解引數\(\vec\theta\)

梯度下降法

直接求導法

呼叫函式庫

相關推薦