Logistic Regression Model in Python, with Detailed Code Comments

阿新 · Published: 2021-02-02

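This article is based on https://blog.csdn.net/dQCFKyQDXYm3F8rB0/article/details/79767043
I have only added more detailed comments to its code, to make it easier for beginners to follow.
Unlike linear regression, logistic regression has no closed-form solution. However, because the loss function is convex, we can train the model with gradient descent. In fact, provided the learning rate is small enough and enough training iterations are run, gradient descent (or another optimization algorithm) is able to find the global minimum.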
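Step 0: Initialize the weight vector and the bias with zeros (or small random values).
Step 1: Compute the linear combination of the input features and the weights. With vectorization and broadcasting this is done for all training samples at once: a = X*w + b, where X is the matrix of all training samples, of shape (n_samples, n_features), and X*w denotes the matrix product.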
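As a minimal sketch of the vectorized computation in Step 1, the snippet below uses a tiny made-up dataset (the values of `X`, `w` and `b` are assumptions chosen only for illustration). It compares the single matrix expression with an explicit per-sample loop.

```python
import numpy as np

# Hypothetical toy data: 3 samples, 2 features (illustrative values only)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([[0.5], [-1.0]])  # weight column vector, shape (n_features, 1)
b = 0.25                       # scalar bias

# One matrix product handles every sample; the scalar b is broadcast to each row
a = X @ w + b

# Equivalent explicit loop, shown only to confirm the vectorized result
a_loop = np.array([[X[i, 0] * w[0, 0] + X[i, 1] * w[1, 0] + b]
                   for i in range(X.shape[0])])

print(a.shape)
print(np.allclose(a, a_loop))
```

The result has shape (n_samples, 1), which matches the column-vector shape the targets are reshaped to later in the script.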

Step 2: Apply the sigmoid function as the activation function; its output lies between 0 and 1:
$$\hat{y} = \sigma(a) = \frac{1}{1 + e^{-a}}$$
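The direct expression `1 / (1 + np.exp(-a))` returns correct values, but for very negative inputs `np.exp` overflows and NumPy typically emits a RuntimeWarning. The sketch below is one common way to avoid that, assuming array input. The name `stable_sigmoid` is hypothetical and is not part of the code in this article.

```python
import numpy as np

def stable_sigmoid(a):
    """Sigmoid that never exponentiates a large positive number (array input)."""
    a = np.asarray(a, dtype=float)
    out = np.empty_like(a)
    pos = a >= 0
    # For a >= 0, exp(-a) is at most 1, so the usual formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-a[pos]))
    # For a < 0, rewrite as exp(a) / (1 + exp(a)); exp(a) is below 1 here
    exp_a = np.exp(a[~pos])
    out[~pos] = exp_a / (1.0 + exp_a)
    return out

values = stable_sigmoid(np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0]))
print(values)
```

Both branches compute the same function; the split only decides which algebraic form is evaluated.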

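Step 3: Compute the loss over the whole training set.
We want the model's output to be a probability between 0 and 1 for the target class. During training we therefore adjust the parameters so that the model produces large outputs for positive examples (true label 1) and small outputs for negative examples (true label 0). The loss function expresses this as follows, where $m$ is the number of training samples, $y^{(i)}$ is the true label and $\hat{y}^{(i)}$ is the sigmoid output for sample $i$:
$$J(w, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$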
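To see how this loss rewards confident correct predictions and penalizes confident wrong ones, here is a small sketch. The function name `binary_cross_entropy`, the clipping constant `eps` and the sample probabilities are all assumptions made for illustration. The clipping guards against `log(0)` and is not part of the article's code.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy loss; probabilities are clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_prob)
                    + (1.0 - y_true) * np.log(1.0 - y_prob))

y = np.array([[1], [0], [1]])
confident_right = np.array([[0.9], [0.1], [0.8]])  # high for label 1, low for label 0
confident_wrong = np.array([[0.1], [0.9], [0.2]])  # the opposite

loss_right = binary_cross_entropy(y, confident_right)
loss_wrong = binary_cross_entropy(y, confident_wrong)
print(loss_right, loss_wrong)
```

The loss for the well-aligned predictions is much smaller than for the misaligned ones, which is the behavior the paragraph above describes.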

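Step 4: Compute the gradients of the loss function with respect to the weight vector and the bias.
In general form, with $m$ the number of training samples, $\hat{y}$ the vector of predicted probabilities and $y$ the vector of true labels:
$$\frac{\partial J}{\partial w} = \frac{1}{m} X^{T} (\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)$$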
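One way to gain confidence in gradient formulas like these is a finite-difference check. The sketch below assumes small random data and uses hypothetical helper names (`cost`, `analytic_gradients`). It compares the analytic gradients with central-difference estimates.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cost(X, y, w, b):
    """Mean cross-entropy loss of the logistic model."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradients(X, y, w, b):
    """Gradients of the loss with respect to w and b."""
    n = X.shape[0]
    err = sigmoid(X @ w + b) - y
    return (X.T @ err) / n, np.sum(err) / n

# Hypothetical random problem: 20 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random((20, 1)) > 0.5).astype(float)
w = rng.normal(size=(3, 1)) * 0.1
b = 0.05

dw, db = analytic_gradients(X, y, w, b)

# Central differences: perturb one parameter at a time by +/- h
h = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = h
    dw_num[j, 0] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * h)
db_num = (cost(X, y, w, b + h) - cost(X, y, w, b - h)) / (2 * h)

print(np.allclose(dw, dw_num, atol=1e-6), np.isclose(db, db_num, atol=1e-6))
```

A check like this is only a debugging aid; training itself uses the analytic gradients, which are far cheaper to compute.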

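Step 5: Update the weights and the bias, where $\eta$ is the learning rate:
$$w \leftarrow w - \eta \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial J}{\partial b}$$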
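The effect of the learning rate on this update rule can be illustrated on a much simpler convex function. The sketch below deliberately swaps the logistic loss for f(w) = w**2, an assumption made purely so the behavior is easy to follow. The function name and the two learning rates are hypothetical.

```python
def gradient_descent_1d(w0, learning_rate, n_iters):
    """Minimise f(w) = w**2 (gradient 2*w) with the update w <- w - lr * grad."""
    w = w0
    for _ in range(n_iters):
        grad = 2.0 * w
        w = w - learning_rate * grad
    return w

# Each step multiplies w by (1 - 2 * learning_rate)
w_small = gradient_descent_1d(w0=5.0, learning_rate=0.1, n_iters=50)  # factor 0.8: shrinks
w_large = gradient_descent_1d(w0=5.0, learning_rate=1.1, n_iters=50)  # factor -1.2: grows

print(abs(w_small) < 1e-3, abs(w_large) > 5.0)
```

With the small learning rate the iterate approaches the minimum at 0; with the large one it moves further away on every step, which is why the convergence claim depends on a sufficiently small learning rate.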

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
# Seed NumPy's global random number generator. With the same seed, every run
# generates the same random numbers; without it, results differ between runs.
np.random.seed(123)

# We will perform logistic regression using a simple toy dataset of two classes
# make_blobs generates a synthetic dataset: n_samples is the total number of points, centers is the number of classes
X, y_true = make_blobs(n_samples=1000, centers=2)
# print(X.shape)  # (1000, 2)

fig = plt.figure(figsize=(8,6))
plt.scatter(X[:, 0], X[:, 1], c=y_true)  # X[:, 0] is the first column (all rows), X[:, 1] is the second column
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()

# Reshape targets to get column vector with shape (n_samples, 1)
y_true = y_true[:, np.newaxis]  # np.newaxis adds a dimension, turning shape (n_samples,) into (n_samples, 1)
# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y_true)
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')


class LogisticRegression:

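    def __init__(self):  # constructor
        pass  # nothing to set up yet; pass is a placeholder that keeps the empty body valid

    def sigmoid(self, a):  # activation function
        return 1 / (1 + np.exp(-a))

    def train(self, X, y_true, n_iters, learning_rate):  # fit the model on the training samples; n_iters is the number of iterations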
        """
        Trains the logistic regression model on given data X and targets y
        """
        # Step 0: Initialize the parameters
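        n_samples, n_features = X.shape  # shape is a tuple of the array's dimensions: (rows, columns)
        self.weights = np.zeros((n_features, 1))  # np.zeros returns a zero-filled array of the given shape; these are the weights
        self.bias = 0  # the bias starts at 0
        costs = []  # list of cost values, one per iteration

        for i in range(n_iters):  # training loop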
            # Step 1 and 2: Compute a linear combination of the input features and weights,
            # apply the sigmoid activation function
            y_predict = self.sigmoid(np.dot(X, self.weights) + self.bias)  # np.dot computes the matrix product (or the inner product for vectors)

            # Step 3: Compute the cost over the whole training set.
            cost = (- 1 / n_samples) * np.sum(y_true * np.log(y_predict) + (1 - y_true) * (np.log(1 - y_predict)))

            # Step 4: Compute the gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_predict - y_true))
            db = (1 / n_samples) * np.sum(y_predict - y_true)

            # Step 5: Update the parameters
            self.weights = self.weights - learning_rate * dw
            self.bias = self.bias - learning_rate * db

            costs.append(cost)  # record the cost of this iteration
            if i % 100 == 0:  # print the cost every 100 iterations
                print(f"Cost after iteration {i}: {cost}")

        return self.weights, self.bias, costs

    def predict(self, X):
        """
        Predicts binary labels for a set of examples X.
        """
        y_predict = self.sigmoid(np.dot(X, self.weights) + self.bias)
        y_predict_labels = [1 if elem > 0.5 else 0 for elem in y_predict]

        return np.array(y_predict_labels)[:, np.newaxis]

regressor = LogisticRegression()  # create a logistic regression instance
w_trained, b_trained, costs = regressor.train(X_train, y_train, n_iters=600, learning_rate=0.009)

fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(600), costs)  # plot the cost against the iteration number; plt.plot(x, y) takes the x-axis and y-axis data and can draw points and lines with configurable styles
plt.title("Development of cost over training")
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()

y_p_train = regressor.predict(X_train)
y_p_test = regressor.predict(X_test)

print(f"train accuracy: {100 - np.mean(np.abs(y_p_train - y_train)) * 100}%")  # np.mean computes the average
print(f"test accuracy: {100 - np.mean(np.abs(y_p_test - y_test)) * 100}%")
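The accuracy expression above works because, for 0/1 labels, the mean absolute difference equals the fraction of wrong predictions. A minimal sketch of an equivalent and arguably more readable helper follows. The name `accuracy_percent` and the demo arrays are hypothetical and not part of the original code.

```python
import numpy as np

def accuracy_percent(y_pred, y_true):
    """Percentage of predictions that match the true labels."""
    # Flatten both arrays so shapes (n,) and (n, 1) cannot broadcast to (n, n)
    y_pred = np.asarray(y_pred).reshape(-1)
    y_true = np.asarray(y_true).reshape(-1)
    return float(np.mean(y_pred == y_true) * 100.0)

# Hypothetical labels: 3 of the 4 predictions are correct
y_true_demo = np.array([[1], [0], [1], [1]])
y_pred_demo = np.array([[1], [0], [0], [1]])

acc = accuracy_percent(y_pred_demo, y_true_demo)
acc_abs = 100 - np.mean(np.abs(y_pred_demo - y_true_demo)) * 100
print(acc, acc_abs)
```

Flattening first is a defensive choice: comparing an array of shape (n,) with one of shape (n, 1) would silently broadcast to an (n, n) matrix and give a misleading accuracy.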
