Linear Models: Log-Odds (Logistic) Regression
Generalized linear model: \(y=g^{-1}(w^Tx+b)\)
- \(g^{-1}(x)\): a monotonic, differentiable function
How can a linear model be used for a classification task?
- Building on the linear model, it suffices to find a monotonic, differentiable function that links the true label \(y_i\) of the classification task to the prediction of the linear model.
A generalized linear model does not require the response to follow a normal distribution; it only needs to belong to the exponential family (binomial, Poisson, Bernoulli, exponential, etc.). Its covariates may be continuous or discrete.
Logistic regression
The logistic/sigmoid function:
- \(p=h_\theta(x)=g(\theta^Tx+b)=\frac{1}{1+e^{-(\theta^Tx+b)}}\)
- \(\ln\frac{y}{1-y} = \theta^Tx+b\)
- \(\ln\frac{y}{1-y}\): the log odds; the model's prediction approximates the log odds of the true label
- \(g'(z)=g(z)(1-g(z))\)
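As a quick check of the derivative identity \(g'(z)=g(z)(1-g(z))\), here is a minimal NumPy sketch (the `sigmoid` helper and the test points are purely illustrative) that compares the analytic form with a central finite difference:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Compare g'(z) = g(z)(1 - g(z)) with a central finite difference
z = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)
analytic = sigmoid(z) * (1.0 - sigmoid(z))
print(np.max(np.abs(numeric - analytic)))  # on the order of 1e-10: the identity holds
```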
Treating \(y\) as an estimate of the class posterior probability, \(h_\theta(x)=P(y=1|x)\), we have:
- \(P(y=1|x;\theta)=h_\theta(x)\)
- \(P(y=0|x;\theta)=1-h_\theta(x)\)
- \(P(y|x;\theta)=(h_\theta(x))^y(1-h_\theta(x))^{1-y}\)
Step 1: the likelihood function:
- \(L(\theta)=\prod_{i=1}^mp(y^{(i)}|x^{(i)};\theta)=\prod_{i=1}^m(h_\theta(x^{(i)}))^{y^{(i)}}(1-h_\theta(x^{(i)}))^{1-y^{(i)}}\)
Step 2: the log-likelihood function:
- \(l(\theta)=\ln L(\theta)=\sum^m_{i=1}\left(y^{(i)}\ln h_\theta(x^{(i)})+(1-y^{(i)})\ln(1-h_\theta(x^{(i)}))\right)\)
The logistic loss function: \(-l(\theta)=\sum^m_{i=1}\left(-y^{(i)}\ln(h_\theta(x^{(i)}))-(1-y^{(i)})\ln(1-h_\theta(x^{(i)}))\right)\)
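The loss \(-l(\theta)\) is the binary cross-entropy summed over the samples. A minimal sketch, assuming a design matrix `X` (m x n), labels `y` in {0, 1}, and a hypothetical helper named `logistic_loss`:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y):
    # -l(theta) = sum_i [ -y_i * ln h(x_i) - (1 - y_i) * ln(1 - h(x_i)) ]
    h = sigmoid(X @ theta)
    return np.sum(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))

# Toy data, made up purely for illustration
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = np.array([0.5, 0.5])
print(logistic_loss(theta, X, y))  # a larger value means a worse fit
```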
Step 3: differentiate with respect to the j-th parameter \(\theta_j\):
- \(\frac{\partial l(\theta)}{\partial \theta_j} =\sum^m_{i=1}\left(\frac{y^{(i)}}{g(\theta^Tx^{(i)})}-\frac{1-y^{(i)}}{1-g(\theta^Tx^{(i)})}\right)\cdot g(\theta^Tx^{(i)})(1-g(\theta^Tx^{(i)}))\cdot\frac{\partial \theta^Tx^{(i)}}{\partial \theta_j}\)
- \(=\sum^m_{i=1}\left(y^{(i)}(1-g(\theta^Tx^{(i)}))-(1-y^{(i)})g(\theta^Tx^{(i)})\right)\cdot x^{(i)}_j\)
- \(=\sum^m_{i=1}\left(y^{(i)}-g(\theta^Tx^{(i)})\right)\cdot x^{(i)}_j\)
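The result above is the gradient of the log-likelihood \(l(\theta)\); the gradient of the loss \(-l(\theta)\) is its negative. A small sketch (toy data and helper names are hypothetical) that checks the closed form against finite differences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y):
    h = sigmoid(X @ theta)
    return np.sum(-y * np.log(h) - (1.0 - y) * np.log(1.0 - h))

def loss_gradient(theta, X, y):
    # d(-l)/d(theta_j) = -sum_i (y_i - g(theta^T x_i)) * x_ij
    return -X.T @ (y - sigmoid(X @ theta))

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, 0.0, 0.0])
theta = np.array([0.3, -0.2])

eps = 1e-6
for j in range(len(theta)):
    d = np.zeros_like(theta)
    d[j] = eps
    numeric = (logistic_loss(theta + d, X, y) - logistic_loss(theta - d, X, y)) / (2 * eps)
    print(j, numeric, loss_gradient(theta, X, y)[j])  # the two values agree per component
```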
Step 4: solving via gradient updates
- Batch gradient ascent on the log-likelihood (equivalently, gradient descent on the loss):
- for j = 1 to n:
  \(\theta_j=\theta_j +\alpha\sum^m_{i=1}(y^{(i)}-h_\theta(x^{(i)}))x_j^{(i)}\)
- Stochastic gradient descent (SGD):
- for j = 1 to n:
  \(\theta_j=\theta_j +\alpha(y^{(i)}-h_\theta(x^{(i)}))x_j^{(i)}\) (unlike the batch update, each weight update here uses only a single sample \(i\) rather than the sum over all samples)
```python
import numpy as np

# Hypothesis function h(x): element-wise sigmoid
def sigmoid(inX):
    return 1.0 / (1.0 + np.exp(-inX))

# Batch gradient ascent on the log-likelihood
# alpha: learning rate, maxCycle: number of iterations
def gradAscent(dataMatIn, labels, alpha=0.1, maxCycle=100):
    dataMatrix = np.mat(dataMatIn)        # m x n data matrix
    labelsMatrix = np.mat(labels).T       # m x 1 column of labels
    m, n = np.shape(dataMatrix)
    # initialize the weights
    weights = np.ones((n, 1))             # n x 1
    for k in range(maxCycle):
        # error: m x 1 column of residuals y - h(x)
        error = labelsMatrix - sigmoid(dataMatrix * weights)
        weights = weights + alpha * dataMatrix.T * error
    return weights

# Stochastic gradient ascent
# alpha: learning rate
def stocGradAscent(dataMatIn, labels, alpha=0.1):
    dataMatrix = np.mat(dataMatIn)
    labelsMatrix = np.mat(labels).T
    m, n = np.shape(dataMatrix)
    # initialize the weights
    weights = np.ones((n, 1))
    # m is the number of samples; update the weights once per sample
    for i in range(m):
        # error: scalar residual for sample i
        error = labelsMatrix[i] - sigmoid(dataMatrix[i] * weights)
        weights = weights + alpha * dataMatrix[i].T * error
    return weights
```
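A quick usage sketch for the two routines above; the toy data values are hypothetical and chosen so the two classes are linearly separable:

```python
dataMatIn = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.4],
             [-1.1, -1.8], [-0.8, -2.2], [-1.3, -2.0]]
labels = [1, 1, 1, 0, 0, 0]

w_batch = gradAscent(dataMatIn, labels, alpha=0.1, maxCycle=200)
w_sgd = stocGradAscent(dataMatIn, labels, alpha=0.1)
print(w_batch.T)  # both weight vectors score class-1 samples above class-0 samples
print(w_sgd.T)    # the single-pass SGD estimate is rougher than the batch one
```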
Softmax regression
- Softmax regression generalizes logistic regression to \(K\)-class problems; class \(k\) has a parameter vector \(\theta_k\), and the vectors are stacked into a \(K\times n\) matrix \(\theta\).
- The softmax function maps an arbitrary \(K\)-dimensional real vector to another \(K\)-dimensional vector whose entries all lie in \((0,1)\) and sum to 1.
- Logistic regression probability function:
- \(p(y=1|x;\theta)=\frac{1}{1+e^{-\theta^Tx}}\)
- Softmax regression probability function:
- \(p(y=k|x;\theta)=\frac{e^{\theta^T_kx}}{\sum_{j=1}^{K}e^{\theta^T_jx}} \quad k=1,2,\dots,K\)
- Softmax hypothesis function:
- \(h_\theta(x^{(i)})=\left[p(y^{(i)}=1|x^{(i)};\theta),\;\dots,\;p(y^{(i)}=K|x^{(i)};\theta)\right]^T=\frac{1}{\sum_{j=1}^{K}e^{\theta_j^Tx^{(i)}}}\left[e^{\theta_1^Tx^{(i)}},\;\dots,\;e^{\theta_K^Tx^{(i)}}\right]^T\)
- Softmax loss function:
- \(J(\theta)=-\frac{1}{m}\sum^m_{i=1}\sum^K_{j=1}I(y^{(i)}=j)\ln\frac{e^{\theta^T_jx^{(i)}}}{\sum_{l=1}^{K}e^{\theta^T_lx^{(i)}}}\)
- The solution proceeds just like the log-likelihood of logistic regression above.
- The indicator function \(I(y^{(i)}=j)\):
- \(I(y^{(i)}=j)=1\) if \(y^{(i)}=j\), otherwise \(I(y^{(i)}=j)=0\)
- Its role: it zeroes out the loss of samples that do not belong to class \(j\), so that maximizing the likelihood only involves each sample's predicted probability for its true class.
- Differentiate the loss of the \(i\)-th sample with respect to the class-\(j\) parameter vector \(\theta_j\) (\(1\le i\le m\), \(1\le j\le K\)):
- \(\nabla_{\theta_j}J(\theta)=\nabla_{\theta_j}\left(-I(y^{(i)}=j)\ln\frac{e^{\theta_j^Tx^{(i)}}}{\sum_{l=1}^Ke^{\theta_l^Tx^{(i)}}}\right)\)
- \(\ln\frac{e^{\theta_j^Tx^{(i)}}}{\sum_{l=1}^Ke^{\theta_l^Tx^{(i)}}} = \theta_j^Tx^{(i)}-\ln\left(\sum_{l=1}^Ke^{\theta_l^Tx^{(i)}}\right)\)
- \(\nabla_{\theta_j}J(\theta)=-I(y^{(i)}=j)\left(1-\frac{e^{\theta^T_jx^{(i)}}}{\sum_{l=1}^{K}e^{\theta^T_lx^{(i)}}}\right)x^{(i)}\)
- Update of the class-\(j\) parameter \(\theta_j\) (see the NumPy sketch after this list):
- Batch gradient descent:
- \(\theta_j=\theta_j+\alpha \sum_{i=1}^{m}I(y^{(i)}=j)(1-p(y^{(i)}=j|x^{(i)};\theta))x^{(i)}\)
- Stochastic gradient descent:
- \(\theta_j=\theta_j+\alpha I(y^{(i)}=j)(1-p(y^{(i)}=j|x^{(i)};\theta))x^{(i)}\)
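A minimal NumPy sketch of softmax regression trained with per-sample updates (the toy data and helper names are hypothetical). It applies the full per-class update \(\theta_j \leftarrow \theta_j+\alpha\,(I(y^{(i)}=j)-p(y^{(i)}=j|x^{(i)};\theta))\,x^{(i)}\), which matches the rule above when \(j\) is the sample's true class and additionally lowers the scores of the other classes:

```python
import numpy as np

def softmax(scores):
    # Map each row of scores (m x K) to probabilities in (0, 1) that sum to 1
    shifted = scores - scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def softmax_sgd(X, y, K, alpha=0.1, epochs=50):
    # Per-sample update: theta_j += alpha * (I(y=j) - p(y=j|x)) * x, for every class j
    m, n = X.shape
    theta = np.zeros((K, n))                 # one parameter vector per class, stacked as K x n
    for _ in range(epochs):
        for i in range(m):
            p = softmax(X[i:i+1] @ theta.T)[0]   # predicted class probabilities, shape (K,)
            onehot = np.zeros(K)
            onehot[y[i]] = 1.0
            theta += alpha * np.outer(onehot - p, X[i])
    return theta

# Toy 3-class data, made up purely for illustration
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [-1.0, -1.0], [-0.9, -1.1]])
y = np.array([0, 0, 1, 1, 2, 2])
theta = softmax_sgd(X, y, K=3)
print(softmax(X @ theta.T).argmax(axis=1))   # should recover [0 0 1 1 2 2]
```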