
[Andrew Ng] Regularization

Chapter 7 Regularization

The problem of overfitting

If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.

Addressing overfitting

  1. Reduce the number of features
  • Manually select which features to keep.
  • Use a model selection algorithm.
  2. Regularization
  • Keep all the features, but reduce the magnitude/values of the parameters \(\theta_j\), which gives a simpler hypothesis and a smoother function.

Cost function

\[J(\theta)=\frac{1}{2m} \left[\sum_{i=1}^m\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum^n_{j=1}\theta_j^2\right] \]

\(\lambda\) is the regularization parameter; it controls a trade-off between two goals: fitting the training set well, and keeping the parameters small to avoid overfitting.

We don't need to shrink \(\theta_0\), because \(\theta_0\) corresponds to the constant term, which has little influence on overfitting.
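A minimal NumPy sketch of this regularized cost (illustrative, not the lecture's code; `X` is assumed to be the \(m\times(n+1)\) design matrix with a leading column of ones, `y` the label vector, and `lam` stands for \(\lambda\)):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta) (illustrative sketch)."""
    m = len(y)
    residual = X @ theta - y                            # h_theta(x^(i)) - y^(i)
    fit_term = (residual @ residual) / (2 * m)          # (1/2m) * sum of squared errors
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)   # skip theta_0 (bias term)
    return fit_term + reg_term
```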

Linear regression

Gradient descent

repeat until convergence{

\[\begin{aligned} \theta_0&=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta)\\ &=\theta_0-\frac{\alpha}{m} \sum_{i=1}^m[h_\theta(x^{(i)})-y^{(i)}]x_0^{(i)}\\ \theta_j&=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &=\theta_j-\alpha[\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j]\\ &=\theta_j(1-\alpha\frac{\lambda}{m})-\frac{\alpha}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} \end{aligned} \]

(\(j=1,\cdots,n\))

}

\(1-\alpha\frac{\lambda}{m}<1\) but very close to \(1\), because \(\alpha\) is small and \(m\) is large. Multiplying \(\theta_j\) by this factor shrinks it slightly on every iteration, reducing the influence of \(\theta_j\).
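A minimal sketch of one such update in NumPy (assumed names: `alpha` for \(\alpha\), `lam` for \(\lambda\); same design-matrix convention as above):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update (illustrative sketch)."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m               # unregularized gradient for all j
    new_theta = theta - alpha * grad               # ordinary update, including theta_0
    new_theta[1:] -= alpha * lam / m * theta[1:]   # extra shrinkage (lambda/m)*theta_j, j >= 1
    return new_theta
```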

Normal equation

\[\theta=(X^TX+\lambda \left[ \begin{matrix} 0&&&&\\ &1&&&\\ &&1&&\\ &&&\ddots&\\ &&&&1 \end{matrix} \right]_{(n+1)\times(n+1)} )^{-1}X^Ty \]

If \(\lambda>0\), the matrix being inverted is provably invertible, even when \(X^TX\) itself is singular (e.g. when \(m\le n\)).
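A sketch of this closed-form solution, assuming the same design-matrix convention as above and using `np.linalg.solve` rather than an explicit inverse:

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """Closed-form regularized solution (illustrative sketch)."""
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0                      # do not regularize the bias term theta_0
    # For lam > 0, X^T X + lam*L is invertible, so the solve succeeds.
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```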

Logistic regression

\[\begin{aligned} J(\theta)=-\frac{1}{m}\sum_{i=1}^m[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2 \end{aligned} \]
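A sketch of this cost in NumPy, under the same assumptions as the linear-regression example (illustrative helper names, not lecture code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_logistic_cost(theta, X, y, lam):
    """Regularized logistic-regression cost (illustrative sketch)."""
    m = len(y)
    h = sigmoid(X @ theta)                                  # h_theta(x^(i)) for all i
    cross_entropy = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    reg_term = lam / (2 * m) * np.sum(theta[1:] ** 2)       # skip theta_0
    return cross_entropy + reg_term
```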

Gradient descent

repeat until convergence{

\[\begin{aligned} \theta_0&=\theta_0-\alpha\frac{\partial}{\partial\theta_0}J(\theta)\\ &=\theta_0-\frac{\alpha}{m} \sum_{i=1}^m[h_\theta(x^{(i)})-y^{(i)}]x_0^{(i)}\\ \theta_j&=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)\\ &=\theta_j-\alpha[\frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j]\\ &=\theta_j(1-\alpha\frac{\lambda}{m})-\frac{\alpha}{m} \sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} \end{aligned} \]

(\(j=1,\cdots,n\))

}
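The update looks identical to the linear-regression one, but here \(h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}\) is the sigmoid hypothesis. A matching sketch of one update step (illustrative, not lecture code):

```python
import numpy as np

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized update for logistic regression (illustrative sketch)."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))         # sigmoid hypothesis
    grad = X.T @ (h - y) / m                       # unregularized gradient
    new_theta = theta - alpha * grad
    new_theta[1:] -= alpha * lam / m * theta[1:]   # extra shrinkage, skipping theta_0
    return new_theta
```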

Words and expressions

ameliorate: to improve

wiggly: curvy, bending