Coursera Study Notes | Machine Learning by Stanford University - Andrew Ng
/ 20220404 Week 1 - 2 /
Chapter 1 - Introduction
1.1 Definition
- Arthur Samuel: "The field of study that gives computers the ability to learn without being explicitly programmed."
- Tom Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
1.2 Concepts
1.2.1 Classification of Machine Learning
- Supervised Learning: given a labeled data set; we already know what a correct output/result should look like
  - Regression: continuous output
  - Classification: discrete output
- Unsupervised Learning: given an unlabeled data set (or a data set where every example carries the same label); we have to find structure in the data ourselves
  - Clustering: group the data into different clusters
  - Non-clustering
- Others: Reinforcement Learning, Recommender Systems, ...
1.2.2 Model Representation
- Training Set
- Notation:
  - \(m=\) the number of training examples (rows)
  - \(n=\) the number of features (columns)
  - \(x=\) input variable/feature
  - \(y=\) output variable/target variable
  - \((x^{(i)},y^{(i)})=\) the \(i\)-th training example, where \(i=1, ..., m\); \(x^{(i)}_j\) is the value of feature \(j\) in that example, where \(j=1, ..., n\)
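A minimal NumPy sketch of this notation (the data values are hypothetical; the course itself uses Octave/MATLAB, so Python here is only for illustration):

```python
import numpy as np

# Hypothetical training set: m = 4 examples (rows), n = 2 features (columns).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 3.0],
              [1416.0, 2.0]])            # x: input features
y = np.array([400.0, 330.0, 369.0, 232.0])  # y: target variable

m, n = X.shape     # m = 4, n = 2
print(X[0, 1])     # x_2^{(1)}: feature j=2 of training example i=1 -> 3.0
```

Note that the math notation is 1-indexed while NumPy is 0-indexed.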
1.2.3 Cost Function
1.2.4 Gradient Descent
Chapter 2 - Linear Regression
\[\begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\ \vdots&\vdots&\vdots&\ddots&\vdots&&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}\\ \\ \theta_0&\theta_1&\theta_2&\cdots&\theta_n&& \end{matrix}\]
2.1 Linear Regression with One Variable
- Hypothesis Function
\[h_{\theta}(x)=\theta_0+\theta_1x \]
- Cost Function - Squared Error Cost Function
\[J(\theta_0,\theta_1)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2 \]
- Goal
\[\min_{\theta_0,\theta_1}J(\theta_0,\theta_1) \]
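A minimal NumPy sketch of the univariate hypothesis and squared error cost (function names and data are mine, not from the course):

```python
import numpy as np

def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x."""
    return theta0 + theta1 * x

def compute_cost(theta0, theta1, x, y):
    """Squared error cost J(theta_0, theta_1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(y)
    return np.sum((h(theta0, theta1, x) - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(0.0, 1.0, x, y))  # perfect fit -> J = 0.0
```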
2.2 Multivariate Linear Regression
- Hypothesis Function
\[\theta= \left[ \begin{matrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{matrix} \right],\ x= \left[ \begin{matrix} x_0\\ x_1\\ \vdots\\ x_n \end{matrix} \right]\]
\[\begin{aligned}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n\\ &=\theta^Tx \end{aligned}\]
(with the convention \(x_0=1\))
- Cost Function
\[J(\theta)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2 \]
- Goal
\[\min_{\theta}J(\theta) \]
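The same cost, vectorized over the design matrix (a sketch; X, y, and the zero initialization are hypothetical):

```python
import numpy as np

def compute_cost(theta, X, y):
    """Vectorized cost: J(theta) = 1/(2m) * ||X @ theta - y||^2.
    X is the m x (n+1) design matrix whose first column is x_0 = 1."""
    m = len(y)
    residual = X @ theta - y    # h_theta(x^{(i)}) = theta^T x^{(i)} for every row
    return residual @ residual / (2 * m)

# Hypothetical example with n = 2 features plus the x_0 = 1 column.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])
y = np.array([1.0, 2.0])
theta = np.zeros(3)
print(compute_cost(theta, X, y))  # J at theta = 0 -> 1.25
```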
2.3 Algorithm Optimization
2.3.1 Gradient Descent
- Algorithm
Repeat until convergence, simultaneously updating \(\theta_j\) for each \(j=0, 1, ..., n\):
\[\theta_j:=\theta_j-\alpha\frac{1}{m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j \]
(a NumPy sketch combining this update with feature scaling follows this list)
- Feature Scaling
For each feature \(x_j\):
\[x_j:={{x_j-\mu_j}\over{s_j}}\]
where \(\mu_j\) is the mean of feature \(x_j\) over the \(m\) examples, and \(s_j\) is either the range of that feature (maximum minus minimum) or its standard deviation.
- Learning Rate
If \(\alpha\) is too small, convergence is slow; if \(\alpha\) is too large, \(J(\theta)\) may fail to decrease on every iteration and may not converge.
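A sketch combining the pieces of this subsection: mean normalization followed by batch gradient descent with learning rate \(\alpha\) (data values and iteration count are arbitrary choices, not from the course):

```python
import numpy as np

def scale_features(X):
    """Mean normalization: x_j := (x_j - mu_j) / s_j, with s_j = standard deviation."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent; every theta_j is updated simultaneously
    via one vectorized step per iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m  # dJ/dtheta_j for all j at once
        theta = theta - alpha * gradient      # simultaneous update
    return theta

# Hypothetical data: scale the features first, then prepend the x_0 = 1 column
# (the x_0 column itself is not scaled).
X_raw = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 3.0], [1416.0, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])
X_scaled, mu, s = scale_features(X_raw)
X = np.hstack([np.ones((len(y), 1)), X_scaled])
print(gradient_descent(X, y))  # learned theta after 1000 iterations
```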
2.3.2 Normal Equation(s)
Let
\[X=\left[ \begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n\\ \end{matrix} \right],\ y=\left[ \begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)}\\ \end{matrix} \right]\]
where \(X\) is an \(m\times(n+1)\) matrix and \(y\) is an \(m\)-dimensional column vector. Then
\[\theta=(X^TX)^{-1}X^Ty \]
If \(X^TX\) is noninvertible, the likely causes are:
- Redundant features: two features are linearly dependent; delete one of them;
- Too many features, e.g. \(m\leq n\): delete some features, or apply regularization.
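A sketch of the normal equation (np.linalg.pinv is used so that a noninvertible \(X^TX\) does not raise an error; data values are hypothetical):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^{-1} X^T y.
    np.linalg.pinv (pseudoinverse) also handles a singular X^T X."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Same hypothetical data as above; no feature scaling is needed here.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 3.0],
              [1.0, 1416.0, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])
print(normal_equation(X, y))
```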
2.4 Polynomial Regression
If a linear \(h_\theta(x)\) cannot fit the data well, we can change the behavior or curve of \(h_\theta(x)\) by making it a quadratic, cubic, or square-root function (or any other form), treating each new term as an extra feature. E.g.:
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2,\ x_2=x_1^2\)
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3,\ x_2=x_1^2,\ x_3=x_1^3\)
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2\sqrt{x_1},\ x_2=\sqrt{x_1}\)
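A sketch of building polynomial features so the multivariate machinery above applies unchanged (polynomial_features is a hypothetical helper, not course code):

```python
import numpy as np

def polynomial_features(x1, degree):
    """Build the design matrix [1, x1, x1^2, ..., x1^degree],
    treating each power as a new feature x_2 = x1^2, x_3 = x1^3, ..."""
    return np.column_stack([x1 ** d for d in range(degree + 1)])

x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = polynomial_features(x1, degree=3)  # columns: x_0 = 1, x_1, x_1^2, x_1^3
print(X.shape)                         # (4, 4)
```

With polynomial features, feature scaling becomes especially important, since e.g. \(x_1^3\) spans a far wider range than \(x_1\).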