Coursera Study Notes | Machine Learning by Stanford University - Andrew Ng
/ 20220404 Week 1 - 2 /
Chapter 1 - Introduction
1.1 Definition
- Arthur Samuel: "The field of study that gives computers the ability to learn without being explicitly programmed."
- Tom Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
1.2 Concepts
1.2.1 Classification of Machine Learning
- Supervised Learning: given a labeled data set; we already know what a correct output/result should look like
  - Regression: continuous output
  - Classification: discrete output
- Unsupervised Learning: given an unlabeled data set (or a data set where every example carries the same label); we have to find structure in the data ourselves
  - Clustering: group the data into different clusters
  - Non-clustering
- Others: Reinforcement Learning, Recommender Systems, ...
1.2.2 Model Representation
- Training Set
- Notation:
  - \(m=\) the number of training examples (rows)
  - \(n=\) the number of features (columns)
  - \(x=\) input variable/feature
  - \(y=\) output variable/target variable
  - \((x^{(i)},y^{(i)})=\) the \(i\)-th training example, where \(i=1, ..., m\); \(x^{(i)}_j\) is the value of feature \(j\) in that example, where \(j=1, ..., n\)
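A minimal NumPy sketch of this notation (the data values are hypothetical; the course itself uses Octave/MATLAB, so Python here is only for illustration):

```python
import numpy as np

# Hypothetical training set: m = 4 examples (rows), n = 2 features (columns).
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 3.0],
              [1416.0, 2.0]])            # x: input features
y = np.array([400.0, 330.0, 369.0, 232.0])  # y: target variable

m, n = X.shape     # m = 4, n = 2
print(X[0, 1])     # x_2^{(1)}: feature j=2 of training example i=1 -> 3.0
```

Note that the math notation is 1-indexed while NumPy is 0-indexed.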
1.2.3 Cost Function
1.2.4 Gradient Descent
Chapter 2 - Linear Regression
\[\begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n&&y^{(1)}\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n&&y^{(2)}\\ \vdots&\vdots&\vdots&\ddots&\vdots&&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n&&y^{(m)}\\ \\ \theta_0&\theta_1&\theta_2&\cdots&\theta_n&& \end{matrix}\]
2.1 Linear Regression with One Variable
- Hypothesis Function
\[h_{\theta}(x)=\theta_0+\theta_1x \]
- Cost Function - Squared Error Cost Function
\[J(\theta_0,\theta_1)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2 \]
- Goal
\[\min_{\theta_0,\theta_1}J(\theta_0,\theta_1) \]
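A minimal NumPy sketch of the univariate hypothesis and squared error cost (function names and data are mine, not from the course):

```python
import numpy as np

def h(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta_0 + theta_1 * x."""
    return theta0 + theta1 * x

def compute_cost(theta0, theta1, x, y):
    """Squared error cost J(theta_0, theta_1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(y)
    return np.sum((h(theta0, theta1, x) - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(0.0, 1.0, x, y))  # perfect fit -> J = 0.0
```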
2.2 Multivariate Linear Regression
- Hypothesis Function
\[\theta= \left[ \begin{matrix} \theta_0\\ \theta_1\\ \vdots\\ \theta_n \end{matrix} \right],\ x= \left[ \begin{matrix} x_0\\ x_1\\ \vdots\\ x_n \end{matrix} \right]\]
\[\begin{aligned}h_\theta(x)&=\theta_0+\theta_1x_1+\theta_2x_2+\cdots+\theta_nx_n\\ &=\theta^Tx \end{aligned}\]
(with the convention \(x_0=1\))
- Cost Function
\[J(\theta)=\frac{1}{2m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2 \]
- Goal
\[\min_{\theta}J(\theta) \]
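The same cost, vectorized over the design matrix (a sketch; X, y, and the zero initialization are hypothetical):

```python
import numpy as np

def compute_cost(theta, X, y):
    """Vectorized cost: J(theta) = 1/(2m) * ||X @ theta - y||^2.
    X is the m x (n+1) design matrix whose first column is x_0 = 1."""
    m = len(y)
    residual = X @ theta - y    # h_theta(x^{(i)}) = theta^T x^{(i)} for every row
    return residual @ residual / (2 * m)

# Hypothetical example with n = 2 features plus the x_0 = 1 column.
X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])
y = np.array([1.0, 2.0])
theta = np.zeros(3)
print(compute_cost(theta, X, y))  # J at theta = 0 -> 1.25
```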
2.3 Algorithm Optimization
2.3.1 Gradient Descent
- Algorithm
Repeat until convergence, simultaneously updating \(\theta_j\) for each \(j=0, 1, ..., n\):
\[\theta_j:=\theta_j-\alpha\frac{1}{m}\displaystyle\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j \]
(a NumPy sketch combining this update with feature scaling follows this list)
- Feature Scaling
For each feature \(x_j\):
\[x_j:={{x_j-\mu_j}\over{s_j}}\]
where \(\mu_j\) is the mean of feature \(x_j\) over the \(m\) examples, and \(s_j\) is either the range of that feature (maximum minus minimum) or its standard deviation.
- Learning Rate
If \(\alpha\) is too small, convergence is slow; if \(\alpha\) is too large, \(J(\theta)\) may fail to decrease on every iteration and may not converge.
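A sketch combining the pieces of this subsection: mean normalization followed by batch gradient descent with learning rate \(\alpha\) (data values and iteration count are arbitrary choices, not from the course):

```python
import numpy as np

def scale_features(X):
    """Mean normalization: x_j := (x_j - mu_j) / s_j, with s_j = standard deviation."""
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Batch gradient descent; every theta_j is updated simultaneously
    via one vectorized step per iteration."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m  # dJ/dtheta_j for all j at once
        theta = theta - alpha * gradient      # simultaneous update
    return theta

# Hypothetical data: scale the features first, then prepend the x_0 = 1 column
# (the x_0 column itself is not scaled).
X_raw = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 3.0], [1416.0, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])
X_scaled, mu, s = scale_features(X_raw)
X = np.hstack([np.ones((len(y), 1)), X_scaled])
print(gradient_descent(X, y))  # learned theta after 1000 iterations
```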
2.3.2 Normal Equation(s)
Let
\[X=\left[ \begin{matrix} x_0&x^{(1)}_1&x^{(1)}_2&\cdots&x^{(1)}_n\\ x_0&x^{(2)}_1&x^{(2)}_2&\cdots&x^{(2)}_n\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ x_0&x^{(m)}_1&x^{(m)}_2&\cdots&x^{(m)}_n\\ \end{matrix} \right],\ y=\left[ \begin{matrix} y^{(1)}\\ y^{(2)}\\ \vdots\\ y^{(m)}\\ \end{matrix} \right]\]
where \(X\) is an \(m\times(n+1)\) matrix and \(y\) is an \(m\)-dimensional column vector. Then
\[\theta=(X^TX)^{-1}X^Ty \]
If \(X^TX\) is noninvertible, the likely causes are:
- Redundant features: two features are linearly dependent; delete one of them;
- Too many features, e.g. \(m\leq n\): delete some features, or apply regularization.
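A sketch of the normal equation (np.linalg.pinv is used so that a noninvertible \(X^TX\) does not raise an error; data values are hypothetical):

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form solution theta = (X^T X)^{-1} X^T y.
    np.linalg.pinv (pseudoinverse) also handles a singular X^T X."""
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Same hypothetical data as above; no feature scaling is needed here.
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 3.0],
              [1.0, 1416.0, 2.0]])
y = np.array([400.0, 330.0, 369.0, 232.0])
print(normal_equation(X, y))
```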
2.4 Polynomial Regression
If a linear \(h_\theta(x)\) cannot fit the data well, we can change the behavior or curve of \(h_\theta(x)\) by making it a quadratic, cubic, or square-root function (or any other form), treating each new term as an extra feature. E.g.:
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2,\ x_2=x_1^2\)
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_1^2+\theta_3x_1^3,\ x_2=x_1^2,\ x_3=x_1^3\)
- \(h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2\sqrt{x_1},\ x_2=\sqrt{x_1}\)
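A sketch of building polynomial features so the multivariate machinery above applies unchanged (polynomial_features is a hypothetical helper, not course code):

```python
import numpy as np

def polynomial_features(x1, degree):
    """Build the design matrix [1, x1, x1^2, ..., x1^degree],
    treating each power as a new feature x_2 = x1^2, x_3 = x1^3, ..."""
    return np.column_stack([x1 ** d for d in range(degree + 1)])

x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = polynomial_features(x1, degree=3)  # columns: x_0 = 1, x_1, x_1^2, x_1^3
print(X.shape)                         # (4, 4)
```

With polynomial features, feature scaling becomes especially important, since e.g. \(x_1^3\) spans a far wider range than \(x_1\).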