Machine Learning--week4: Basic Concepts of Neural Networks

阿新 • Published: 2018-12-30

The techniques covered in earlier weeks cannot solve complex nonlinear problems.

Neural Networks

Sigmoid (logistic) activation function: "activation function" is another term for the nonlinearity \(g(z) = \frac{1}{1+e^{-z}}\)
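
As a quick illustration of \(g(z)\), here is a minimal sketch in plain Python. The function name `sigmoid` is only a label chosen for this example, and the sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# g(z) is close to 0 for very negative z, exactly 0.5 at z = 0,
# and close to 1 for very positive z.
for z in (-10, 0, 10):
    print(z, sigmoid(z))
```

This simple form is fine for moderate inputs; for very large negative `z`, `math.exp(-z)` can overflow, so a production version would need extra care.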

activation: the value that is computed by, and output by, a specific neuron

weights = parameters = \(\theta\)

input units: \(x_1,x_2, x_3,\dots, x_n\)

bias unit / bias neuron: \(x_0\) and \(a_0^{(j)}\), which are always equal to 1

The layers between the input units and the hypothesis are made up of activations.

input wire / output wire: an input wire is an arrow pointing into the target neuron; an output wire is an arrow pointing out of the target neuron

\(a_i^{(j)}\): "activation" of neuron \(i\) or of unit \(i\) in layer \(j\)

\(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j+1\)

(Note that \(\Theta\) is uppercase because it is now a matrix.)

layer 1 == input layer

layer n == output layer (the last layer)

layer 2 ~ layer n-1 == hidden layers

For example, with three input units and one hidden layer of three units:
\[ \begin{align} \text{output of layer 2 (the hidden layer)}&\begin{cases}a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)\\ a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)\\ a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)\end{cases}\\ \text{output layer}&\begin{cases}h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} +\Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})\end{cases} \end{align} \]

More compactly, writing \(\Theta_i^{(j)}\) for row \(i\) of \(\Theta^{(j)}\):
\[ \begin{align} \text{output of layer 2 (the hidden layer)} &\begin{cases} a_1^{(2)} &= g(\Theta_{1}^{(1)}a^{(1)})\\ a_2^{(2)} &= g(\Theta_{2}^{(1)}a^{(1)})\\ a_3^{(2)} &= g(\Theta_{3}^{(1)}a^{(1)}) \end{cases}\\ \text{output layer} &\begin{cases} h_\Theta(x) = a_1^{(3)} = g(\Theta_{1}^{(2)}a^{(2)}) \end{cases} \end{align} \]

Generally, \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\) if the network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j+1\). (The \(+1\) in \(s_j+1\) comes from the bias node: \(\Theta^{(j)}\) has an extra column \(\Theta_0^{(j)}\) that multiplies \(x_0\) or \(a_0^{(j)}\). In other words, the output nodes do not include the bias node while the inputs do.)
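
The dimension rule can be checked with a small sketch. The helper name `theta_shape` and the layer sizes `[3, 3, 1]` (three inputs, three hidden units, one output, matching the example above) are assumptions made only for this illustration.

```python
def theta_shape(s_j, s_next):
    """Shape (rows, cols) of Theta^(j), which maps layer j to layer j+1.

    s_j is the number of units in layer j (bias not counted),
    s_next is the number of units in layer j+1.
    """
    return (s_next, s_j + 1)  # the extra column multiplies the bias unit

layer_sizes = [3, 3, 1]  # units per layer, bias units excluded
shapes = [theta_shape(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(shapes)  # [(3, 4), (1, 4)]
```

So \(\Theta^{(1)}\) is \(3 \times 4\) and \(\Theta^{(2)}\) is \(1 \times 4\) for that network.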

Define \(a^{(1)} = x\)

\(z^{(j+1)} = \Theta^{(j)}a^{(j)}\)

\(z_k^{(j+1)} = \Theta_{k,0}^{(j)}a_0^{(j)} + \Theta_{k,1}^{(j)}a_1^{(j)} + \dots + \Theta_{k,n^{(j)}}^{(j)}a_{n^{(j)}}^{(j)}\quad ,(n^{(j)} \text{ is the number of activation units in layer } j \text{, not counting the bias unit } a_0^{(j)})\)

\(a^{(j)} = g(z^{(j)}) = g(\Theta^{(j-1)}a^{(j-1)})\quad(j\ge2)\)
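
For the network with three inputs and three hidden units used in the example earlier, the product \(z^{(2)} = \Theta^{(1)}a^{(1)}\) can be written out in full. This is only a worked expansion in the same notation, with \(a^{(1)} = x\) and \(x_0 = 1\):

\[ z^{(2)} = \begin{bmatrix} z_1^{(2)}\\ z_2^{(2)}\\ z_3^{(2)} \end{bmatrix} = \begin{bmatrix} \Theta_{10}^{(1)} & \Theta_{11}^{(1)} & \Theta_{12}^{(1)} & \Theta_{13}^{(1)}\\ \Theta_{20}^{(1)} & \Theta_{21}^{(1)} & \Theta_{22}^{(1)} & \Theta_{23}^{(1)}\\ \Theta_{30}^{(1)} & \Theta_{31}^{(1)} & \Theta_{32}^{(1)} & \Theta_{33}^{(1)} \end{bmatrix} \begin{bmatrix} x_0\\ x_1\\ x_2\\ x_3 \end{bmatrix}, \qquad a^{(2)} = g(z^{(2)}) \]

Here \(g\) is applied element-wise, so \(a_k^{(2)} = g(z_k^{(2)})\).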

Suppose the network has \(n\) layers. Then the last matrix \(\Theta^{(n-1)}\) has only one row, which is multiplied by the single column \(a^{(n-1)}\), so the result is a single number:

\(h_\Theta(x) = a^{(n)}=g(z^{(n)})\)

Before computing the next layer, add the bias unit \(a_0^{(j)}=1\) to \(a^{(j)}\).

Forward propagation: computing the activations layer by layer, from the input layer to the output layer, as described above
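
The layer-by-layer computation can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `forward` and `thetas` are hypothetical, each weight matrix is a list of rows, and the all-zero weights below are placeholders chosen so the result is easy to verify by hand.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas, x):
    """Compute h_Theta(x) by forward propagation.

    thetas: list of weight matrices [Theta^(1), Theta^(2), ...],
            each given as a list of rows.
    x:      input features x_1..x_n (without the bias unit).
    """
    a = list(x)                                  # a^(1) = x
    for theta in thetas:
        a_with_bias = [1.0] + a                  # add a_0^(j) = 1
        for row in theta:
            assert len(row) == len(a_with_bias)  # s_{j+1} x (s_j + 1)
        # z^(j+1) = Theta^(j) a^(j)
        z = [sum(w * v for w, v in zip(row, a_with_bias)) for row in theta]
        a = [sigmoid(zk) for zk in z]            # a^(j+1) = g(z^(j+1))
    return a

# Placeholder weights: with all zeros every z is 0, so every activation is 0.5.
theta1 = [[0.0] * 4 for _ in range(3)]   # 3 x 4
theta2 = [[0.0] * 4]                     # 1 x 4
print(forward([theta1, theta2], [1.0, 2.0, 3.0]))
```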

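A neural network effectively uses the activations of the last hidden layer, \(a^{(n-1)}\), rather than the input layer, as the features for a logistic regression. Choosing different parameters in \(\Theta^{(1)}\) can produce more complex features and therefore a better hypothesis, which works better than training directly on \(x_1,x_2,\dots ,x_n\).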

architecture: the way the neurons of a neural network are connected

Weights \(\Theta\) for logical expressions:

\({\rm AND} = (x_1 \land x_2)\):
- \(\Theta = \begin{bmatrix}-30 &20& 20 \end{bmatrix}\)
\({\rm NOR} = (\lnot x_1 \land \lnot x_2)\):
- \(\Theta = \begin{bmatrix}10 & -20& -20 \end{bmatrix}\)
\({\rm OR} = (x_1 \lor x_2)\):
- \(\Theta = \begin{bmatrix}-10 &20& 20 \end{bmatrix}\)
\({\rm NOT} = (\lnot x)\):
- \(\Theta = \begin{bmatrix}10 & -20\end{bmatrix}\)
\({\rm XNOR} = (\lnot x_1 \land \lnot x_2) \lor ( x_1 \land x_2)\):
- needs one hidden layer: \(a_1^{(2)} = (\lnot x_1 \land \lnot x_2),\quad a_2^{(2)} = (x_1 \land x_2)\)
- output layer: \(a^{(3)} = (a_1^{(2)} \lor a_2^{(2)})\)

Implementing the logical expressions:

Let \(x=\begin{bmatrix}1 \\ x_1\\x_2 \end{bmatrix}\). Then \(a_i = g(\Theta_ix)\) is the result of applying the logical operator that corresponds to \(\Theta_i\) to \(x_1,x_2\).

For example, if \(\Theta_i = \begin{bmatrix}-10 &20& 20 \end{bmatrix}\), then \(a_i = x_1 \lor x_2\).
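
The single-neuron gates can be checked numerically. In this sketch the helper `unit` is a hypothetical name for one neuron computing \(g(\Theta x)\) with the bias 1 prepended, and the weights are the AND, OR, and NOR rows listed earlier. Rounding the activation to 0 or 1 is done only to print a truth table.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def unit(theta, inputs):
    """One neuron: g(theta . [1, x_1, ..., x_n])."""
    x = [1] + list(inputs)  # prepend the bias unit x_0 = 1
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

AND = [-30, 20, 20]
OR = [-10, 20, 20]
NOR = [10, -20, -20]

print("x1 x2 AND OR NOR")
for x1 in (0, 1):
    for x2 in (0, 1):
        row = [round(unit(w, (x1, x2))) for w in (AND, OR, NOR)]
        print(x1, x2, *row)
```

The large weights push \(z\) to about \(\pm 10\) or beyond, where \(g(z)\) is within roughly \(5 \times 10^{-5}\) of 0 or 1, which is why the neuron behaves like a logic gate.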

A more complex logical expression such as \({\rm XNOR}\) needs a hidden layer to compute.
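
A minimal sketch of the XNOR network, assuming the hidden layer holds a NOR unit and an AND unit and the output layer is an OR unit, with the weights listed earlier. The function names `unit` and `xnor` are chosen only for this example.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def unit(theta, inputs):
    """One neuron: g(theta . [1, x_1, ..., x_n])."""
    x = [1] + list(inputs)  # prepend the bias unit
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def xnor(x1, x2):
    """Two-layer network: hidden layer (NOR, AND), output layer (OR)."""
    a1 = unit([10, -20, -20], (x1, x2))   # a_1^(2): (not x1) and (not x2)
    a2 = unit([-30, 20, 20], (x1, x2))    # a_2^(2): x1 and x2
    return unit([-10, 20, 20], (a1, a2))  # a^(3):   a1 or a2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(xnor(x1, x2)))
```

The output is close to 1 when both inputs agree and close to 0 otherwise.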

For multiclass classification:

Use \(y = \begin{bmatrix}1\\0\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\1\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\1\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\0\\1 \end{bmatrix},\begin{bmatrix}0\\0\\0\\0 \end{bmatrix}\) to represent the different classes, so that \(h_\Theta(x)\) is a vector of the same dimension as \(y\).
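
The vector encoding of labels can be sketched as follows. The helper names `one_hot` and `predict`, the zero-based class index, and the choice of four classes are assumptions for illustration. Picking the largest output as the predicted class is one common convention, not something stated in the notes above.

```python
def one_hot(k, num_classes):
    """Label vector with a 1 in position k (zero-based) and 0 elsewhere."""
    if not 0 <= k < num_classes:
        raise ValueError("class index out of range")
    return [1 if i == k else 0 for i in range(num_classes)]

def predict(h):
    """Index of the largest component of the output vector h_Theta(x)."""
    return max(range(len(h)), key=lambda i: h[i])

print(one_hot(2, 4))                  # [0, 0, 1, 0]
print(predict([0.1, 0.7, 0.1, 0.1]))  # 1
```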
