
Machine Learning--week4: Basic Concepts of Neural Networks

The methods we learned previously cannot solve complex non-linear problems.

Neural Networks

Sigmoid (logistic) activation function: "activation function" is just another term for \(g(z) = \frac{1}{1+e^{-z}}\)
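As a quick illustration (a minimal NumPy sketch, not part of the original course notes), the activation function is simply:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) activation: g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                            # 0.5
print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # approx. [0.  0.5  1.]
```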

activation: the value that is computed by, and output by, a specific neuron (unit)

weights = parameters = \(\theta\)

input units: \(x_1,x_2, x_3,\dots, x_n\)

bias unit / bias neuron: \(x_0\) in the input layer, or \(a_0^{(j)}\) in layer \(j\); it is always set to \(1\)

The layers between the input units and the hypothesis are made up of activation units.

input wire / output wire: an input wire is an arrow pointing into a given neuron, and an output wire is an arrow pointing out of it

\(a_i^{(j)}\): "activation" of neuron \(i\) or of unit \(i\) in layer \(j\)

\(\Theta^{(j)}\): matrix of weights controlling the function mapping from layer \(j\) to layer \(j+1\)

(Note that \(\Theta\) is uppercase here, because it now takes the form of a matrix.)

layer 1 == input layer

layer n == output layer (the last layer)

layer 2 ~ layer n-1 == hidden layers

for example:
\[ \begin{align} \text{hidden layer (layer 2)}&\begin{cases}a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3)\\ a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3)\\ a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3)\end{cases}\\ \text{output layer (layer 3)}&\begin{cases}h_\Theta(x) = a_1^{(3)} = g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} +\Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)})\end{cases} \end{align} \]

More compactly, this can be written as:
\[ \begin{align} \text{hidden layer (layer 2)} &\begin{cases} a_1^{(2)} &= g(\Theta_{1}^{(1)}a^{(1)})\\ a_2^{(2)} &= g(\Theta_{2}^{(1)}a^{(1)})\\ a_3^{(2)} &= g(\Theta_{3}^{(1)}a^{(1)}) \end{cases}\\ \text{output layer (layer 3)} &\begin{cases} h_\Theta(x) = a_1^{(3)} = g(\Theta_{1}^{(2)}a^{(2)}) \end{cases} \end{align} \]

Generally, \(\Theta^{(j)}\) will be of dimension \(s_{j+1} \times (s_j+1)\) if the network has \(s_j\) units in layer \(j\) and \(s_{j+1}\) units in layer \(j+1\). (The \(+1\) in \(s_j+1\) comes from the addition in \(\Theta^{(j)}\) of the "bias nodes," \(x_0\) and \(\Theta_0^{(j)}\). In other words, the output nodes do not include the bias nodes, while the inputs do.)
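To make the dimension rule concrete, here is a minimal NumPy sketch of the example network above (3 input units, 3 hidden units, 1 output unit); the weight values and the input vector are made-up placeholders, not anything from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# s_1 = 3, s_2 = 3, s_3 = 1, so by the rule s_{j+1} x (s_j + 1):
Theta1 = np.random.randn(3, 4)      # Theta^(1): maps layer 1 -> layer 2, 3x4
Theta2 = np.random.randn(1, 4)      # Theta^(2): maps layer 2 -> layer 3, 1x4

x  = np.array([0.5, -1.2, 3.0])     # made-up input (x_1, x_2, x_3)
a1 = np.concatenate(([1.0], x))     # prepend the bias unit x_0 = 1
a2 = sigmoid(Theta1 @ a1)           # a^(2) = g(Theta^(1) a^(1)), shape (3,)
a2 = np.concatenate(([1.0], a2))    # prepend the bias unit a_0^(2) = 1
h  = sigmoid(Theta2 @ a2)           # h_Theta(x) = a^(3): a single number
print(h)
```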

Define \(a^{(1)} = x\).

\(z^{(j+1)} = \Theta^{(j)}a^{(j)}\)

\(z_k^{(j+1)} = \Theta_{k,0}^{(j)}a_0^{(j)} + \Theta_{k,1}^{(j)}a_1^{(j)} + \dots + \Theta_{k,n^{(j)}}^{(j)}a_{n^{(j)}}^{(j)}\quad (\text{where } n^{(j)} \text{ means layer } j \text{ has } n^{(j)} \text{ activation units})\)

\(a^{(j)} = g(z^{(j)}) = g(\Theta^{(j-1)}a^{(j-1)})\quad(j\ge2)\)

Suppose the network has \(n+1\) layers in total; then the last weight matrix \(\Theta^{(n)}\) will have only one row, which is multiplied by the single column \(a^{(n)}\), so that the result is a single number:

\(h_\Theta(x) = a^{(n+1)}=g(z^{(n+1)})\)

Add the bias unit \(a_0^{(j)}=1\) to each layer before computing \(z^{(j+1)}\).

Forward Propagation: the process above of computing the activations layer by layer, from the input layer forward to the output layer.
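The whole procedure can be written as a short loop (a minimal sketch, assuming a `sigmoid` helper as above and a list `thetas` of weight matrices whose dimensions follow the \(s_{j+1} \times (s_j+1)\) rule):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagate(x, thetas):
    """Forward propagation: start with a^(1) = x, then for each layer
    prepend the bias unit a_0^(j) = 1, compute z^(j+1) = Theta^(j) a^(j),
    and apply g element-wise."""
    a = x
    for Theta in thetas:
        a = np.concatenate(([1.0], a))   # add a_0^(j) = 1
        a = sigmoid(Theta @ a)           # a^(j+1) = g(z^(j+1))
    return a                             # h_Theta(x)

# Usage with the example network above (random placeholder weights):
thetas = [np.random.randn(3, 4), np.random.randn(1, 4)]
print(forward_propagate(np.array([0.5, -1.2, 3.0]), thetas))
```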

In effect, a neural network trains its logistic regression on the activations of layer \(a^{(n-1)}\) (the last hidden layer) as features, rather than on the input layer itself. Choosing different parameters in \(\Theta^{(1)}\) can produce complex features, and hence a better hypothesis, than using \(x_1,x_2,\dots ,x_n\) directly as the training features.

architecture: the way that the neurons of a neural network are connected

The \(\Theta\) values corresponding to some logical operators:

  • \({\rm AND} = (x_1 \bigwedge x_2)\):
    • \(\Theta = \begin{bmatrix}-30 &20& 20 \end{bmatrix}\)
  • \({\rm NOR} = (\lnot x_1 \bigwedge \lnot x_2)\):
    • \(\Theta = \begin{bmatrix}10 & -20& -20 \end{bmatrix}\)
  • \({\rm OR} = (x_1 \bigvee x_2)\):
    • \(\Theta = \begin{bmatrix}-10 &20& 20 \end{bmatrix}\)
  • \({\rm NOT} = (\lnot x)\):
    • \(\Theta = \begin{bmatrix}10 & -20\end{bmatrix}\)
  • \({\rm XNOR} = (\lnot x_1 \bigwedge \lnot x_2) \bigvee ( x_1 \bigwedge x_2)\):
    • requires a hidden layer: \(a_1^{(2)} == (\lnot x_1 \bigwedge \lnot x_2),\quad a_2^{(2)} == (x_1 \bigwedge x_2)\)
    • output layer: \(a^{(3)} == (a_1^{(2)} \bigvee a_2^{(2)})\)

Implementing the logical expressions:

Let \(x=\begin{bmatrix}1 \\ x_1\\x_2 \end{bmatrix}\); then \(a_i = g(\Theta_ix)\) gives the result of applying the logical operator corresponding to \(\Theta_i\) to \(x_1\) and \(x_2\).

For example, if \(\Theta_i = \begin{bmatrix}-10 &20& 20 \end{bmatrix}\), then \(a_i == x_1 \bigvee x_2\).

A complex logical expression such as \({\rm XNOR}\) can only be computed with the help of a hidden layer.
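As a small check (a minimal NumPy sketch using the \(\Theta\) rows listed above; the `gate` and `xnor` helper names are just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Theta rows from the list above; each gate is a_i = g(Theta_i x) with x = [1, x1, x2].
AND = np.array([-30.0,  20.0,  20.0])
NOR = np.array([ 10.0, -20.0, -20.0])
OR  = np.array([-10.0,  20.0,  20.0])

def gate(theta, x1, x2):
    return sigmoid(theta @ np.array([1.0, x1, x2])) > 0.5

def xnor(x1, x2):
    # hidden layer: a_1^(2) = (NOT x1) AND (NOT x2), a_2^(2) = x1 AND x2
    a1 = gate(NOR, x1, x2)
    a2 = gate(AND, x1, x2)
    # output layer: a^(3) = a_1^(2) OR a_2^(2)
    return gate(OR, float(a1), float(a2))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xnor(x1, x2))   # True exactly when x1 == x2
```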

For multiclass classification:

Use \(y = \begin{bmatrix}1\\0\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\1\\0\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\1\\0 \end{bmatrix}, \begin{bmatrix}0\\0\\0\\1 \end{bmatrix}\) to represent the different classes, one such vector per class.
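For example (a minimal sketch; the `one_hot` helper is just illustrative), a class label can be converted to such a vector as follows:

```python
import numpy as np

def one_hot(k, num_classes=4):
    """Map class k (0-indexed) to the corresponding unit column vector."""
    y = np.zeros(num_classes)
    y[k] = 1.0
    return y

print(one_hot(0))   # [1. 0. 0. 0.]
print(one_hot(3))   # [0. 0. 0. 1.]
```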