Machine Learning - How RNN and LSTM Work
- Overview
An RNN is a recurrent neural network. It offers a different way of approaching deep learning: the output at each step depends not only on the input at that step, but also on the inputs and outputs before and after it. This comes up constantly in NLP applications; for example, each output word depends on the content of the whole sentence, not just on a single word. The LSTM is an upgraded version of the RNN: its core idea is the same, but it avoids some of the RNN's weaknesses through the mechanisms described below. Let's walk through the structures of the RNN and the LSTM step by step and analyze how each of them works.
- RNN Explained
To understand the RNN, we first need to look at its structure; then we can explain how it works.
In the figure above, the left-hand diagram shows the overall structure of an RNN, and the right-hand diagram shows the details inside a single RNN cell. From the left-hand diagram we can see that no matter how many times the RNN cell is unrolled, its weights are shared: there is only one copy of each weight matrix. The hidden state at each step (a&lt;t&gt; in the right-hand diagram) is computed from the previous hidden state a&lt;t-1&gt; and the current input x&lt;t&gt;, and is then passed on to the next time step.
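The forward-step code below refers to "the formula given above"; since the original figure is not reproduced here, the update equations are reconstructed from that code, using the same parameter names:

$$a^{\langle t \rangle} = \tanh\left(W_{ax}\, x^{\langle t \rangle} + W_{aa}\, a^{\langle t-1 \rangle} + b_a\right)$$

$$\hat{y}^{\langle t \rangle} = \mathrm{softmax}\left(W_{ya}\, a^{\langle t \rangle} + b_y\right)$$

A single RNN cell implementing these two equations can be written as follows (the numpy import and the softmax helper are added here so the snippet runs on its own):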
import numpy as np

def softmax(x):
    # Numerically stable softmax over the first axis
    # (helper assumed to exist in the original code).
    e_x = np.exp(x - np.max(x, axis=0, keepdims=True))
    return e_x / np.sum(e_x, axis=0, keepdims=True)

def rnn_cell_forward(xt, a_prev, parameters):
    """
    Implements a single forward step of the RNN-cell as described above.

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m)
    a_prev -- hidden state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
        Wax -- weight matrix multiplying the input, numpy array of shape (n_a, n_x)
        Waa -- weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
        Wya -- weight matrix relating the hidden state to the output, numpy array of shape (n_y, n_a)
        ba -- bias, numpy array of shape (n_a, 1)
        by -- bias relating the hidden state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, a_prev, xt, parameters)
    """
    # Retrieve parameters from "parameters"
    Wax = parameters["Wax"]
    Waa = parameters["Waa"]
    Wya = parameters["Wya"]
    ba = parameters["ba"]
    by = parameters["by"]

    # Compute the next hidden state using the formula given above
    a_next = np.tanh(Wax.dot(xt) + Waa.dot(a_prev) + ba)
    # Compute the output of the current cell using the formula given above
    yt_pred = softmax(Wya.dot(a_next) + by)

    # Store values needed for backward propagation in cache
    cache = (a_next, a_prev, xt, parameters)

    return a_next, yt_pred, cache

np.random.seed(1)
xt_tmp = np.random.randn(3, 10)
a_prev_tmp = np.random.randn(5, 10)
parameters_tmp = {}
parameters_tmp['Waa'] = np.random.randn(5, 5)
parameters_tmp['Wax'] = np.random.randn(5, 3)
parameters_tmp['Wya'] = np.random.randn(2, 5)
parameters_tmp['ba'] = np.random.randn(5, 1)
parameters_tmp['by'] = np.random.randn(2, 1)

a_next_tmp, yt_pred_tmp, cache_tmp = rnn_cell_forward(xt_tmp, a_prev_tmp, parameters_tmp)
print("a_next[4] = ", a_next_tmp[4])
print("a_next.shape = ", a_next_tmp.shape)
print("yt_pred[1] =", yt_pred_tmp[1])
print("yt_pred.shape = ", yt_pred_tmp.shape)
print(a_next_tmp[:, :])
print(a_next_tmp[:, 0])
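The cell above handles a single time step. As a minimal sketch of how the same shared weights are reused at every step, the cell can be chained over a whole sequence; the function rnn_forward below and the input layout (n_x, m, T_x) are illustrative assumptions, not part of the original code:

def rnn_forward(x, a0, parameters):
    # x -- inputs for every time step, shape (n_x, m, T_x)
    # a0 -- initial hidden state, shape (n_a, m)
    n_x, m, T_x = x.shape
    n_a = a0.shape[0]
    n_y = parameters["Wya"].shape[0]
    a = np.zeros((n_a, m, T_x))        # hidden state at every step
    y_pred = np.zeros((n_y, m, T_x))   # prediction at every step
    caches = []
    a_next = a0
    for t in range(T_x):
        # The same "parameters" dictionary is reused at every step: the weights are shared.
        a_next, yt_pred, cache = rnn_cell_forward(x[:, :, t], a_next, parameters)
        a[:, :, t] = a_next
        y_pred[:, :, t] = yt_pred
        caches.append(cache)
    return a, y_pred, caches

# Example: run the cell above over a sequence of 4 time steps
x_tmp = np.random.randn(3, 10, 4)
a0_tmp = np.random.randn(5, 10)
a_tmp, y_pred_seq_tmp, caches_tmp = rnn_forward(x_tmp, a0_tmp, parameters_tmp)
print("a.shape =", a_tmp.shape)              # (5, 10, 4)
print("y_pred.shape =", y_pred_seq_tmp.shape)  # (2, 10, 4)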
- LSTM Explained
Look carefully at the RNN structure above and ask yourself what its weaknesses are. If the RNN has to loop through many time steps, information can be lost: this is the gradient vanishing problem. Once the gradient vanishes, the network stops learning from earlier steps and effectively degenerates into a standard neural network, and the RNN loses its point. Moreover, the longer the sequence (that is, the more loop iterations), the more likely gradient vanishing becomes. At this point we need to improve the RNN so that the new structure can not only keep learning but also remember, holding on to the important things it has learned. That is exactly the step from the RNN to the LSTM (Long Short-Term Memory). To explain the LSTM network structure, let's again start from its structure diagram and then walk through it.
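A back-of-the-envelope way to see why longer sequences make vanishing gradients more likely (a toy calculation, not tied to any particular network): backpropagation through time multiplies roughly one factor per step, and if that factor is typically smaller than 1 the product shrinks exponentially with the sequence length.

# Toy illustration: a gradient scaled by ~0.9 at every time step
# all but disappears after a long enough sequence.
factor = 0.9
for T in (10, 50, 100):
    print(T, factor ** T)
# 10  -> ~0.349
# 50  -> ~0.00515
# 100 -> ~0.0000266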
The figure above shows the basic structure of an LSTM cell; some unimportant elements have been left out so that the key parts stand out. Compared with the RNN, we now have three gates and a memory cell C&lt;t&gt;, which is also called the internal hidden state. So what do these three gates do? The first is the forget gate, which helps the memory cell delete (or filter out) unimportant information. Its values lie in the interval [0, 1], are usually produced by a sigmoid function, and are multiplied element-wise with C: values near 0 erase the corresponding information, and values near 1 keep it. The second is the update gate. It works together with the candidate memory cell to produce new information: the two are multiplied element-wise, and the result is added element-wise to the memory cell that has already passed through the forget gate, which effectively adds what has been learned at the current time step to the memory cell. The third is the output gate, which, as the name suggests, filters the hidden state we output. This gate is also a sigmoid, determined jointly by the previous hidden state a&lt;t-1&gt; and the current input X&lt;t&gt;. Multiplying it element-wise with the tanh of the memory cell (after the forget and update gates have been applied) yields the hidden state of the current time step, a&lt;t&gt;, and at the same time we obtain the memory cell value for the current time step. From this we can also see that the output hidden state a&lt;t&gt; and the internal hidden state (the memory cell) have the same dimension. That covers the structure and function inside a single LSTM cell. To deepen the understanding, I will again demonstrate in code how to build an LSTM cell; the update equations and the code are shown below:
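The equations below are reconstructed from the forward-step code that follows, using its variable names (ft, it, ot are the forget, update, and output gates; cct is the candidate value c tilde; [a&lt;t-1&gt;, x&lt;t&gt;] denotes the concatenation of the previous hidden state and the current input):

$$f^{\langle t \rangle} = \sigma\left(W_f\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\right)$$
$$i^{\langle t \rangle} = \sigma\left(W_i\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_i\right)$$
$$\tilde{c}^{\langle t \rangle} = \tanh\left(W_c\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right)$$
$$c^{\langle t \rangle} = f^{\langle t \rangle} \odot c^{\langle t-1 \rangle} + i^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}$$
$$o^{\langle t \rangle} = \sigma\left(W_o\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\right)$$
$$a^{\langle t \rangle} = o^{\langle t \rangle} \odot \tanh\left(c^{\langle t \rangle}\right)$$
$$\hat{y}^{\langle t \rangle} = \mathrm{softmax}\left(W_y\, a^{\langle t \rangle} + b_y\right)$$

Note the additive form of the cell-state update: when the forget gate stays close to 1, information (and gradient) can flow across many time steps largely unchanged, which is what helps the LSTM avoid the vanishing-gradient problem described above.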
def sigmoid(x):
    # Logistic sigmoid (helper assumed available in the original code;
    # np and softmax are defined in the RNN snippet above).
    return 1 / (1 + np.exp(-x))

def lstm_cell_forward(xt, a_prev, c_prev, parameters):
    """
    Implements a single forward step of the LSTM-cell as described above.

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m)
    a_prev -- hidden state at timestep "t-1", numpy array of shape (n_a, m)
    c_prev -- memory state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
        Wf -- weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
        bf -- bias of the forget gate, numpy array of shape (n_a, 1)
        Wi -- weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
        bi -- bias of the update gate, numpy array of shape (n_a, 1)
        Wc -- weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
        bc -- bias of the first "tanh", numpy array of shape (n_a, 1)
        Wo -- weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
        bo -- bias of the output gate, numpy array of shape (n_a, 1)
        Wy -- weight matrix relating the hidden state to the output, numpy array of shape (n_y, n_a)
        by -- bias relating the hidden state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains
             (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    Note: ft/it/ot stand for the forget/update/output gates, cct stands for
    the candidate value (c tilde), c stands for the cell state (memory).
    """
    # Retrieve parameters from "parameters"
    Wf = parameters["Wf"]  # forget gate weight
    bf = parameters["bf"]
    Wi = parameters["Wi"]  # update gate weight
    bi = parameters["bi"]
    Wc = parameters["Wc"]  # candidate value weight
    bc = parameters["bc"]
    Wo = parameters["Wo"]  # output gate weight
    bo = parameters["bo"]
    Wy = parameters["Wy"]  # prediction weight
    by = parameters["by"]

    # Retrieve dimensions from the shapes of xt and Wy
    n_x, m = xt.shape
    n_y, n_a = Wy.shape

    # Concatenate a_prev and xt into a single (n_a + n_x, m) matrix
    concat = np.concatenate((a_prev, xt), axis=0)

    # Compute the gates, the candidate value, the new cell state and the new hidden state
    ft = sigmoid(Wf.dot(concat) + bf)    # forget gate
    it = sigmoid(Wi.dot(concat) + bi)    # update gate
    cct = np.tanh(Wc.dot(concat) + bc)   # candidate value
    c_next = c_prev * ft + cct * it      # cell state
    ot = sigmoid(Wo.dot(concat) + bo)    # output gate
    a_next = ot * np.tanh(c_next)        # hidden state

    # Compute the prediction of the LSTM cell
    yt_pred = softmax(Wy.dot(a_next) + by)

    # Store values needed for backward propagation in cache
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    return a_next, c_next, yt_pred, cache

np.random.seed(1)
xt_tmp = np.random.randn(3, 10)
a_prev_tmp = np.random.randn(5, 10)
c_prev_tmp = np.random.randn(5, 10)
parameters_tmp = {}
parameters_tmp['Wf'] = np.random.randn(5, 5 + 3)
parameters_tmp['bf'] = np.random.randn(5, 1)
parameters_tmp['Wi'] = np.random.randn(5, 5 + 3)
parameters_tmp['bi'] = np.random.randn(5, 1)
parameters_tmp['Wo'] = np.random.randn(5, 5 + 3)
parameters_tmp['bo'] = np.random.randn(5, 1)
parameters_tmp['Wc'] = np.random.randn(5, 5 + 3)
parameters_tmp['bc'] = np.random.randn(5, 1)
parameters_tmp['Wy'] = np.random.randn(2, 5)
parameters_tmp['by'] = np.random.randn(2, 1)

a_next_tmp, c_next_tmp, yt_tmp, cache_tmp = lstm_cell_forward(xt_tmp, a_prev_tmp, c_prev_tmp, parameters_tmp)
print("a_next[4] = \n", a_next_tmp[4]) print("a_next.shape = ", c_next_tmp.shape) print("c_next[2] = \n", c_next_tmp[2]) print("c_next.shape = ", c_next_tmp.shape) print("yt[1] =", yt_tmp[1]) print("yt.shape = ", yt_tmp.shape) print("cache[1][3] =\n", cache_tmp[1][3]) print("len(cache) = ", len(cache_tmp))
- Summary
The two sections above introduced the structures of the RNN and the LSTM and analyzed the functions and data flow inside each of them, with Python code after each part showing how to build an RNN cell and an LSTM cell. You can think of the LSTM as an optimization of the RNN, and it is important to understand why that optimization is needed. More important still is understanding the new way of framing problems that the RNN brings: the clearest difference from the standard neural networks we have seen before is that in those networks, regressors, or classifiers, every output depends only on the input features, whereas in an RNN an output depends not only on the current input but also on the inputs that came before it. This is exactly the situation in sequence models, and applications such as language modeling and machine translation all build on the RNN idea.