Building the LSTM Forward Pass by Hand (Building an LSTM Step by Step)

阿新 • Published: 2020-03-22

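Preface

This post builds on two earlier ones: "Andrew Ng's deepLearning.ai recurrent neural network (RNN) study notes (theory)" and "Building the forward pass of an RNN from scratch (rnn in pure python)". You do not have to read them first; if something below is unclear, go back to them then.

Contents

Gates and state overview
LSTM cell
The full LSTM forward pass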

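What you need to understand:

What the forget gate, update gate, and output gate do, and how they do it.
How the cell state selectively retains information.

The figure below illustrates the operations of an LSTM cell.

Figure: LSTM cell. At every time step it tracks and updates a "cell state", or memory variable. As in the earlier RNN example, we start by implementing the LSTM cell for a single time step; you can then call it in a for loop to process T_x time steps.

Gates and state overview

Forget gate

Concept: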

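Suppose we are reading the words of a text and plan to use an LSTM to keep track of grammatical structure, for example whether the subject is singular ("puppy") or plural ("puppies").

If the subject changes its state (from singular to plural), the previously stored memory is out of date, so we "forget" it.
The "forget gate" is a tensor containing values between 0 and 1.
If a unit in the forget gate is close to 0, the LSTM "forgets" the value stored in the corresponding unit of the previous cell state.
If a unit in the forget gate is close to 1, the LSTM retains most of the corresponding stored value.

Formula:

$\Gamma_f^{\langle t \rangle} = \sigma(W_f [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f)$

Explanation of the formula:

$W_f$ contains the weights that govern the forget gate's behavior.
The hidden state of the previous time step $a^{\langle t-1 \rangle}$ and the input of the current time step $x^{\langle t \rangle}$ are concatenated and multiplied by $W_f$.
The sigmoid function keeps every value of the gate tensor $\Gamma_f^{\langle t \rangle}$ between 0 and 1.
The forget gate $\Gamma_f^{\langle t \rangle}$ has the same shape as the previous cell state $c^{\langle t-1 \rangle}$.
This means the two can be multiplied element-wise.

Multiplying the tensors, $\Gamma_f^{\langle t \rangle} \odot c^{\langle t-1 \rangle}$, is like applying a mask to the previous cell state.
If a single value in $\Gamma_f^{\langle t \rangle}$ is 0 or close to 0, the corresponding product is close to 0.
The value stored in that unit of $c^{\langle t-1 \rangle}$ is therefore not remembered in the next time step.
Likewise, if a value in $\Gamma_f^{\langle t \rangle}$ is close to 1, the product is close to the original value in the previous cell state.
The LSTM then keeps the value of the corresponding unit of $c^{\langle t-1 \rangle}$ for the next time step.

Variable names in the code:

Wf: forget gate weights
bf: forget gate bias
ft: forget gate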
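
To make the masking idea concrete, here is a minimal NumPy sketch of the forget gate. The sizes (n_a = 3, n_x = 2, m = 1), the random weights, and the local `sigmoid` helper are assumptions made up for this illustration; they are not values from the post.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; squashes every entry into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_a, n_x, m = 3, 2, 1                        # illustrative sizes only

a_prev = rng.standard_normal((n_a, m))       # hidden state from step t-1
xt = rng.standard_normal((n_x, m))           # input at step t
c_prev = rng.standard_normal((n_a, m))       # cell state from step t-1
Wf = rng.standard_normal((n_a, n_a + n_x))   # forget gate weights (random here)
bf = np.zeros((n_a, 1))                      # forget gate bias

concat = np.concatenate((a_prev, xt), axis=0)  # stacked input, shape (n_a + n_x, m)
ft = sigmoid(Wf @ concat + bf)                 # forget gate, same shape as c_prev
kept = ft * c_prev                             # element-wise mask over the old cell state
```

Because every entry of `ft` lies between 0 and 1, each entry of `kept` is a shrunken copy of the matching entry of `c_prev`.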

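Candidate value

Concept:

The candidate value is a tensor holding information from the current time step that may be stored in the current cell state $c^{\langle t \rangle}$.
Which parts of the candidate value get passed on depends on the update gate.
The candidate value is a tensor whose entries range from -1 to 1.
The tilde "~" distinguishes the candidate value $\tilde{c}^{\langle t \rangle}$ from the cell state $c^{\langle t \rangle}$.

Formula:

$\tilde{c}^{\langle t \rangle} = \tanh(W_c [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c)$

Explanation of the formula:

The tanh function produces values between -1 and +1.

Variable names in the code:

cct: candidate value $\tilde{c}^{\langle t \rangle}$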

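Update gate

Concept:

We use the update gate to decide which parts of the candidate $\tilde{c}^{\langle t \rangle}$ are added to the cell state $c^{\langle t \rangle}$.
The update gate is a tensor containing values between 0 and 1.
When a unit in the update gate is close to 0, it blocks the corresponding candidate value from being passed to $c^{\langle t \rangle}$.
When a unit in the update gate is close to 1, it lets the candidate value pass through to $c^{\langle t \rangle}$.
Note that we use the subscript "i" rather than "u", following the convention in the literature.

Formula:

$\Gamma_i^{\langle t \rangle} = \sigma(W_i [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_i)$

Explanation of the formula:

As with the forget gate, the sigmoid function makes the values of the gate (here $\Gamma_i^{\langle t \rangle}$) fall between 0 and 1.
The update gate is multiplied element-wise with the candidate, and this product ($\Gamma_i^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}$) is used to determine the cell state $c^{\langle t \rangle}$.

Variable names in the code: the code uses the variable names from the academic literature, which do not use "u" for "update".

Wi: update gate weights
bi: update gate bias
it: update gate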
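
The candidate value and the update gate are both computed from the same stacked input, and their element-wise product is what can be written into the cell state. The following is a small sketch under assumed sizes and random weights (all hypothetical); the `sigmoid` helper is defined locally for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; squashes every entry into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_a, n_x, m = 3, 2, 4                          # illustrative sizes only

a_prev = rng.standard_normal((n_a, m))         # hidden state from step t-1
xt = rng.standard_normal((n_x, m))             # input at step t
concat = np.concatenate((a_prev, xt), axis=0)  # shape (n_a + n_x, m)

Wc, bc = rng.standard_normal((n_a, n_a + n_x)), np.zeros((n_a, 1))
Wi, bi = rng.standard_normal((n_a, n_a + n_x)), np.zeros((n_a, 1))

cct = np.tanh(Wc @ concat + bc)   # candidate value, entries between -1 and 1
it = sigmoid(Wi @ concat + bi)    # update gate, entries between 0 and 1
written = it * cct                # portion of the candidate admitted to the cell state
```

Each entry of `written` has the sign of the candidate but a magnitude scaled down by the gate.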

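Cell state

Concept:

The cell state is the "memory" that is passed on to future time steps.
The new cell state $c^{\langle t \rangle}$ is a combination of the previous cell state and the candidate value.

Formula:

$c^{\langle t \rangle} = \Gamma_f^{\langle t \rangle} \odot c^{\langle t-1 \rangle} + \Gamma_i^{\langle t \rangle} \odot \tilde{c}^{\langle t \rangle}$

Explanation of the formula:

The previous cell state $c^{\langle t-1 \rangle}$ is adjusted (weighted) by the forget gate $\Gamma_f^{\langle t \rangle}$.
The candidate value $\tilde{c}^{\langle t \rangle}$ is adjusted (weighted) by the update gate $\Gamma_i^{\langle t \rangle}$.

Variable names in the code:

c: cell state for all time steps, shape (n_a, m, T_x)
c_next: cell state of the next time step, $c^{\langle t \rangle}$, shape (n_a, m)
c_prev: previous cell state, $c^{\langle t-1 \rangle}$, shape (n_a, m)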
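
A tiny worked example may help show how the two gates blend old memory with new information. The numbers below are hand-picked and idealized for illustration (real sigmoid gates never reach exactly 0 or 1); they are not taken from the post.

```python
import numpy as np

# Three units, each showing a different regime
c_prev = np.array([2.0, 2.0, 2.0])    # previous cell state
cct = np.array([0.9, -0.9, 0.4])      # candidate value
ft = np.array([1.0, 0.0, 0.5])        # forget gate
it = np.array([0.0, 1.0, 0.5])        # update gate

c_next = ft * c_prev + it * cct
# unit 0: keeps the old memory      -> 1.0 * 2.0 + 0.0 * 0.9    = 2.0
# unit 1: overwritten by candidate  -> 0.0 * 2.0 + 1.0 * (-0.9) = -0.9
# unit 2: blend of both             -> 0.5 * 2.0 + 0.5 * 0.4    = 1.2
```

Note that the cell state is not squashed by any activation here, so its entries can be larger than 1 in magnitude.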

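Output gate

Concept:

The output gate decides what is sent out as the prediction of the time step.
Like the other gates, the output gate contains values between 0 and 1.

Formula:

$\Gamma_o^{\langle t \rangle} = \sigma(W_o [a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o)$

Explanation of the formula:

The output gate is determined by the previous hidden state $a^{\langle t-1 \rangle}$ and the current input $x^{\langle t \rangle}$.
The sigmoid function keeps its values between 0 and 1.

Variable names in the code:

Wo: output gate weights
bo: output gate bias
ot: output gate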

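Hidden state

Concept:

The hidden state is passed to the next time step of the LSTM cell.
It is used to determine the three gates ($\Gamma_f$, $\Gamma_i$, $\Gamma_o$) of the next time step.
The hidden state is also used for the prediction $y^{\langle t \rangle}$.

Formula:

$a^{\langle t \rangle} = \Gamma_o^{\langle t \rangle} \odot \tanh(c^{\langle t \rangle})$

Explanation of the formula:

The hidden state $a^{\langle t \rangle}$ is determined by the cell state $c^{\langle t \rangle}$ combined with the output gate $\Gamma_o^{\langle t \rangle}$.
The cell state is passed through the tanh function, which rescales its values to between -1 and +1.
The output gate acts like a mask: it either preserves the values of $\tanh(c^{\langle t \rangle})$ or keeps them out of the hidden state $a^{\langle t \rangle}$.

Variable names in the code:

a: hidden states for all time steps, shape (n_a, m, T_x)
a_prev: hidden state of the previous step, $a^{\langle t-1 \rangle}$, shape (n_a, m)
a_next: hidden state of the next step, $a^{\langle t \rangle}$, shape (n_a, m)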
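
The next sketch illustrates the output gate acting as a mask on the squashed cell state. The values are hand-picked and idealized for illustration only.

```python
import numpy as np

c_next = np.array([2.0, -0.9, 1.2])   # cell state; entries may exceed 1 in magnitude
ot = np.array([1.0, 0.0, 0.5])        # idealized output gate values

a_next = ot * np.tanh(c_next)         # hidden state
# unit 0: gate open   -> tanh(2.0) passes through unchanged
# unit 1: gate closed -> nothing from this unit reaches the hidden state
# unit 2: gate half open -> half of tanh(1.2)
```

Unlike the cell state, every entry of the hidden state stays strictly between -1 and 1, because tanh is applied before the gate.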

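Prediction

Concept:

The prediction in this use case is a classification, so we use softmax.

Formula:

$y_{pred}^{\langle t \rangle} = \mathrm{softmax}(W_y a^{\langle t \rangle} + b_y)$

Variable names in the code:

y_pred: predictions for all time steps, shape (n_y, m, T_x); note that T_x = T_y in this example.
yt_pred: prediction for the current time step t, shape (n_y, m)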
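
The post does not show a softmax implementation, so here is one possible column-wise version, assuming each column of the input holds the scores for one example. The sizes and random weights are hypothetical.

```python
import numpy as np

def softmax(z):
    """Column-wise softmax: each column (one example) sums to 1."""
    shifted = z - np.max(z, axis=0, keepdims=True)   # shift for numerical stability
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)

rng = np.random.default_rng(2)
n_y, n_a, m = 4, 3, 5                                # illustrative sizes only

a_next = np.tanh(rng.standard_normal((n_a, m)))      # stand-in hidden state
Wy = rng.standard_normal((n_y, n_a))                 # prediction weights (random here)
by = np.zeros((n_y, 1))                              # prediction bias

yt_pred = softmax(Wy @ a_next + by)                  # class probabilities, shape (n_y, m)
```

Subtracting the column maximum before exponentiating does not change the result; it only avoids overflow for large scores.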

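LSTM cell

There are three steps:
1. Concatenate the hidden state $a^{\langle t-1 \rangle}$ and the input $x^{\langle t \rangle}$ into a single matrix.
2. Compute the six formulas above in order.
3. Compute the prediction.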

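import numpy as np

# Note: sigmoid and softmax are helper functions that this post does not define.

def lstm_cell_forward(xt, a_prev, c_prev, parameters):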
    """
    Implement a single forward step of the LSTM-cell as described in Figure (4)

    Arguments:
    xt -- your input data at timestep "t", numpy array of shape (n_x, m).
    a_prev -- Hidden state at timestep "t-1", numpy array of shape (n_a, m)
    c_prev -- Memory state at timestep "t-1", numpy array of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc --  Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        bo --  Bias of the output gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a_next -- next hidden state, of shape (n_a, m)
    c_next -- next memory state, of shape (n_a, m)
    yt_pred -- prediction at timestep "t", numpy array of shape (n_y, m)
    cache -- tuple of values needed for the backward pass, contains (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    Note: ft/it/ot stand for the forget/update/output gates, cct stands for the candidate value (c tilde),
          c stands for the cell state (memory)
    """
    # Retrieve the parameters from "parameters".
    Wf = parameters["Wf"]  # forget gate weights
    bf = parameters["bf"]
    Wi = parameters["Wi"]  # update gate weights (note the subscript is i, not u)
    bi = parameters["bi"]
    Wc = parameters["Wc"]  # candidate value weights
    bc = parameters["bc"]
    Wo = parameters["Wo"]  # output gate weights
    bo = parameters["bo"]
    Wy = parameters["Wy"]  # prediction weights
    by = parameters["by"]
    # Concatenate a_prev and xt into one matrix of shape (n_a + n_x, m)
    concat = np.concatenate((a_prev, xt), axis=0)
    # This is equivalent to the following code,
    # which reads the dimensions from xt and Wy:
    # n_x, m = xt.shape
    # n_y, n_a = Wy.shape
    # concat = np.zeros((n_a + n_x, m))
    # concat[: n_a, :] = a_prev
    # concat[n_a :, :] = xt
    # Compute ft (forget gate), it (update gate), cct (candidate value),
    # c_next (cell state), ot (output gate) and a_next (hidden state)
    ft = sigmoid(np.dot(Wf, concat) + bf)        # forget gate
    it = sigmoid(np.dot(Wi, concat) + bi)        # update gate
    cct = np.tanh(np.dot(Wc, concat) + bc)       # candidate value
    c_next = ft * c_prev + it * cct              # cell state
    ot = sigmoid(np.dot(Wo, concat) + bo)        # output gate
    a_next = ot * np.tanh(c_next)                # hidden state
    # Compute the prediction of the LSTM cell
    yt_pred = softmax(np.dot(Wy, a_next) + by)
    # Cache the values needed for backpropagation
    cache = (a_next, c_next, a_prev, c_prev, ft, it, cct, ot, xt, parameters)

    return a_next, c_next, yt_pred, cache
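
One way to see why the first step stacks a_prev and xt into a single matrix: multiplying the stacked matrix by one weight matrix gives the same result as applying two separate weight blocks, one to the hidden state and one to the input. The sketch below checks this numerically; the sizes, the random values, and the names `Wfa` and `Wfx` are assumptions introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_a, n_x, m = 3, 2, 4                          # illustrative sizes only

a_prev = rng.standard_normal((n_a, m))
xt = rng.standard_normal((n_x, m))
Wf = rng.standard_normal((n_a, n_a + n_x))

# Stacked form, as used in lstm_cell_forward
concat = np.concatenate((a_prev, xt), axis=0)  # shape (n_a + n_x, m)
stacked = Wf @ concat

# Split form: the first n_a columns act on a_prev, the remaining n_x columns on xt
Wfa, Wfx = Wf[:, :n_a], Wf[:, n_a:]
split = Wfa @ a_prev + Wfx @ xt
```

The two results agree up to floating-point rounding, so concatenation is a bookkeeping convenience that lets each gate use a single weight matrix.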

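LSTM forward pass

We have implemented a single time step of the LSTM; now we can iterate over it with a for loop to process a sequence of T_x inputs.

Instructions for running the LSTM over multiple time steps:

1. Get the dimensions n_x, n_a, n_y, m, T_x from the shapes of the variables x and parameters.
2. Initialize the 3D tensors a, c and y.
   a: hidden state, shape (n_a, m, T_x)
   c: cell state, shape (n_a, m, T_x)
   y: prediction, shape (n_y, m, T_x) (note that T_y = T_x in this example).
   Note that setting one variable equal to another is a "copy by reference". In other words, do not write c = a; otherwise both names refer to the same array, and changing one changes the other.
3. Initialize the 2D tensor $a^{\langle t \rangle}$.
   $a^{\langle t \rangle}$ stores the hidden state at time step t; its variable name is a_next.
   $a^{\langle 0 \rangle}$ is the initial hidden state at time step 0, passed in when the function is called; its variable name is a0.
   $a^{\langle t \rangle}$ and $a^{\langle 0 \rangle}$ each represent a single time step, so both have shape (n_a, m).
   Initialize $a^{\langle t \rangle}$ with the initial hidden state $a^{\langle 0 \rangle}$ passed into the function.
4. Initialize $c^{\langle t \rangle}$ with zeros.
   Its variable name is c_next.
   $c^{\langle t \rangle}$ represents a single time step, so its shape is (n_a, m).
   Note: create c_next as its own variable with its own location in memory. Do not initialize it as a slice of the 3D tensor c; in other words, do not write c_next = c[:,:,0].
5. For each time step, do the following:
   From the 3D tensor x, take the 2D slice $x^{\langle t \rangle}$ at time step t.
   Call the lstm_cell_forward function defined earlier to get the hidden state, cell state, prediction, and cache.
   Store the hidden state, cell state, and prediction in the 3D tensors.
   Append the cache to the list of caches.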
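
The two warnings above about "copy by reference" and about slicing can be checked directly in NumPy. This is a small illustrative sketch; the variable names `alias`, `view`, and `own` are made up for the example.

```python
import numpy as np

n_a, m, T_x = 2, 3, 4
a = np.zeros((n_a, m, T_x))

alias = a                     # plain assignment: same array object, no data copied
view = a[:, :, 0]             # basic slicing returns a view into a's memory
own = np.zeros((n_a, m))      # separate buffer, the safe way to create c_next

view[0, 0] = 7.0              # writes through to a[0, 0, 0]
own[0, 0] = -1.0              # does not touch a
```

After these writes, `a[0, 0, 0]` is 7.0 even though `a` was never assigned to directly, which is exactly the kind of silent coupling the instructions try to avoid.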

def lstm_forward(x, a0, parameters):
    """
    Implement the forward propagation of the recurrent neural network using an LSTM-cell described in Figure (4).

    Arguments:
    x -- Input data for every time-step, of shape (n_x, m, T_x).
    a0 -- Initial hidden state, of shape (n_a, m)
    parameters -- python dictionary containing:
                        Wf -- Weight matrix of the forget gate, numpy array of shape (n_a, n_a + n_x)
                        bf -- Bias of the forget gate, numpy array of shape (n_a, 1)
                        Wi -- Weight matrix of the update gate, numpy array of shape (n_a, n_a + n_x)
                        bi -- Bias of the update gate, numpy array of shape (n_a, 1)
                        Wc -- Weight matrix of the first "tanh", numpy array of shape (n_a, n_a + n_x)
                        bc -- Bias of the first "tanh", numpy array of shape (n_a, 1)
                        Wo -- Weight matrix of the output gate, numpy array of shape (n_a, n_a + n_x)
                        bo -- Bias of the output gate, numpy array of shape (n_a, 1)
                        Wy -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)

    Returns:
    a -- Hidden states for every time-step, numpy array of shape (n_a, m, T_x)
    y -- Predictions for every time-step, numpy array of shape (n_y, m, T_x)
    c -- The value of the cell state, numpy array of shape (n_a, m, T_x)
    caches -- tuple of values needed for the backward pass, contains (list of all the caches, x)
    """

    # Initialize "caches", the list that collects the cache of every time step
    caches = []

    Wy = parameters['Wy']
    # Get the dimensions from the shapes of x and parameters['Wy']
    n_x, m, T_x = x.shape
    n_y, n_a = Wy.shape

    # Initialize "a", "c" and "y" with zeros
    a = np.zeros((n_a, m, T_x))
    c = np.zeros((n_a, m, T_x))
    y = np.zeros((n_y, m, T_x))

    # Initialize a_next from a0 and give c_next its own zero-filled array
    a_next = a0
    c_next = np.zeros(a_next.shape)

    # Loop over all time steps
    for t in range(T_x):
        # Take the 2D slice xt for time step t from the 3D tensor x
        xt = x[:, :, t]
        # Update the hidden state and the cell state, and compute the prediction
        a_next, c_next, yt, cache = lstm_cell_forward(xt, a_next, c_next, parameters)
        # Store the new hidden state in a
        a[:, :, t] = a_next
        # Store the new cell state in c
        c[:, :, t] = c_next
        # Store the prediction in y
        y[:, :, t] = yt
        # Append the cache to the list
        caches.append(cache)

    # Keep the caches and x for the backward pass
    caches = (caches, x)

    return a, y, c, caches
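
For a quick end-to-end sanity check, the sketch below runs the same equations over a random sequence and confirms the output shapes. It is a condensed restatement rather than the code above: `cell_step` omits the cache, the `sigmoid` and `softmax` helpers are one possible implementation, and all sizes and random parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z, axis=0, keepdims=True))   # column-wise, stabilized
    return e / np.sum(e, axis=0, keepdims=True)

def cell_step(xt, a_prev, c_prev, p):
    """One LSTM time step using the six formulas, without the cache."""
    concat = np.concatenate((a_prev, xt), axis=0)
    ft = sigmoid(p["Wf"] @ concat + p["bf"])            # forget gate
    it = sigmoid(p["Wi"] @ concat + p["bi"])            # update gate
    cct = np.tanh(p["Wc"] @ concat + p["bc"])           # candidate value
    c_next = ft * c_prev + it * cct                     # cell state
    ot = sigmoid(p["Wo"] @ concat + p["bo"])            # output gate
    a_next = ot * np.tanh(c_next)                       # hidden state
    yt = softmax(p["Wy"] @ a_next + p["by"])            # prediction
    return a_next, c_next, yt

rng = np.random.default_rng(4)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 7                  # hypothetical sizes

params = {}
for gate in "fico":                                     # builds Wf, Wi, Wc, Wo and biases
    params["W" + gate] = rng.standard_normal((n_a, n_a + n_x))
    params["b" + gate] = rng.standard_normal((n_a, 1))
params["Wy"] = rng.standard_normal((n_y, n_a))
params["by"] = rng.standard_normal((n_y, 1))

x = rng.standard_normal((n_x, m, T_x))
a0 = rng.standard_normal((n_a, m))

a = np.zeros((n_a, m, T_x))
c = np.zeros((n_a, m, T_x))
y = np.zeros((n_y, m, T_x))
a_next, c_next = a0, np.zeros((n_a, m))                 # c_next has its own memory
for t in range(T_x):
    a_next, c_next, yt = cell_step(x[:, :, t], a_next, c_next, params)
    a[:, :, t], c[:, :, t], y[:, :, t] = a_next, c_next, yt

print(a.shape, c.shape, y.shape)   # (5, 10, 7) (5, 10, 7) (2, 10, 7)
```

Checking shapes and that each prediction column sums to 1 is a cheap way to catch transposed weights or a wrong concatenation axis before moving on to the backward pass.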

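Congratulations! You have now implemented the forward pass of an LSTM. When you use a deep learning framework, implementing the forward pass is enough to build high-performing systems, because the framework takes care of the backward pass.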
