
Understanding and Getting Started with PyTorch's LSTM Model

1. The logical structure of the LSTM in PyTorch

class torch.nn.LSTM(*args, **kwargs)

Parameter descriptions from the official PyTorch documentation:

Args:
    input_size: The number of expected features in the input `x`
    hidden_size: The number of features in the hidden state `h`
    num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two LSTMs together to form a `stacked LSTM`,
        with the second LSTM taking in outputs of the first LSTM and
        computing the final results. Default: 1
    bias: If ``False``, then the layer does not use bias weights `b_ih`
        and `b_hh`. Default: ``True``
    batch_first: If ``True``, then the input and output tensors are
        provided as (batch, seq, feature). Default: ``False``
    dropout: If non-zero, introduces a `Dropout` layer on the outputs of
        each LSTM layer except the last layer, with dropout probability
        equal to :attr:`dropout`. Default: 0
    bidirectional: If ``True``, becomes a bidirectional LSTM.
        Default: ``False``

Inputs: input, (h_0, c_0)
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing
      the features of the input sequence. The input can also be a packed
      variable length sequence. See :func:`torch.nn.utils.rnn.pack_padded_sequence`
      or :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the initial hidden state for each element in the
      batch. If the LSTM is bidirectional, num_directions should be 2,
      else it should be 1.
    - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the initial cell state for each element in the batch.

      If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.

Outputs: output, (h_n, c_n)
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`:
      tensor containing the output features `(h_t)` from the last layer of
      the LSTM, for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence`
      has been given as the input, the output will also be a packed sequence.
      For the unpacked case, the directions can be separated using
      ``output.view(seq_len, batch, num_directions, hidden_size)``, with
      forward and backward being direction `0` and `1` respectively.
      Similarly, the directions can be separated in the packed case.
    - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the hidden state for `t = seq_len`. Like *output*,
      the layers can be separated using
      ``h_n.view(num_layers, num_directions, batch, hidden_size)`` and
      similarly for *c_n*.
    - **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the cell state for `t = seq_len`.

Parameter list:

  • input_size: the feature dimension of the input x
  • hidden_size: the feature dimension of the hidden state
  • num_layers: number of stacked LSTM layers; default 1
  • bias: if False, b_ih = 0 and b_hh = 0; default True
  • batch_first: if True, the input and output tensors use the format (batch, seq, feature)
  • dropout: applies dropout to the output of every layer except the last; default 0
  • bidirectional: if True, the LSTM is bidirectional; default False
  • Inputs: input, (h_0, c_0)
  • Outputs: output, (h_n, c_n)
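To see the effect of batch_first, here is a minimal sketch; the sizes (input_size=8, hidden_size=16, etc.) are arbitrary values chosen for illustration. Note that even with batch_first=True, h_n and c_n keep the shape (num_layers * num_directions, batch, hidden_size):

```python
import torch
import torch.nn as nn

# With batch_first=True the input/output are (batch, seq, feature)
# instead of the default (seq, batch, feature).
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 7, 8)        # (batch=4, seq_len=7, input_size=8)
output, (hn, cn) = lstm(x)      # h_0/c_0 default to zeros when omitted

print(output.shape)  # torch.Size([4, 7, 16])  (batch, seq_len, hidden_size)
print(hn.shape)      # torch.Size([1, 4, 16])  still (layers*dirs, batch, hidden)
```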

Input data format:
input (seq_len, batch, input_size)
h0 (num_layers * num_directions, batch, hidden_size)
c0 (num_layers * num_directions, batch, hidden_size)

Output data format:
output (seq_len, batch, hidden_size * num_directions)
hn (num_layers * num_directions, batch, hidden_size)
cn (num_layers * num_directions, batch, hidden_size)
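The shapes above can be checked with a short sketch; the concrete sizes (seq_len=5, batch=3, input_size=10, hidden_size=20, num_layers=2) are example values, not anything prescribed by the API:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers)

x = torch.randn(seq_len, batch, input_size)
# num_directions = 1 for a unidirectional LSTM
h0 = torch.zeros(num_layers * 1, batch, hidden_size)
c0 = torch.zeros(num_layers * 1, batch, hidden_size)

output, (hn, cn) = lstm(x, (h0, c0))

print(output.shape)  # torch.Size([5, 3, 20])  (seq_len, batch, hidden_size)
print(hn.shape)      # torch.Size([2, 3, 20])  (num_layers, batch, hidden_size)
print(cn.shape)      # torch.Size([2, 3, 20])
```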

The LSTM module in PyTorch only accepts 3-dimensional tensors as input, and the meaning of each dimension must not be confused.

The first dimension is the sequence length (seq_len), i.e., the number of steps in each sequence. For text, this is the length of each sentence; since sentences are fed to the network, they are usually set to a fixed length, which is the length of every sentence we feed to the LSTM. For other sequential data it is the length of one clearly delimited unit — for stock data, for example, the number of records within a given time window. This parameter also determines how many time steps the layer unrolls to process the input.

The second dimension is the batch_size: how many sentences are fed to the network at once, or, for stock data, how many sequences of time-window records are fed to the model at once. At any single time step, it is the number of words (or the number of stock records) processed in parallel at that step.

The third dimension is the elements of the input: the dimensionality of the vector representing each word, or, for stock data, how many values are sampled at each time step — for example the lowest price, highest price, average price, 5-day moving average, 10-day moving average, and so on.

What do h_0 through h_n mean? h_t is the hidden state a unit stores at time step t, computed from the current input and the hidden state of the previous time step. Its role is to summarize everything the network has seen up to that point, and its shape matches the output at a single time step.

c_0 through c_n form the cell state, which acts like a switch deciding how much of each unit's hidden state carries over to influence the next time step; its shape is the same as that of h_0 through h_n.

Of course, for bidirectional and multi-layer LSTMs, the number of directions and the number of hidden layers must also be taken into account.
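For the bidirectional, multi-layer case, a sketch with example sizes (input_size=10, hidden_size=20, two layers, two directions) shows how the direction factor shows up in the shapes, and how the docstring's `view` trick separates layers and directions:

```python
import torch
import torch.nn as nn

seq_len, batch = 5, 3
num_layers, num_directions = 2, 2

lstm = nn.LSTM(input_size=10, hidden_size=20,
               num_layers=num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, 10)     # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40])  last dim = hidden_size * num_directions
print(hn.shape)      # torch.Size([4, 3, 20])  first dim = num_layers * num_directions

# Separate layers/directions as described in the docstring:
hn_view = hn.view(num_layers, num_directions, batch, 20)
out_view = output.view(seq_len, batch, num_directions, 20)
```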

Reference: https://zhuanlan.zhihu.com/p/41261640