
Understanding and Getting Started with PyTorch's LSTM Model

1. The logical structure of the LSTM in PyTorch

class torch.nn.LSTM(*args, **kwargs)

Parameter descriptions from the official PyTorch documentation:

Args:
    input_size: The number of expected features in the input `x`
    hidden_size: The number of features in the hidden state `h`
    num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two LSTMs together to form a `stacked LSTM`,
        with the second LSTM taking in outputs of the first LSTM and
        computing the final results. Default: 1
    bias: If ``False``, then the layer does not use bias weights `b_ih`
        and `b_hh`. Default: ``True``
    batch_first: If ``True``, then the input and output tensors are
        provided as (batch, seq, feature). Default: ``False``
    dropout: If non-zero, introduces a `Dropout` layer on the outputs of
        each LSTM layer except the last layer, with dropout probability
        equal to :attr:`dropout`. Default: 0
    bidirectional: If ``True``, becomes a bidirectional LSTM.
        Default: ``False``

Inputs: input, (h_0, c_0)
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing
      the features of the input sequence. The input can also be a packed
      variable length sequence. See :func:`torch.nn.utils.rnn.pack_padded_sequence`
      or :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the initial hidden state for each element in the
      batch. If the LSTM is bidirectional, num_directions should be 2,
      else it should be 1.
    - **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the initial cell state for each element in the batch.

      If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.

Outputs: output, (h_n, c_n)
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`:
      tensor containing the output features `(h_t)` from the last layer of
      the LSTM, for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence`
      has been given as the input, the output will also be a packed sequence.
      For the unpacked case, the directions can be separated using
      ``output.view(seq_len, batch, num_directions, hidden_size)``, with
      forward and backward being direction `0` and `1` respectively.
      Similarly, the directions can be separated in the packed case.
    - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the hidden state for `t = seq_len`. Like *output*,
      the layers can be separated using
      ``h_n.view(num_layers, num_directions, batch, hidden_size)`` and
      similarly for *c_n*.
    - **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`:
      tensor containing the cell state for `t = seq_len`.

Parameter list:

  • input_size: the feature dimension of the input x
  • hidden_size: the feature dimension of the hidden state
  • num_layers: number of stacked LSTM layers; default 1
  • bias: if False, b_ih = 0 and b_hh = 0; default True
  • batch_first: if True, the input and output tensors use the format (batch, seq, feature)
  • dropout: applies dropout to the output of every layer except the last; default 0
  • bidirectional: if True, the LSTM is bidirectional; default False
  • Inputs: input, (h_0, c_0)
  • Outputs: output, (h_n, c_n)
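To see the effect of batch_first, here is a minimal sketch; the sizes (input_size=8, hidden_size=16, etc.) are arbitrary values chosen for illustration. Note that even with batch_first=True, h_n and c_n keep the shape (num_layers * num_directions, batch, hidden_size):

```python
import torch
import torch.nn as nn

# With batch_first=True the input/output are (batch, seq, feature)
# instead of the default (seq, batch, feature).
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 7, 8)        # (batch=4, seq_len=7, input_size=8)
output, (hn, cn) = lstm(x)      # h_0/c_0 default to zeros when omitted

print(output.shape)  # torch.Size([4, 7, 16])  (batch, seq_len, hidden_size)
print(hn.shape)      # torch.Size([1, 4, 16])  still (layers*dirs, batch, hidden)
```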

Input data format:
input (seq_len, batch, input_size)
h0 (num_layers * num_directions, batch, hidden_size)
c0 (num_layers * num_directions, batch, hidden_size)

Output data format:
output (seq_len, batch, hidden_size * num_directions)
hn (num_layers * num_directions, batch, hidden_size)
cn (num_layers * num_directions, batch, hidden_size)
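The shapes above can be checked with a short sketch; the concrete sizes (seq_len=5, batch=3, input_size=10, hidden_size=20, num_layers=2) are example values, not anything prescribed by the API:

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size, num_layers = 5, 3, 10, 20, 2

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
               num_layers=num_layers)

x = torch.randn(seq_len, batch, input_size)
# num_directions = 1 for a unidirectional LSTM
h0 = torch.zeros(num_layers * 1, batch, hidden_size)
c0 = torch.zeros(num_layers * 1, batch, hidden_size)

output, (hn, cn) = lstm(x, (h0, c0))

print(output.shape)  # torch.Size([5, 3, 20])  (seq_len, batch, hidden_size)
print(hn.shape)      # torch.Size([2, 3, 20])  (num_layers, batch, hidden_size)
print(cn.shape)      # torch.Size([2, 3, 20])
```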

The LSTM module in PyTorch only accepts 3-dimensional tensors as input, and the meaning of each dimension must not be confused.

The first dimension is the sequence length (seq_len), i.e., the number of steps in each sequence. For text, this is the length of each sentence; since sentences are fed to the network, they are usually set to a fixed length, which is the length of every sentence we feed to the LSTM. For other sequential data it is the length of one clearly delimited unit — for stock data, for example, the number of records within a given time window. This parameter also determines how many time steps the layer unrolls to process the input.

The second dimension is the batch_size: how many sentences are fed to the network at once, or, for stock data, how many sequences of time-window records are fed to the model at once. At any single time step, it is the number of words (or the number of stock records) processed in parallel at that step.

The third dimension is the elements of the input: the dimensionality of the vector representing each word, or, for stock data, how many values are sampled at each time step — for example the lowest price, highest price, average price, 5-day moving average, 10-day moving average, and so on.

What do h_0 through h_n mean? h_t is the hidden state a unit stores at time step t, computed from the current input and the hidden state of the previous time step. Its role is to summarize everything the network has seen up to that point, and its shape matches the output at a single time step.

c_0 through c_n form the cell state, which acts like a switch deciding how much of each unit's hidden state carries over to influence the next time step; its shape is the same as that of h_0 through h_n.

Of course, for bidirectional and multi-layer LSTMs, the number of directions and the number of hidden layers must also be taken into account.
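For the bidirectional, multi-layer case, a sketch with example sizes (input_size=10, hidden_size=20, two layers, two directions) shows how the direction factor shows up in the shapes, and how the docstring's `view` trick separates layers and directions:

```python
import torch
import torch.nn as nn

seq_len, batch = 5, 3
num_layers, num_directions = 2, 2

lstm = nn.LSTM(input_size=10, hidden_size=20,
               num_layers=num_layers, bidirectional=True)

x = torch.randn(seq_len, batch, 10)     # (seq_len, batch, input_size)
output, (hn, cn) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40])  last dim = hidden_size * num_directions
print(hn.shape)      # torch.Size([4, 3, 20])  first dim = num_layers * num_directions

# Separate layers/directions as described in the docstring:
hn_view = hn.view(num_layers, num_directions, batch, 20)
out_view = output.view(seq_len, batch, num_directions, 20)
```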

Reference: https://zhuanlan.zhihu.com/p/41261640