Dissecting the PyTorch RNN source code (hair-loss edition)
This newbie originally just wanted to learn how to use the PyTorch LSTM properly, but after a while I still only half understood it, so I went to read the LSTM source, only to find that it inherits from the RNN class. So here I am untangling the RNN source instead. Truly, the sea of learning is boundless while my hair is not...
Let's start with the simplest RNN model and set aside stacked layers and directionality for now. This newbie suddenly discovered that learning straight from the source code really does bring more progress than reading lots of blog posts other people have put together.
So I'll start right from the docstring in the source code:
```
Inputs: input, h_0
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features
      of the input sequence. The input can also be a packed variable length sequence.
      See :func:`torch.nn.utils.rnn.pack_padded_sequence` or
      :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial hidden state for each element in the batch.
      Defaults to zero if not provided. If the RNN is bidirectional,
      num_directions should be 2, else it should be 1.

Outputs: output, h_n
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
      containing the output features (`h_t`) from the last layer of the RNN,
      for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
      given as the input, the output will also be a packed sequence.

      For the unpacked case, the directions can be separated
      using ``output.view(seq_len, batch, num_directions, hidden_size)``,
      with forward and backward being direction `0` and `1` respectively.
      Similarly, the directions can be separated in the packed case.
    - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the hidden state for `t = seq_len`.

      Like *output*, the layers can be separated using
      ``h_n.view(num_layers, num_directions, batch, hidden_size)``.

Shape:
    - Input1: :math:`(L, N, H_{in})` tensor containing input features where
      :math:`H_{in}=\text{input\_size}` and `L` represents a sequence length.
    - Input2: :math:`(S, N, H_{out})` tensor containing the initial hidden state
      for each element in the batch. :math:`H_{out}=\text{hidden\_size}`
      Defaults to zero if not provided.
      where :math:`S=\text{num\_layers} * \text{num\_directions}`
      If the RNN is bidirectional, num_directions should be 2, else it should be 1.
    - Output1: :math:`(L, N, H_{all})` where :math:`H_{all}=\text{num\_directions} * \text{hidden\_size}`
    - Output2: :math:`(S, N, H_{out})` tensor containing the next hidden state
      for each element in the batch

Attributes:
    weight_ih_l[k]: the learnable input-hidden weights of the k-th layer,
        of shape `(hidden_size, input_size)` for `k = 0`. Otherwise, the shape is
        `(hidden_size, num_directions * hidden_size)`
    weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer,
        of shape `(hidden_size, hidden_size)`
    bias_ih_l[k]: the learnable input-hidden bias of the k-th layer,
        of shape `(hidden_size)`
    bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer,
        of shape `(hidden_size)`

.. note::
    All the weights and biases are initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`
    where :math:`k = \frac{1}{\text{hidden\_size}}`

.. include:: ../cudnn_rnn_determinism.rst

.. include:: ../cudnn_persistent_rnn.rst

Examples::

    >>> rnn = nn.RNN(10, 20, 2)
    >>> input = torch.randn(5, 3, 10)
    >>> h0 = torch.randn(2, 3, 20)
    >>> output, hn = rnn(input, h0)
```
Inputs
The input has shape `(seq_len, batch, input_size)`. Say we feed the model 5 sentences at once. The sentences have different lengths, so we pad them all to the longest length, 10, as MAX_LENGTH, and represent each token with 300 numbers. With the default `batch_first=False`, the input shape is then (10, 5, 300).
h_0 has shape `(num_layers * num_directions, batch, hidden_size)`; if it is not provided, it defaults to all zeros.
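To make those two shapes concrete, here is a minimal sketch of my own for the 5-sentence example above, using a single-layer, unidirectional RNN (the hidden size of 128 is an arbitrary choice of mine):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 10, 5, 300, 128

# 5 padded sentences of length 10, each token a 300-dim vector
inputs = torch.randn(seq_len, batch, input_size)       # (seq_len, batch, input_size)

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size)   # 1 layer, unidirectional

# explicit initial hidden state; leaving it out gives the same all-zero default
h0 = torch.zeros(1 * 1, batch, hidden_size)            # (num_layers * num_directions, batch, hidden_size)

output, hn = rnn(inputs, h0)
print(output.shape)   # torch.Size([10, 5, 128])
print(hn.shape)       # torch.Size([1, 5, 128])
```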
Outputs
The first return value, `output`, holds the hidden states of the last layer at every time step. Since every time step produces an output, the first dimension equals `seq_len`. As for the third dimension: in the bidirectional case, the forward and backward hidden states are concatenated at each time step, so its size is num_directions * hidden_size.
Its shape is (seq_len, batch, num_directions * hidden_size).
The second return value, `hn`, holds each layer's hidden state at the final time step. In the simplest case, a single-layer unidirectional RNN, `hn[0]` equals `output[-1]`. Its shape is (num_layers * num_directions, batch, hidden_size). Let's unpack the first dimension: it indexes the last-time-step output of each layer and direction. Suppose the network is bidirectional with two layers; then h_n[0] is the last-time-step output of layer 1's forward pass, h_n[1] is that of layer 1's backward pass, h_n[2] is that of layer 2's forward pass, and h_n[3] is that of layer 2's backward pass.
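Here is a tiny check of my own (with made-up sizes) for the single-layer, unidirectional claim that `hn[0]` and `output[-1]` coincide:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)    # 1 layer, unidirectional
output, hn = rnn(torch.randn(4, 3, 8))        # (seq_len=4, batch=3, input_size=8)
print(torch.equal(hn[0], output[-1]))         # True: the only layer's hidden state at the last time step
```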
```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=150, hidden_size=300, num_layers=2,
             bidirectional=True, batch_first=False)
input = torch.randn(10, 5, 150)
h0 = torch.randn(4, 5, 300)
c0 = torch.randn(4, 5, 300)  # left over from an LSTM example; a plain RNN has no cell state, so this goes unused
output, hn = rnn(input, h0)
print('output shape: ', output.shape)
print('hn shape: ', hn.shape)
```

Running this gives:

```
output shape:  torch.Size([10, 5, 600])
hn shape:  torch.Size([4, 5, 300])
```
Next, let's deepen our understanding of the return values `output` and `hn`.
1. In the forward direction, the first 300 entries of `output` at the last time step should match the last layer's forward-direction output in `hn`:
`output[-1, 0, :300] == hn[2, 0, :]`
The first 300 entries of the last time step of `output` for the first sentence are written as `output[-1, 0, :300]` (or equivalently `output[9, 0, :300]`).
The last layer's forward-direction output for the first sentence in `hn` is `hn[2, 0, :]`: index 0 is layer 1's forward pass at its last time step, 1 is layer 1's backward pass, 2 is layer 2's forward pass, and 3 is layer 2's backward pass.
```python
print(output[-1, 0, :300])
print(output[-1, 0, :300] == hn[2, 0, :])
```
The result comes out right. My only regret is choosing such a large hidden size; the printout is far too long.
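The backward direction can be checked the same way: its "last" hidden state corresponds to time step 0 and lives in the second half of `output`'s feature dimension. A self-contained sketch of my own (same sizes as the run above):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=150, hidden_size=300, num_layers=2, bidirectional=True)
input = torch.randn(10, 5, 150)
h0 = torch.randn(4, 5, 300)
output, hn = rnn(input, h0)

# forward direction: last time step, first hidden_size slots of the last layer
print(torch.equal(output[-1, 0, :300], hn[2, 0, :]))   # True

# backward direction: its final hidden state is at time step 0, in the second half of output
print(torch.equal(output[0, 0, 300:], hn[3, 0, :]))    # True

# the same thing via the view from the docstring
hn_dirs = hn.view(2, 2, 5, 300)    # (num_layers, num_directions, batch, hidden_size)
print(torch.equal(hn_dirs[1, 1], hn[3]))               # True: layer 2, backward
```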
Trainable parameters
Before we get to the forward() function itself, there are quite a few preparation functions that check the input and hidden state. Let's look at them one by one.
This function checks the arguments that will be fed through the forward pass: when batch_sizes is not None, it means our sequence has already been packed, and for a packed sequence the expected input is 2-dimensional (for a padded batch it is 3-dimensional).
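The gist of that check, as a sketch of my own that mirrors what `RNNBase.check_input` does (not a verbatim copy of the source):

```python
from typing import Optional
from torch import Tensor

def check_input_sketch(input: Tensor, batch_sizes: Optional[Tensor], input_size: int) -> None:
    # a packed sequence carries a flat 2-D data tensor; a padded batch is 3-D
    expected_input_dim = 2 if batch_sizes is not None else 3
    if input.dim() != expected_input_dim:
        raise RuntimeError(
            f'input must have {expected_input_dim} dimensions, got {input.dim()}')
    # the last dimension must match the input_size the module was constructed with
    if input_size != input.size(-1):
        raise RuntimeError(
            f'input.size(-1) must be equal to input_size. Expected {input_size}, got {input.size(-1)}')
```

Now for the forward() method itself, which invokes these checks via check_forward_args: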
```python
def forward(self, input: Tensor, hx: Optional[Tensor] = None) -> Tuple[Tensor, Tensor]:
    is_packed = isinstance(input, PackedSequence)
    if is_packed:
        input, batch_sizes, sorted_indices, unsorted_indices = input
        max_batch_size = batch_sizes[0]
        max_batch_size = int(max_batch_size)
    else:
        batch_sizes = None
        max_batch_size = input.size(0) if self.batch_first else input.size(1)
        sorted_indices = None
        unsorted_indices = None

    if hx is None:
        num_directions = 2 if self.bidirectional else 1
        hx = torch.zeros(self.num_layers * num_directions,
                         max_batch_size, self.hidden_size,
                         dtype=input.dtype, device=input.device)
    else:
        # Each batch of the hidden state should match the input sequence that
        # the user believes he/she is passing in.
        hx = self.permute_hidden(hx, sorted_indices)

    self.check_forward_args(input, hx, batch_sizes)
    _impl = _rnn_impls[self.mode]
    if batch_sizes is None:
        result = _impl(input, hx, self._flat_weights, self.bias, self.num_layers,
                       self.dropout, self.training, self.bidirectional, self.batch_first)
    else:
        result = _impl(input, batch_sizes, hx, self._flat_weights, self.bias,
                       self.num_layers, self.dropout, self.training, self.bidirectional)
    output = result[0]
    hidden = result[1]
    if is_packed:
        output = PackedSequence(output, batch_sizes, sorted_indices, unsorted_indices)
    return output, self.permute_hidden(hidden, unsorted_indices)
```
It starts by checking whether the input has already been packed, and then branches into the corresponding handling.
So let's first get to know PackedSequence. Its purpose is to bundle a batch of sentences of different lengths into a single batch that can be fed directly into an RNN/LSTM. PyTorch provides the pack_padded_sequence() method for this.
If the input has already been packed, batch_sizes is sorted in descending order, so max_batch_size is simply the first element of batch_sizes.
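A tiny sketch of my own (tensor sizes are arbitrary) showing what pack_padded_sequence produces and why batch_sizes[0] is the true batch size:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# three padded sentences of lengths 4, 2 and 1, each token a 6-dim vector
padded = torch.randn(4, 3, 6)                    # (max_seq_len, batch, input_size)
lengths = torch.tensor([4, 2, 1])                # must be non-increasing by default

packed = pack_padded_sequence(padded, lengths)
print(packed.batch_sizes)                        # tensor([3, 2, 1, 1]): non-increasing
print(int(packed.batch_sizes[0]))                # 3 == the real batch size

rnn = nn.RNN(input_size=6, hidden_size=8)
packed_out, hn = rnn(packed)                     # a packed sequence goes straight into the RNN
```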
If it has not been packed, then max_batch_size equals input.size(0) when the first dimension is the batch (batch_first=True), and input.size(1) when the batch sits in the second dimension.
Next it deals with the hidden-state input hx, which defaults to None: if we did not supply an initial hidden state, hx gets initialized to all zeros here.
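In other words, calling the module without hx is the same as passing an all-zero initial state yourself; a quick sanity check of my own (sizes arbitrary):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=6, hidden_size=8, num_layers=2, bidirectional=True)
x = torch.randn(4, 3, 6)                        # (seq_len, batch, input_size)

out_default, hn_default = rnn(x)                # no hx passed: zeros are created internally
h0 = torch.zeros(2 * 2, 3, 8)                   # (num_layers * num_directions, batch, hidden_size)
out_explicit, hn_explicit = rnn(x, h0)

print(torch.equal(out_default, out_explicit))   # True
print(torch.equal(hn_default, hn_explicit))     # True
```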
```python
_impl = _rnn_impls[self.mode]
```

```python
_rnn_impls = {
    'RNN_TANH': _VF.rnn_tanh,
    'RNN_RELU': _VF.rnn_relu,
}
```
Reading this far, this newbie's head was starting to spin. After digging around online, I realized that _rnn_impls is looked up to determine which forward kernel to invoke. I found the relevant source in the PyTorch repo; the C++ code is at https://github.com/pytorch/pytorch/blob/1a93b96815b5c87c92e060a6dca51be93d712d09/aten/src/ATen/native/RNN.cpp
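The key used for that lookup is just self.mode, which nn.RNN sets from its nonlinearity argument; a small sketch to see which kernel would be picked:

```python
import torch.nn as nn

rnn_tanh = nn.RNN(input_size=6, hidden_size=8)                       # default nonlinearity='tanh'
rnn_relu = nn.RNN(input_size=6, hidden_size=8, nonlinearity='relu')

print(rnn_tanh.mode)   # 'RNN_TANH' -> dispatched to _VF.rnn_tanh
print(rnn_relu.mode)   # 'RNN_RELU' -> dispatched to _VF.rnn_relu
```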