Inputs and outputs of torch.nn.LSTM()
The main subject: torch.nn.LSTM()
Arguments passed at initialization
Args:
    input_size: The number of expected features in the input `x`
    hidden_size: The number of features in the hidden state `h`
    num_layers: Number of recurrent layers. E.g., setting ``num_layers=2``
        would mean stacking two LSTMs together to form a `stacked LSTM`,
        with the second LSTM taking in outputs of the first LSTM and
        computing the final results. Default: 1
    bias: If ``False``, then the layer does not use bias weights `b_ih` and `b_hh`.
        Default: ``True``
    batch_first: If ``True``, then the input and output tensors are provided
        as `(batch, seq, feature)` instead of `(seq, batch, feature)`.
        Note that this does not apply to hidden or cell states. See the
        Inputs/Outputs sections below for details. Default: ``False``
    dropout: If non-zero, introduces a `Dropout` layer on the outputs of each
        LSTM layer except the last layer, with dropout probability equal to
        :attr:`dropout`. Default: 0
    bidirectional: If ``True``, becomes a bidirectional LSTM. Default: ``False``
    proj_size: If ``> 0``, will use LSTM with projections of corresponding size. Default: 0
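input_size: usually the word embedding size, i.e. the number of features at each time step
hidden_size: the dimension of the hidden state
num_layers: defaults to 1, a single-layer LSTM
bias: whether to use bias weights; defaults to True
batch_first: defaults to False; if set to True, the first dimension of the input and output tensors is batch_size (this does not apply to h_n and c_n)
dropout: the dropout probability applied to the outputs of every LSTM layer except the last; defaults to 0, and has no effect when num_layers is 1
bidirectional: defaults to False, a unidirectional LSTM; when set to True the LSTM is bidirectional. It is usually considered together with num_layers (note that bidirectional=True with num_layers=1 gives a model made of one bidirectional LSTM layer)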
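To make the arguments above concrete, here is a minimal sketch that sets each of them explicitly and prints the resulting shapes. The sizes (batch of 4, sequence length 10, and so on) are arbitrary values chosen for illustration, not values from the original post.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions for this sketch).
batch_size, seq_len, input_size, hidden_size = 4, 10, 300, 128

lstm = nn.LSTM(
    input_size=input_size,
    hidden_size=hidden_size,
    num_layers=2,        # two stacked LSTM layers
    bias=True,           # keep the bias weights b_ih and b_hh
    batch_first=True,    # input and output are (batch, seq, feature)
    dropout=0.1,         # applied to layer 1's outputs only, not the last layer
    bidirectional=True,  # every layer runs in both directions
)

x = torch.randn(batch_size, seq_len, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (batch, seq, 2 * hidden_size)
print(h_n.shape)     # (num_layers * 2, batch, hidden_size)
print(c_n.shape)     # same shape as h_n
```

Note that batch_first changes only the layout of the input and of output; h_n and c_n keep the batch in their second dimension.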
Model inputs and outputs: unidirectional LSTM
import torch
import torch.nn as nn
import numpy as np
inputs_numpy = np.random.random((64, 32, 300))
inputs = torch.from_numpy(inputs_numpy).to(torch.float32)
inputs.shape
torch.Size([64, 32, 300]): this is [batch_size, max_length, embedding_size]
hidden_size = 128
lstm = nn.LSTM(300, 128, batch_first=True, num_layers=1)
output, (hn, cn) = lstm(inputs)
print(output.shape)
print(hn.shape)
print(cn.shape)
torch.Size([64, 32, 128])
torch.Size([1, 64, 128])
torch.Size([1, 64, 128])
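Notes:
output: holds the last layer's hidden state at every time step. With batch_first=True, the output of the last time step can be obtained with output_last = output[:, -1, :]
h_n: the hidden state at the final time step (the last word of the sentence) for each layer; its shape does not depend on the sequence length seq_length
c_n: the cell state at the final time step (the last word of the sentence) for each layer; its shape does not depend on the sequence length seq_length
Also: for a unidirectional LSTM, the output at the last time step equals the final hidden state of the last layer, hn[-1]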
output_last = output[:,-1,:]
hn_last = hn[-1]
print(output_last.eq(hn_last))
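Printing the result of eq() shows a large element-wise boolean tensor, which is hard to read. As a hedged alternative, the sketch below reduces the comparison to a single boolean with torch.allclose and uses a two-layer LSTM to show why the index hn[-1] matters. The small sizes here are assumptions picked to keep the example fast.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small illustrative sizes (assumptions for this sketch).
lstm = nn.LSTM(16, 32, batch_first=True, num_layers=2)
inputs = torch.randn(8, 5, 16)  # (batch, seq, feature)
output, (hn, cn) = lstm(inputs)

output_last = output[:, -1, :]  # last time step of the top layer, (8, 32)
hn_last = hn[-1]                # final hidden state of the top layer, (8, 32)

# A single True/False instead of an element-wise tensor.
same = torch.allclose(output_last, hn_last)
print(same)

# hn[0] is the final hidden state of the first layer; `output` only
# contains the top layer's states, so hn[0] is not stored in it.
print(hn.shape)
```

This relationship holds for full-length sequences like the ones used here; padded batches would need separate handling that is outside the scope of this example.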
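Model inputs and outputs: bidirectional LSTM
First, let us be clear about the shapes:
output: (seq_len, batch, num_directions * hidden_size), or (batch, seq_len, num_directions * hidden_size) when batch_first=True as in the code here
h_n: (num_layers * num_directions, batch, hidden_size)
c_n: (num_layers * num_directions, batch, hidden_size)
Here num_layers is the number of layers, which is 1, and num_directions is the number of directions, which is 2 because the LSTM is bidirectional. So we get the following results: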
import torch
import torch.nn as nn
import numpy as np
inputs_numpy = np.random.random((64,32,300))
inputs = torch.from_numpy(inputs_numpy).to(torch.float32)
inputs.shape
hidden_size = 128
lstm = nn.LSTM(300, 128, batch_first=True, num_layers=1, bidirectional=True)
output, (hn, cn) = lstm(inputs)
print(output.shape)
print(hn.shape)
print(cn.shape)
torch.Size([64, 32, 256])
torch.Size([2, 64, 128])
torch.Size([2, 64, 128])
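Here hn contains two elements: one is the final hidden state of the forward direction, the other is the final hidden state of the backward direction.
# Last output of the backward direction (it finishes at time step 0)
output_last_backward = output[:, 0, -hidden_size:]
# hn of the backward direction in the last layer
hn_last_backward = hn[-1]
# The backward direction's last output equals the last layer's backward hn
print(output_last_backward.eq(hn_last_backward))
# Last output of the forward direction
output_last_forward = output[:, -1, :hidden_size]
# hn of the forward direction in the last layer
hn_last_forward = hn[-2]
# The forward direction's last output equals the last layer's forward hn
print(output_last_forward.eq(hn_last_forward))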
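The indices hn[-2] and hn[-1] are easier to follow once the first dimension of hn is split into separate layer and direction axes. The sketch below does that with reshape and then concatenates the two final states into one vector per sequence, a common way to summarize a sentence. The sizes and the variable names (hn_split, summary) are illustrative assumptions, not part of the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions for this sketch).
batch_size, seq_len, input_size, hidden_size = 8, 5, 16, 32
num_layers, num_directions = 2, 2

lstm = nn.LSTM(input_size, hidden_size, batch_first=True,
               num_layers=num_layers, bidirectional=True)
inputs = torch.randn(batch_size, seq_len, input_size)
output, (hn, cn) = lstm(inputs)

# Split (num_layers * num_directions) into two separate axes.
hn_split = hn.reshape(num_layers, num_directions, batch_size, hidden_size)
forward_last = hn_split[-1, 0]   # top layer, forward direction (same as hn[-2])
backward_last = hn_split[-1, 1]  # top layer, backward direction (same as hn[-1])

# One vector per sequence: forward and backward final states side by side.
summary = torch.cat([forward_last, backward_last], dim=1)
print(summary.shape)  # (batch, 2 * hidden_size)

# Cross-check against `output`: forward ends at the last time step,
# backward ends at time step 0.
print(torch.allclose(forward_last, output[:, -1, :hidden_size]))
print(torch.allclose(backward_last, output[:, 0, hidden_size:]))
```

Taking output[:, -1, :] directly would pair the forward direction's final state with the backward direction's first step, which is why the two halves are picked from different time steps.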
https://www.cnblogs.com/LiuXinyu12378/p/12322993.html
https://blog.csdn.net/m0_45478865/article/details/104455978
https://blog.csdn.net/foneone/article/details/104002372