torch notes: the torch.nn module
Recurrent layers
class torch.nn.RNN(*args, **kwargs)
Parameters:
input_size – the number of features in the input x.
hidden_size – the number of features in the hidden state.
num_layers – the number of recurrent layers.
bidirectional – if True, becomes a bidirectional RNN; default: False (a sketch follows the example below).
RNN inputs: (input, h_0)
- input (seq_len, batch, input_size): tensor containing the features of the input sequence.
- h_0 (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state.
RNN outputs: (output, h_n)
- output (seq_len, batch, hidden_size * num_directions): tensor containing the output features from the last RNN layer, for each time step.
- h_n (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for the last time step.
Example:
import torch
import torch.nn as nn

# input x has 10 features, the hidden state has 20 features, and the RNN has 2 layers
rnn = nn.RNN(10, 20, 2)
# (seq_len, batch, input_size)
input = torch.randn(5, 3, 10)
# (num_layers * num_directions, batch, hidden_size)
h0 = torch.randn(2, 3, 20)
output, hn = rnn(input, h0)
print(output.shape)  # (seq_len, batch, hidden_size * num_directions)
print(hn.shape)      # (num_layers * num_directions, batch, hidden_size)
torch.Size([5, 3, 20])
torch.Size([2, 3, 20])
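Since bidirectional=True doubles num_directions, here is a minimal sketch of how the shapes above change; the sizes are reused from the example purely for illustration:
# same feature/hidden sizes as above, but bidirectional, so num_directions = 2
birnn = nn.RNN(10, 20, 2, bidirectional=True)
input = torch.randn(5, 3, 10)
h0 = torch.randn(2 * 2, 3, 20)   # (num_layers * num_directions, batch, hidden_size)
output, hn = birnn(input, h0)
print(output.shape)              # torch.Size([5, 3, 40]) = (seq_len, batch, hidden_size * num_directions)
print(hn.shape)                  # torch.Size([4, 3, 20])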
Similarly:
class torch.nn.GRU(*args, **kwargs)
Another class:
class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')
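nn.GRU is called exactly like nn.RNN above, while nn.RNNCell processes a single time step per call, so the loop over the sequence is written by hand. A minimal sketch, with sizes chosen only for illustration:
# GRU: same call signature and shapes as nn.RNN
gru = nn.GRU(10, 20, 2)
output, hn = gru(torch.randn(5, 3, 10), torch.randn(2, 3, 20))

# RNNCell: one time step per call, so we iterate over seq_len ourselves
cell = nn.RNNCell(10, 20)              # input_size=10, hidden_size=20
x = torch.randn(5, 3, 10)              # (seq_len, batch, input_size)
hx = torch.zeros(3, 20)                # (batch, hidden_size)
outputs = []
for t in range(x.size(0)):
    hx = cell(x[t], hx)                # hx: (batch, hidden_size)
    outputs.append(hx)
print(len(outputs), outputs[0].shape)  # 5 torch.Size([3, 20])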
Linear layers
class torch.nn.Linear(in_features, out_features, bias=True)
Applies a linear transformation to the incoming data: y=xA^T+b
Example:
# maps 3 input features to 2 output features
m = nn.Linear(3, 2)
input = torch.randn(10, 3)
output = m(input)
print(output.size())
torch.Size([10, 2])
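To tie the example back to y = xA^T + b: nn.Linear stores weight with shape (out_features, in_features) and bias with shape (out_features,), so the output above can be reproduced by hand (a small check on the example, not part of the original snippet):
# manual affine map: (10, 3) @ (3, 2) + (2,) -> (10, 2)
manual = input @ m.weight.t() + m.bias
print(torch.allclose(output, manual))  # True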
Dropout layers
class torch.nn.Dropout(p=0.5, inplace=False)
Parameters:
p – probability of an element being zeroed. Default: 0.5
inplace – if set to True, performs the operation in place. Default: False
Shape:
Input: any shape.
Output: same shape as the input.
Example:
m = nn.Dropout(p=0.5)
input = torch.randn(2, 2)
output = m(input)
output
tensor([[-0.0000, -2.9296],
[ 0.0924, 0.0000]])
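During training the elements that are kept are scaled by 1 / (1 - p), and in evaluation mode Dropout becomes the identity. A quick sketch of that behaviour, using the same settings as above:
m = nn.Dropout(p=0.5)
x = torch.ones(2, 2)
print(m(x))      # kept entries become 1 / (1 - 0.5) = 2.0, the rest are 0
m.eval()         # switch to evaluation mode
print(m(x))      # identity: all entries stay 1.0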
Sparse layers
class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False, _weight=None)
Parameters:
num_embeddings (int) – size of the dictionary of embeddings
embedding_dim (int) – the size of each embedding vector
padding_idx (int, optional) – if given, the output is padded with zeros whenever this index is encountered
max_norm (float, optional) – if given, each embedding vector is renormalized so that its norm does not exceed this value (see the sketch at the end of this section)
norm_type (float, optional) – the p of the p-norm used to compute the norm for the max_norm option
scale_grad_by_freq (boolean, optional) – if given, gradients are scaled by the inverse of the frequency of the words in the mini-batch
Variables:
weight (Tensor) – the learnable weights of the module, of shape (num_embeddings, embedding_dim)
Shape:
Input: LongTensor (N, W), where N = mini-batch size and W = number of indices per mini-batch
Output: (N, W, embedding_dim)
Example:
# an Embedding module containing 10 tensors of size 3
embedding = nn.Embedding(10, 3)
# a batch of 2 samples of 4 indices each
input = torch.LongTensor([[1,2,4,5],[5,4,2,1]])
embedding(input)
tensor([[[-0.4031, 1.8008, 1.4954],
[ 0.3768, -0.2439, 0.9262],
[ 0.8444, -0.1265, 2.0801],
[ 1.0576, -0.9705, -0.1841]],
[[ 1.0576, -0.9705, -0.1841],
[ 0.8444, -0.1265, 2.0801],
[ 0.3768, -0.2439, 0.9262],
[-0.4031, 1.8008, 1.4954]]])
embedding.weight
Parameter containing:
tensor([[-0.6084, 0.0402, -1.5447],
[-0.4031, 1.8008, 1.4954],
[ 0.3768, -0.2439, 0.9262],
[ 0.4351, -1.6146, 0.7603],
[ 0.8444, -0.1265, 2.0801],
[ 1.0576, -0.9705, -0.1841],
[ 0.6502, -0.1189, 0.0794],
[-0.9843, -0.1582, -0.0912],
[ 0.1690, -0.0980, -0.1338],
[-0.9448, -1.9642, -0.1723]])
Example with padding_idx:
embedding = nn.Embedding(10, 3, padding_idx=1)
input = torch.LongTensor([[0,1,0,5]])
embedding(input)
tensor([[[-1.1790, 1.2073, -1.0174],
[ 0.0000, 0.0000, 0.0000],
[-1.1790, 1.2073, -1.0174],
[-0.2278, 1.1332, -0.2259]]])
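The max_norm and norm_type parameters described above can be checked directly: during a lookup, every selected embedding vector is renormalized so that its p-norm does not exceed max_norm. A small sketch, with sizes chosen only for illustration:
# embeddings are renormalized during lookup so that the L2 norm is <= 1.0
embedding = nn.Embedding(10, 3, max_norm=1.0, norm_type=2)
out = embedding(torch.LongTensor([[1, 2, 4, 5]]))
print(out.norm(dim=-1))  # every row norm is <= 1.0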