Dissecting the PyTorch RNN source code (hair-loss edition)
This newbie originally just wanted to learn how to use the PyTorch LSTM properly, but after a while I still only half understood it, so I went to read the LSTM source, only to find that it inherits from the RNN class. So here I am untangling the RNN source instead. Truly, the sea of learning is boundless while my hair is not...
Let's start with the simplest RNN model and set aside stacked layers and directionality for now. This newbie suddenly discovered that learning straight from the source code really does bring more progress than reading lots of blog posts other people have put together.
So I'll start right from the docstring in the source code:
```
Inputs: input, h_0
    - **input** of shape `(seq_len, batch, input_size)`: tensor containing the features
      of the input sequence. The input can also be a packed variable length sequence.
      See :func:`torch.nn.utils.rnn.pack_padded_sequence` or
      :func:`torch.nn.utils.rnn.pack_sequence` for details.
    - **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the initial hidden state for each element in the batch.
      Defaults to zero if not provided. If the RNN is bidirectional,
      num_directions should be 2, else it should be 1.

Outputs: output, h_n
    - **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor
      containing the output features (`h_t`) from the last layer of the RNN,
      for each `t`. If a :class:`torch.nn.utils.rnn.PackedSequence` has been
      given as the input, the output will also be a packed sequence.

      For the unpacked case, the directions can be separated
      using ``output.view(seq_len, batch, num_directions, hidden_size)``,
      with forward and backward being direction `0` and `1` respectively.
      Similarly, the directions can be separated in the packed case.
    - **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor
      containing the hidden state for `t = seq_len`.

      Like *output*, the layers can be separated using
      ``h_n.view(num_layers, num_directions, batch, hidden_size)``.

Shape:
    - Input1: :math:`(L, N, H_{in})` tensor containing input features where
      :math:`H_{in}=\text{input\_size}` and `L` represents a sequence length.
    - Input2: :math:`(S, N, H_{out})` tensor containing the initial hidden state
      for each element in the batch. :math:`H_{out}=\text{hidden\_size}`
      Defaults to zero if not provided.
      where :math:`S=\text{num\_layers} * \text{num\_directions}`
      If the RNN is bidirectional, num_directions should be 2, else it should be 1.
    - Output1: :math:`(L, N, H_{all})` where :math:`H_{all}=\text{num\_directions} * \text{hidden\_size}`
    - Output2: :math:`(S, N, H_{out})` tensor containing the next hidden state
      for each element in the batch

Attributes:
    weight_ih_l[k]: the learnable input-hidden weights of the k-th layer,
        of shape `(hidden_size, input_size)` for `k = 0`. Otherwise, the shape is
        `(hidden_size, num_directions * hidden_size)`
    weight_hh_l[k]: the learnable hidden-hidden weights of the k-th layer,
        of shape `(hidden_size, hidden_size)`
    bias_ih_l[k]: the learnable input-hidden bias of the k-th layer,
        of shape `(hidden_size)`
    bias_hh_l[k]: the learnable hidden-hidden bias of the k-th layer,
        of shape `(hidden_size)`

.. note::
    All the weights and biases are initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`
    where :math:`k = \frac{1}{\text{hidden\_size}}`

.. include:: ../cudnn_rnn_determinism.rst

.. include:: ../cudnn_persistent_rnn.rst

Examples::

    >>> rnn = nn.RNN(10, 20, 2)
    >>> input = torch.randn(5, 3, 10)
    >>> h0 = torch.randn(2, 3, 20)
    >>> output, hn = rnn(input, h0)
```
Inputs
The input has shape `(seq_len, batch, input_size)`. Say we feed the model 5 sentences at once. The sentences have different lengths, so we pad them all to the longest length, 10, as MAX_LENGTH, and represent each token with 300 numbers. With the default `batch_first=False`, the input shape is then (10, 5, 300).
h_0 has shape `(num_layers * num_directions, batch, hidden_size)`; if it is not provided, it defaults to all zeros.
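To make those two shapes concrete, here is a minimal sketch of my own for the 5-sentence example above, using a single-layer, unidirectional RNN (the hidden size of 128 is an arbitrary choice of mine):

```python
import torch
import torch.nn as nn

seq_len, batch, input_size, hidden_size = 10, 5, 300, 128

# 5 padded sentences of length 10, each token a 300-dim vector
inputs = torch.randn(seq_len, batch, input_size)       # (seq_len, batch, input_size)

rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size)   # 1 layer, unidirectional

# explicit initial hidden state; leaving it out gives the same all-zero default
h0 = torch.zeros(1 * 1, batch, hidden_size)            # (num_layers * num_directions, batch, hidden_size)

output, hn = rnn(inputs, h0)
print(output.shape)   # torch.Size([10, 5, 128])
print(hn.shape)       # torch.Size([1, 5, 128])
```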
Outputs
The first return value, `output`, holds the hidden states of the last layer at every time step. Since every time step produces an output, the first dimension equals `seq_len`. As for the third dimension: in the bidirectional case, the forward and backward hidden states are concatenated at each time step, so its size is num_directions * hidden_size.
Its shape is (seq_len, batch, num_directions * hidden_size).
The second return value, `hn`, holds each layer's hidden state at the final time step. In the simplest case, a single-layer unidirectional RNN, `hn[0]` equals `output[-1]`. Its shape is (num_layers * num_directions, batch, hidden_size). Let's unpack the first dimension: it indexes the last-time-step output of each layer and direction. Suppose the network is bidirectional with two layers; then h_n[0] is the last-time-step output of layer 1's forward pass, h_n[1] is that of layer 1's backward pass, h_n[2] is that of layer 2's forward pass, and h_n[3] is that of layer 2's backward pass.
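Here is a tiny check of my own (with made-up sizes) for the single-layer, unidirectional claim that `hn[0]` and `output[-1]` coincide:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)    # 1 layer, unidirectional
output, hn = rnn(torch.randn(4, 3, 8))        # (seq_len=4, batch=3, input_size=8)
print(torch.equal(hn[0], output[-1]))         # True: the only layer's hidden state at the last time step
```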
```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=150, hidden_size=300, num_layers=2,
             bidirectional=True, batch_first=False)
input = torch.randn(10, 5, 150)
h0 = torch.randn(4, 5, 300)
c0 = torch.randn(4, 5, 300)  # left over from an LSTM example; a plain RNN has no cell state, so this goes unused
output, hn = rnn(input, h0)
print('output shape: ', output.shape)
print('hn shape: ', hn.shape)
```

Running this gives:

```
output shape:  torch.Size([10, 5, 600])
hn shape:  torch.Size([4, 5, 300])
```
Next, let's deepen our understanding of the return values `output` and `hn`.
1. In the forward direction, the first 300 entries of `output` at the last time step should match the last layer's forward-direction output in `hn`:
`output[-1, 0, :300] == hn[2, 0, :]`
The first 300 entries of the last time step of `output` for the first sentence are written as `output[-1, 0, :300]` (or equivalently `output[9, 0, :300]`).
The last layer's forward-direction output for the first sentence in `hn` is `hn[2, 0, :]`: index 0 is layer 1's forward pass at its last time step, 1 is layer 1's backward pass, 2 is layer 2's forward pass, and 3 is layer 2's backward pass.
```python
print(output[-1, 0, :300])
print(output[-1, 0, :300] == hn[2, 0, :])
```
The result comes out right. My only regret is choosing such a large hidden size; the printout is far too long.
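The backward direction can be checked the same way: its "last" hidden state corresponds to time step 0 and lives in the second half of `output`'s feature dimension. A self-contained sketch of my own (same sizes as the run above):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=150, hidden_size=300, num_layers=2, bidirectional=True)
input = torch.randn(10, 5, 150)
h0 = torch.randn(4, 5, 300)
output, hn = rnn(input, h0)

# forward direction: last time step, first hidden_size slots of the last layer
print(torch.equal(output[-1, 0, :300], hn[2, 0, :]))   # True

# backward direction: its final hidden state is at time step 0, in the second half of output
print(torch.equal(output[0, 0, 300:], hn[3, 0, :]))    # True

# the same thing via the view from the docstring
hn_dirs = hn.view(2, 2, 5, 300)    # (num_layers, num_directions, batch, hidden_size)
print(torch.equal(hn_dirs[1, 1], hn[3]))               # True: layer 2, backward
```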
Trainable parameters
Before we get to the forward() function itself, there are quite a few preparation functions that check the input and hidden state. Let's look at them one by one.
This function checks the arguments that will be fed through the forward pass: when batch_sizes is not None, it means our sequence has already been packed, and for a packed sequence the expected input is 2-dimensional (for a padded batch it is 3-dimensional).
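The gist of that check, as a sketch of my own that mirrors what `RNNBase.check_input` does (not a verbatim copy of the source):

```python
from typing import Optional
from torch import Tensor

def check_input_sketch(input: Tensor, batch_sizes: Optional[Tensor], input_size: int) -> None:
    # a packed sequence carries a flat 2-D data tensor; a padded batch is 3-D
    expected_input_dim = 2 if batch_sizes is not None else 3
    if input.dim() != expected_input_dim:
        raise RuntimeError(
            f'input must have {expected_input_dim} dimensions, got {input.dim()}')
    # the last dimension must match the input_size the module was constructed with
    if input_size != input.size(-1):
        raise RuntimeError(
            f'input.size(-1) must be equal to input_size. Expected {input_size}, got {input.size(-1)}')
```

Now for the forward() method itself, which invokes these checks via check_forward_args: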
```python
def forward(self, input: Tensor, hx: Optional[Tensor] = None) -> Tuple[Tensor, Tensor]:
    is_packed = isinstance(input, PackedSequence)
    if is_packed:
        input, batch_sizes, sorted_indices, unsorted_indices = input
        max_batch_size = batch_sizes[0]
        max_batch_size = int(max_batch_size)
    else:
        batch_sizes = None
        max_batch_size = input.size(0) if self.batch_first else input.size(1)
        sorted_indices = None
        unsorted_indices = None

    if hx is None:
        num_directions = 2 if self.bidirectional else 1
        hx = torch.zeros(self.num_layers * num_directions,
                         max_batch_size, self.hidden_size,
                         dtype=input.dtype, device=input.device)
    else:
        # Each batch of the hidden state should match the input sequence that
        # the user believes he/she is passing in.
        hx = self.permute_hidden(hx, sorted_indices)

    self.check_forward_args(input, hx, batch_sizes)
    _impl = _rnn_impls[self.mode]
    if batch_sizes is None:
        result = _impl(input, hx, self._flat_weights, self.bias, self.num_layers,
                       self.dropout, self.training, self.bidirectional, self.batch_first)
    else:
        result = _impl(input, batch_sizes, hx, self._flat_weights, self.bias,
                       self.num_layers, self.dropout, self.training, self.bidirectional)
    output = result[0]
    hidden = result[1]
    if is_packed:
        output = PackedSequence(output, batch_sizes, sorted_indices, unsorted_indices)
    return output, self.permute_hidden(hidden, unsorted_indices)
```
It starts by checking whether the input has already been packed, and then branches into the corresponding handling.
So let's first get to know PackedSequence. Its purpose is to bundle a batch of sentences of different lengths into a single batch that can be fed directly into an RNN/LSTM. PyTorch provides the pack_padded_sequence() method for this.
If the input has already been packed, batch_sizes is sorted in descending order, so max_batch_size is simply the first element of batch_sizes.
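A tiny sketch of my own (tensor sizes are arbitrary) showing what pack_padded_sequence produces and why batch_sizes[0] is the true batch size:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

# three padded sentences of lengths 4, 2 and 1, each token a 6-dim vector
padded = torch.randn(4, 3, 6)                    # (max_seq_len, batch, input_size)
lengths = torch.tensor([4, 2, 1])                # must be non-increasing by default

packed = pack_padded_sequence(padded, lengths)
print(packed.batch_sizes)                        # tensor([3, 2, 1, 1]): non-increasing
print(int(packed.batch_sizes[0]))                # 3 == the real batch size

rnn = nn.RNN(input_size=6, hidden_size=8)
packed_out, hn = rnn(packed)                     # a packed sequence goes straight into the RNN
```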
If it has not been packed, then max_batch_size equals input.size(0) when the first dimension is the batch (batch_first=True), and input.size(1) when the batch sits in the second dimension.
Next it deals with the hidden-state input hx, which defaults to None: if we did not supply an initial hidden state, hx gets initialized to all zeros here.
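In other words, calling the module without hx is the same as passing an all-zero initial state yourself; a quick sanity check of my own (sizes arbitrary):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=6, hidden_size=8, num_layers=2, bidirectional=True)
x = torch.randn(4, 3, 6)                        # (seq_len, batch, input_size)

out_default, hn_default = rnn(x)                # no hx passed: zeros are created internally
h0 = torch.zeros(2 * 2, 3, 8)                   # (num_layers * num_directions, batch, hidden_size)
out_explicit, hn_explicit = rnn(x, h0)

print(torch.equal(out_default, out_explicit))   # True
print(torch.equal(hn_default, hn_explicit))     # True
```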
```python
_impl = _rnn_impls[self.mode]
```

```python
_rnn_impls = {
    'RNN_TANH': _VF.rnn_tanh,
    'RNN_RELU': _VF.rnn_relu,
}
```
Reading this far, this newbie's head was starting to spin. After digging around online, I realized that _rnn_impls is looked up to determine which forward kernel to invoke. I found the relevant source in the PyTorch repo; the C++ code is at https://github.com/pytorch/pytorch/blob/1a93b96815b5c87c92e060a6dca51be93d712d09/aten/src/ATen/native/RNN.cpp
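The key used for that lookup is just self.mode, which nn.RNN sets from its nonlinearity argument; a small sketch to see which kernel would be picked:

```python
import torch.nn as nn

rnn_tanh = nn.RNN(input_size=6, hidden_size=8)                       # default nonlinearity='tanh'
rnn_relu = nn.RNN(input_size=6, hidden_size=8, nonlinearity='relu')

print(rnn_tanh.mode)   # 'RNN_TANH' -> dispatched to _VF.rnn_tanh
print(rnn_relu.mode)   # 'RNN_RELU' -> dispatched to _VF.rnn_relu
```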