PyTorch (5): Modifying the PyTorch Source Code to Add a ConvLSTM Layer

阿新 • Published: 2018-12-29

Learning and Using PyTorch (5)

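ConvLSTM has already been implemented by extending torch.nn in the github-rogertrullo repository. However, an LSTM is made up of many cells, and handling sequential data and multi-layer networks requires chaining those cells together. That code does the chaining by hand with list.append() and for loops, because there is no counterpart to TensorFlow's tf.nn.dynamic_rnn() function, which can dynamically unroll a user-defined cell.

PyTorch already has machinery for handling LSTMs, so we can reuse that machinery and implement ConvLSTM by modifying the source code. Doing so also helps in understanding how LSTM and convolution are actually implemented.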

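The implementation proceeds in the following steps:

Analysis of PyTorch's built-in LSTM implementation

Adding the ConvLSTM interface, weight initialization, and forward implementation
ConvLSTM test results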

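Analysis of PyTorch's built-in LSTM implementation

In PyTorch, every layer is first defined in nn.modules.*, together with its parameter documentation and parameter initialization, and then calls the concrete implementation in nn._functions.* through the backend it defines. PyTorch (2): Building and Customizing Networks added a custom loss function in the same order. (PS: this is probably a design pattern, but I am not familiar with it and will fill this in later. Oddly, convolution does not follow this pattern and is called directly through F.conv2d().)

First define an LSTM, then use breakpoints to trace how, and in what order, its functions are called.

Here are a flow diagram and a sequence diagram of LSTM execution (the general idea is right, but I am no professional and cannot draw well ^_!).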

Execution flow diagram:
[Figure: LSTM execution flow diagram]

Sequence diagram:
[Figure: LSTM sequence diagram]

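1. Define an LSTM and feed it input as a test, using the example from the official documentation. See the official documentation for the meaning of each parameter.

import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.LSTM(10, 20, 2)  # define an LSTM (initialization): input_size=10, hidden_size=20, num_layers=2
input = Variable(torch.rand(5, 3, 10))  # (seq_len, batch, input_size)
h0 = Variable(torch.rand(2, 3, 20))  # (num_layers, batch, hidden_size)
c0 = Variable(torch.rand(2, 3, 20))
output, hn = rnn(input, (h0, c0))  # run the LSTM; hn is the tuple (h_n, c_n)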
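For readers on a recent PyTorch release, where Variable has been merged into Tensor, the same test can be sketched as below. The shape comments and the final consistency check are my additions and are not part of the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# input_size=10, hidden_size=20, num_layers=2, as in the example above
rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
x = torch.rand(5, 3, 10)    # (seq_len, batch, input_size)
h0 = torch.rand(2, 3, 20)   # (num_layers, batch, hidden_size)
c0 = torch.rand(2, 3, 20)

output, (hn, cn) = rnn(x, (h0, c0))
print(output.shape, hn.shape, cn.shape)

# For a unidirectional LSTM, the last output step is the top layer's final hidden state.
last_step_matches = torch.allclose(output[-1], hn[-1])
```

The output keeps the sequence dimension, while hn and cn hold one final state per layer.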

2. Defining the LSTM invokes the LSTM class in nn/modules/rnn.py.

class LSTM(RNNBase):
    def __init__(self, *args, **kwargs):
        super(LSTM, self).__init__('LSTM', *args, **kwargs)

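3. This class initializes itself by calling the parent class constructor. The full code is not pasted here; it mainly initializes the parameters.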

class RNNBase(Module):

    def __init__(self, mode, input_size, hidden_size,
                 num_layers=1, bias=True, batch_first=False,
                 dropout=0, bidirectional=False):
                 # see details for http://pytorch.org/docs/master/_modules/torch/nn/modules/rnn.html#LSTM

4. When the LSTM runs a forward pass, the forward() method of the base class (RNNBase) is called. That method mainly uses _backend to call RNN in nn/_functions/rnn.py.

    def forward(self, input, hx=None):
        # see details for http://pytorch.org/docs/master/_modules/torch/nn/modules/rnn.html#LSTM
        func = self._backend.RNN(
            self.mode,
            self.input_size,
            self.hidden_size,
            num_layers=self.num_layers,
            batch_first=self.batch_first,
            dropout=self.dropout,
            train=self.training,
            bidirectional=self.bidirectional,
            batch_sizes=batch_sizes,
            dropout_state=self.dropout_state,
            flat_weight=flat_weight
        )
        # (omitted: func is applied to the input, weights and hx to produce output and hidden)
        return output, hidden

5. RNN in nn/_functions/rnn.py chooses whether to call the GPU (cuDNN) implementation.

def RNN(*args, **kwargs):
    def forward(input, *fargs, **fkwargs):
        if cudnn.is_acceptable(input.data):
            func = CudnnRNN(*args, **kwargs)
        else:
            func = AutogradRNN(*args, **kwargs)
        return func(input, *fargs, **fkwargs)

    return forward
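The pattern in step 5 is a factory that captures the configuration up front and picks a backend only once data arrives. A minimal pure-Python sketch of that pattern follows. fast_backend, fallback_backend and is_acceptable are hypothetical stand-ins for CudnnRNN, AutogradRNN and cudnn.is_acceptable, not PyTorch APIs.

```python
def fast_backend(scale):
    """Hypothetical stand-in for CudnnRNN."""
    def run(values):
        return ("fast", [v * scale for v in values])
    return run


def fallback_backend(scale):
    """Hypothetical stand-in for AutogradRNN."""
    def run(values):
        return ("fallback", [v * scale for v in values])
    return run


def is_acceptable(values):
    """Hypothetical check in the role of cudnn.is_acceptable: the fast path needs floats."""
    return all(isinstance(v, float) for v in values)


def rnn_factory(*args, **kwargs):
    # The configuration is captured here; the backend is chosen per call.
    def forward(values, *fargs, **fkwargs):
        if is_acceptable(values):
            func = fast_backend(*args, **kwargs)
        else:
            func = fallback_backend(*args, **kwargs)
        return func(values, *fargs, **fkwargs)

    return forward


func = rnn_factory(scale=2)
float_result = func([1.0, 2.0])   # takes the fast path
int_result = func([1, 2])         # falls back
print(float_result, int_result)
```

Because the decision is deferred to call time, the same configured object can serve inputs that qualify for either path.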

6. We are not testing on a GPU, so the AutogradRNN function is called. It uses StackedRNN to connect multiple cells, and picks a different way of processing the sequence depending on whether batch_sizes is provided.

def AutogradRNN(mode, input_size, hidden_size, num_layers=1, batch_first=False,
                dropout=0, train=True, bidirectional=False, batch_sizes=None,
                dropout_state=None, flat_weight=None):
    # see details for https://github.com/pytorch/pytorch/blob/master/torch/nn/_functions/rnn.py
    if batch_sizes is None:
        rec_factory = Recurrent
    else:
        rec_factory = variable_recurrent_factory(batch_sizes)

    # (omitted: the cell is chosen from mode and wrapped by rec_factory to build layer)
    func = StackedRNN(layer,
                      num_layers,
                      (mode == 'LSTM'),
                      dropout=dropout,
                      train=train)

7. StackedRNN then processes each layer by calling Recurrent or variable_recurrent_factory on it.

def StackedRNN(inners, num_layers, lstm=False, dropout=0, train=True):
    # see details for https://github.com/pytorch/pytorch/blob/master/torch/nn/_functions/rnn.py

    for i in range(num_layers):
        all_output = []
        for j, inner in enumerate(inners):
            l = i * num_directions + j

            hy, output = inner(input, hidden[l], weight[l])
            next_hidden.append(hy)
            all_output.append(output)

8. Recurrent processes the input sequence step by step and calls LSTMCell for the actual computation.

def Recurrent(inner, reverse=False):
    # see details for https://github.com/pytorch/pytorch/blob/master/torch/nn/_functions/rnn.py

    for i in steps:
        hidden = inner(input[i], hidden, *weight)
        # hack to handle LSTM
        output.append(hidden[0] if isinstance(hidden, tuple) else hidden)
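Steps 7 and 8 can be condensed into a small pure-Python sketch: one wrapper unrolls a cell over time, another feeds each layer's outputs into the next layer. It is simplified to a single direction with no dropout, and toy_cell is a hypothetical scalar cell invented for illustration, so this is not the PyTorch code itself.

```python
def recurrent(cell):
    """Unroll `cell` over the time steps of a sequence (in the spirit of Recurrent)."""
    def forward(inputs, hidden, weight):
        outputs = []
        for x_t in inputs:
            hidden = cell(x_t, hidden, weight)
            # LSTM-style cells return (h, c); only h is emitted as the step output
            outputs.append(hidden[0] if isinstance(hidden, tuple) else hidden)
        return hidden, outputs
    return forward


def stacked(layer_fn, num_layers):
    """Feed each layer's output sequence to the next layer (in the spirit of StackedRNN)."""
    def forward(inputs, hiddens, weights):
        next_hidden = []
        for i in range(num_layers):
            hy, inputs = layer_fn(inputs, hiddens[i], weights[i])
            next_hidden.append(hy)
        return next_hidden, inputs
    return forward


def toy_cell(x, hidden, weight):
    """Hypothetical scalar cell: the cell state accumulates weight * x, and h copies it."""
    h, c = hidden
    c_new = c + weight * x
    return (c_new, c_new)


net = stacked(recurrent(toy_cell), num_layers=2)
final_hidden, outputs = net([1, 2, 3], [(0, 0), (0, 0)], [1, 2])
print(final_hidden, outputs)
```

Swapping toy_cell for another cell changes the arithmetic but not the chaining logic, which is why ConvLSTM can reuse this machinery.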

9. LSTMCell implements the LSTM computation.

def LSTMCell(input, hidden, w_ih, w_hh, b_ih=None, b_hh=None):
    if input.is_cuda:
        igates = F.linear(input, w_ih)
        hgates = F.linear(hidden[0], w_hh)
        state = fusedBackend.LSTMFused()
        return state(igates, hgates, hidden[1]) if b_ih is None else state(igates, hgates, hidden[1], b_ih, b_hh)

    hx, cx = hidden
    gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)  # merged computation of all four gates

    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)  # split into the individual gates

    ingate = F.sigmoid(ingate)
    forgetgate = F.sigmoid(forgetgate)
    cellgate = F.tanh(cellgate)
    outgate = F.sigmoid(outgate)

    cy = (forgetgate * cx) + (ingate * cellgate)
    hy = outgate * F.tanh(cy)

    return hy, cy

First, the LSTM equations are as follows:

$g^{(t)} = \phi(W^{gx} x^{(t)} + W^{gh} h^{(t-1)} + b_g) \\ i^{(t)} = \sigma(W^{ix} x^{(t)} + W^{ih} h^{(t-1)} + b_i) \\ f^{(t)} = \sigma(W^{fx} x^{(t)} + W^{fh} h^{(t-1)} + b_f) \\ o^{(t)} = \sigma(W^{ox} x^{(t)} + W^{oh} h^{(t-1)} + b_o) \\ s^{(t)} = g^{(t)} \odot i^{(t)} + s^{(t-1)} \odot f^{(t)} \\ h^{(t)} = \phi(s^{(t)}) \odot o^{(t)}$
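The equations are from A Critical Review of Recurrent Neural Networks for Sequence Learning, where $\phi$ is the tanh activation function and $\sigma$ is the sigmoid activation function.

$\sigma$ is the activation function of the gates, which decide how much information is allowed through. Its value must lie between 0 and 1, hence the sigmoid. $\phi$ is the activation function for the state and the output, and other choices such as ReLU are possible.

The equations also show that four of the operations share the same form, Wx + Wh + b. They can therefore be computed together in one merged operation and then split into the individual gate values, as the code above does.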
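The equivalence between four separate Wx + Wh + b computations and one merged computation followed by chunk can be checked numerically. The sketch below uses random weights and arbitrary sizes chosen for illustration; the gate ordering is irrelevant to the check.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, input_size, hidden_size = 3, 10, 20
x = torch.rand(batch, input_size)
h = torch.rand(batch, hidden_size)

# One (W_x, W_h, b) triple per gate
w_x = [torch.randn(hidden_size, input_size) for _ in range(4)]
w_h = [torch.randn(hidden_size, hidden_size) for _ in range(4)]
b = [torch.randn(hidden_size) for _ in range(4)]

# Separate computation: four independent Wx + Wh + b
separate = [F.linear(x, w_x[k]) + F.linear(h, w_h[k], b[k]) for k in range(4)]

# Merged computation: stack the weights along the output dimension, compute once, then split
w_ih = torch.cat(w_x, dim=0)    # (4 * hidden_size, input_size)
w_hh = torch.cat(w_h, dim=0)    # (4 * hidden_size, hidden_size)
bias = torch.cat(b, dim=0)      # (4 * hidden_size,)
gates = F.linear(x, w_ih) + F.linear(h, w_hh, bias)
merged = gates.chunk(4, 1)

matches = all(torch.allclose(s, m, atol=1e-4) for s, m in zip(separate, merged))
print(matches)
```

The merged form issues two matrix multiplications instead of eight, which is the reason the built-in cell is written that way.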

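### Adding the ConvLSTM interface

Convolutional LSTM replaces the operations inside the original LSTM gates with convolutions, so the kernel size must be passed in as an extra argument. Because the input at every time step of the sequence has the same spatial size, the convolution output must have the same size as its input, which gives padding = (kernel - 1)/2 (for an odd kernel size with stride 1).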
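The padding rule follows from the standard convolution output-size formula, out = (in + 2 * padding - kernel) / stride + 1. A short sketch, with helper names of my own choosing, confirms that (kernel - 1) / 2 preserves the 25 x 25 feature size used later in the test for several odd kernel sizes:

```python
def conv_output_size(in_size, kernel, padding, stride=1):
    """Spatial output size of a convolution without dilation."""
    return (in_size + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that preserves the input size for an odd kernel and stride 1."""
    if kernel % 2 == 0:
        raise ValueError("size-preserving symmetric padding needs an odd kernel size")
    return (kernel - 1) // 2


sizes = {k: conv_output_size(25, k, same_padding(k)) for k in (1, 3, 5, 7)}
print(sizes)
```

Integer division matters here because the padding argument of a convolution must be an integer.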

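The work consists of three main parts:

Add the concrete ConvLSTM implementation in nn/_functions/rnn.py: implement the ConvLSTM forward pass from the input data.
Modify the arguments of RNNBase(Module) and the convolution weight initialization in nn/modules/rnn.py: convolution and linear parameters differ in shape and number, so the parameter initialization has to be defined and a kernel argument added to the interface.
Modify the corresponding argument interface in nn/_functions/rnn.py: different kinds of RNN need different handling.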

1. Add the concrete ConvLSTM implementation in nn/_functions/rnn.py

The implementation is as follows:

# define convolutional LSTM cell
def ConvLSTMCell(input, hidden, weight, bias=None):

    hx, cx = hidden
    # concatenate input and hidden state along the channel dimension
    combined = torch.cat((input, hx), 1)
    # in this way the output has the same size of input
    # (integer division, because F.conv2d expects an integer padding)
    padding = (weight.size()[-1] - 1) // 2

    gates = F.conv2d(combined, weight, bias=bias, padding=padding)

    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)

    ingate = F.sigmoid(ingate)
    forgetgate = F.sigmoid(forgetgate)
    cellgate = F.tanh(cellgate)
    outgate = F.sigmoid(outgate)

    cy = (forgetgate * cx) + (ingate * cellgate)
    hy = outgate * F.tanh(cy)

    return hy, cy

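Simple, isn't it? The earlier linear operations are just replaced by a convolution. The parameters of F.conv2d are (input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1), and the weight has shape (out_channels, in_channels/groups, kH, kW); see the F.conv2d documentation for details.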
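The cell can also be tried on its own, without patching PyTorch. The following is a minimal standalone sketch for a recent PyTorch release. The function name conv_lstm_cell, the random weights and the tensor sizes are illustrative assumptions; it uses torch.sigmoid and torch.tanh and computes the padding separately for each kernel dimension.

```python
import torch
import torch.nn.functional as F


def conv_lstm_cell(x, hidden, weight, bias=None):
    """One ConvLSTM step: x is (batch, C_in, H, W), hidden is (h, c)."""
    hx, cx = hidden
    combined = torch.cat((x, hx), dim=1)      # concatenate along channels
    pad_h = (weight.size(-2) - 1) // 2        # keeps H unchanged for odd kernels
    pad_w = (weight.size(-1) - 1) // 2        # keeps W unchanged for odd kernels
    gates = F.conv2d(combined, weight, bias=bias, padding=(pad_h, pad_w))

    ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
    ingate = torch.sigmoid(ingate)
    forgetgate = torch.sigmoid(forgetgate)
    cellgate = torch.tanh(cellgate)
    outgate = torch.sigmoid(outgate)

    cy = forgetgate * cx + ingate * cellgate
    hy = outgate * torch.tanh(cy)
    return hy, cy


torch.manual_seed(0)
batch, in_ch, hid_ch, size, k = 10, 3, 10, 25, 3
x = torch.randn(batch, in_ch, size, size)
h0 = torch.zeros(batch, hid_ch, size, size)
c0 = torch.zeros(batch, hid_ch, size, size)
# Weight shape: (4 * hidden, input + hidden, kH, kW), one slice of output channels per gate
weight = torch.randn(4 * hid_ch, in_ch + hid_ch, k, k) * 0.1
bias = torch.zeros(4 * hid_ch)

hy, cy = conv_lstm_cell(x, (h0, c0), weight, bias)
print(hy.shape, cy.shape)
```

Both returned states keep the spatial size of the input, which is what lets the cell be applied repeatedly across time steps.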

2. Add the ConvLSTM extension in nn/modules/rnn.py

so that it can be called as nn.ConvLSTM().

class ConvLSTM(RNNBase):
    r"""Applies a convolution multi-layer long short-term memory (ConvLSTM) RNN to an input sequence.
    Examples::

        >>> rnn = nn.ConvLSTM(3, 10, 2, kernel_size=3)
        >>> input = Variable(torch.randn(4, 10, 3, 25, 25))
        >>> h0 = Variable(torch.randn(2, 10, 10, 25, 25))
        >>> c0 = Variable(torch.randn(2, 10, 10, 25, 25))
        >>> output, hn = rnn(input, (h0, c0))
    """
    def __init__(self, *args, **kwargs):
        super(ConvLSTM, self).__init__('ConvLSTM', *args, **kwargs)

This is practically identical to LSTM: both initialize by calling the parent class constructor.

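3. Modify the initialization and arguments of RNNBase in nn/modules/rnn.py for ConvLSTM

Every subclass of RNNBase passes its arguments as ('mode', *args, **kwargs), so adding an argument only requires changing the parent class definition. We therefore append a kernel_size argument at the end and normalize it with _pair from utils (that is, from .utils import _pair): kernel_size = _pair(kernel_size).

The convolution weight has shape out_channels × in_channels × kernel_h × kernel_w and the bias has shape out_channels. Looking at the source code for convolution initialization:

[Figure: convolution source code]

The source shows how the weight and bias are composed, and that the weight is initialized from a distribution that depends on the number of input channels times the kernel size (in_channels x kernel). Now look at the LSTM weight initialization:

[Figure: LSTM source code]

There the weight initialization is a distribution that depends on the hidden size (hidden_size), so some adjustment is needed.
(PS: the source also shows that convolution decides whether to initialize the bias inside reset_parameters, the function that assigns the weights, whereas LSTM first checks in __init__ whether there is a bias parameter and then initializes it. Two different coding styles, so probably not written by the same person ^_!)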
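To make the difference between the two initializations concrete, here is a pure-Python sketch. It assumes that both use a uniform distribution on (-1/sqrt(n), 1/sqrt(n)), with n = in_channels * kernel_h * kernel_w for convolution and n = hidden_size for LSTM. That is how I recall the PyTorch sources of that period, but it should be checked against the version actually being patched. The helper names are hypothetical.

```python
import math
import random


def conv_init_bound(in_channels, kernel_size):
    """Assumed convolution-style bound: n = in_channels * prod(kernel_size)."""
    n = in_channels
    for k in kernel_size:
        n *= k
    return 1.0 / math.sqrt(n)


def lstm_init_bound(hidden_size):
    """Assumed LSTM-style bound: n = hidden_size."""
    return 1.0 / math.sqrt(hidden_size)


def uniform_init(count, bound, seed=0):
    """Draw `count` values from uniform(-bound, bound)."""
    rng = random.Random(seed)
    return [rng.uniform(-bound, bound) for _ in range(count)]


# 3 input channels, 3x3 kernel, hidden size 10 (the sizes used in the test at the end)
conv_bound = conv_init_bound(in_channels=3, kernel_size=(3, 3))
lstm_bound = lstm_init_bound(hidden_size=10)
samples = uniform_init(1000, conv_bound)
print(round(conv_bound, 4), round(lstm_bound, 4))
```

With these sizes the convolution-style bound is narrower than the LSTM-style one, so reusing the LSTM rule unchanged would start the convolution weights at a larger scale.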

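Finally, modify the LSTM initialization code. The result is as follows:
(PS: the ConvLSTM implementation concatenates input and hidden before the convolution, so a single weight and a single bias describe the whole computation.)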

    def __init__(self, mode, input_size, hidden_size,
                 num_layers=1, bias=True, batch_first=False,
                 dropout=0, bidirectional=False, kernel_size=3):
        super(RNNBase, self).__init__()
        self.kernel_size = kernel_size
        num_directions = 2 if bidirectional else 1
        kernel_size = _pair(kernel_size)
        # init parameters
        self.n = hidden_size

        self._all_weights = []
        for layer in range(num_layers):
            for direction in range(num_directions):
                layer_input_size = input_size if layer == 0 else hidden_size * num_directions
                if mode == 'LSTM':
                    gate_size = 4 * hidden_size
                elif mode == 'ConvLSTM':
                    weight = Parameter(torch.Tensor(4*hidden_size, layer_input_size + hidden_size, *kernel_size))
                    # named conv_bias so that it does not shadow the boolean bias argument tested below
                    conv_bias = Parameter(torch.Tensor(4*hidden_size))

                    # n used for weight initialization: input channels times kernel size
                    self.n = layer_input_size
                    for k in kernel_size:
                        self.n *= k

                    suffix = '_reverse' if direction == 1 else ''
                    weights = ['weight_l{}{}', 'bias_l{}{}']
                    weights = [x.format(layer, suffix) for x in weights]
                    setattr(self, weights[0], weight)
                    if bias:
                        setattr(self, weights[1], conv_bias)
                        self._all_weights += [weights]
                    else:
                        self._all_weights += [weights[:1]]
                    continue

Only part of the code is shown here; the rest is the same as the original.

The forward() code below also needs to be modified:
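To see which parameters the ConvLSTM branch registers, the naming and shape logic of the loop can be reproduced without PyTorch. conv_lstm_param_specs is a hypothetical helper written for this illustration; it only mirrors the names and shapes and creates no tensors.

```python
def conv_lstm_param_specs(input_size, hidden_size, num_layers=1,
                          bias=True, bidirectional=False, kernel_size=(3, 3)):
    """List (name, shape) for every parameter the ConvLSTM branch would register."""
    num_directions = 2 if bidirectional else 1
    specs = []
    for layer in range(num_layers):
        for direction in range(num_directions):
            layer_input_size = input_size if layer == 0 else hidden_size * num_directions
            suffix = '_reverse' if direction == 1 else ''
            # One weight covers input and hidden channels, and all four gates
            weight_shape = (4 * hidden_size, layer_input_size + hidden_size) + tuple(kernel_size)
            specs.append(('weight_l{}{}'.format(layer, suffix), weight_shape))
            if bias:
                specs.append(('bias_l{}{}'.format(layer, suffix), (4 * hidden_size,)))
    return specs


specs = conv_lstm_param_specs(3, 10, num_layers=2)
for name, shape in specs:
    print(name, shape)
```

For the second layer the input channel count equals the hidden size, so its weight has 20 input channels instead of 13.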

        if hx is None:
            if self.mode == 'ConvLSTM':
                feature_size = input.size()[-2:]
                num_directions = 2 if self.bidirectional else 1
                hx = torch.autograd.Variable(input.data.new(self.num_layers *
                                                            num_directions,
                                                            max_batch_size,
                                                            self.hidden_size,
                                                            feature_size[0],
                                                            feature_size[1]).zero_())
                hx = (hx, hx)
            else:
                num_directions = 2 if self.bidirectional else 1
                hx = torch.autograd.Variable(input.data.new(self.num_layers *
                                                            num_directions,
                                                            max_batch_size,
                                                            self.hidden_size).zero_())
                if self.mode == 'LSTM':
                    hx = (hx, hx)

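4. Modify the corresponding argument interface in nn/_functions/rnn.py

Different kinds of RNN need different handling. The main change is to add the call to ConvLSTMCell in AutogradRNN in nn/_functions/rnn.py.

5. Finally, add the declarations in nn/modules/__init__.py and nn/backends/thnn.py

Changes to thnn.py:

Changes to __init__.py: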
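The post does not show this change, so the following is only a sketch of the general idea: map the mode string to a cell function and record whether the cell carries an (h, c) tuple as state. The placeholder cells and the has_cell_state flag are assumptions made for illustration. Presumably the LSTM flag handed to StackedRNN would also need to be true for ConvLSTM, since its hidden state is a tuple as well.

```python
def lstm_cell(*args):
    """Placeholder for LSTMCell."""
    return "LSTMCell"


def gru_cell(*args):
    """Placeholder for GRUCell."""
    return "GRUCell"


def conv_lstm_cell(*args):
    """Placeholder for the new ConvLSTMCell."""
    return "ConvLSTMCell"


CELLS = {
    'LSTM': lstm_cell,
    'GRU': gru_cell,
    'ConvLSTM': conv_lstm_cell,
}


def select_cell(mode):
    """Return the cell for `mode` and whether its state is an (h, c) tuple."""
    try:
        cell = CELLS[mode]
    except KeyError:
        raise ValueError('Unknown mode: {}'.format(mode))
    has_cell_state = mode in ('LSTM', 'ConvLSTM')
    return cell, has_cell_state


cell, has_cell_state = select_cell('ConvLSTM')
print(cell(), has_cell_state)
```

Keeping the mode-to-cell mapping in one place means a new recurrent variant only needs one new entry plus its cell function.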

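Test results

[Figure: test result]

Finally, a ConvLSTM with 3 input channels, 10 hidden channels, 2 layers and a kernel size of 3 is tested.
The input data has shape (4, 10, 3, 25, 25): sequence length, batch_size, input channels, and image size.
The hidden and cell states have shape (2, 10, 10, 25, 25): number of layers, batch_size, hidden channels, and feature size.

The resulting output has shape (4, 10, 10, 25, 25).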
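These shapes can be reproduced on an unmodified, recent PyTorch by unrolling a ConvLSTM cell by hand over time steps and layers. This sketch stands in for the patched nn.ConvLSTM, which it does not use. The function names and random weights are illustrative assumptions, and it covers only the unidirectional case without dropout.

```python
import torch
import torch.nn.functional as F


def conv_lstm_cell(x, hidden, weight, bias):
    """One ConvLSTM step with size-preserving padding (odd kernel sizes)."""
    hx, cx = hidden
    padding = ((weight.size(-2) - 1) // 2, (weight.size(-1) - 1) // 2)
    gates = F.conv2d(torch.cat((x, hx), dim=1), weight, bias=bias, padding=padding)
    i, f, g, o = gates.chunk(4, 1)
    cy = torch.sigmoid(f) * cx + torch.sigmoid(i) * torch.tanh(g)
    hy = torch.sigmoid(o) * torch.tanh(cy)
    return hy, cy


def conv_lstm_forward(inputs, h0, c0, weights, biases):
    """inputs: (seq_len, batch, C, H, W); h0 and c0: (num_layers, batch, hidden, H, W)."""
    layer_input = inputs
    h_n, c_n = [], []
    for layer, (w, b) in enumerate(zip(weights, biases)):
        h, c = h0[layer], c0[layer]
        outputs = []
        for x_t in layer_input:               # iterate over time steps
            h, c = conv_lstm_cell(x_t, (h, c), w, b)
            outputs.append(h)
        layer_input = torch.stack(outputs)    # this layer's outputs feed the next layer
        h_n.append(h)
        c_n.append(c)
    return layer_input, (torch.stack(h_n), torch.stack(c_n))


torch.manual_seed(0)
seq_len, batch, in_ch, hid, size, k, layers = 4, 10, 3, 10, 25, 3, 2
inputs = torch.randn(seq_len, batch, in_ch, size, size)
h0 = torch.randn(layers, batch, hid, size, size)
c0 = torch.randn(layers, batch, hid, size, size)
weights = [torch.randn(4 * hid, (in_ch if l == 0 else hid) + hid, k, k) * 0.1
           for l in range(layers)]
biases = [torch.zeros(4 * hid) for _ in range(layers)]

output, (hn, cn) = conv_lstm_forward(inputs, h0, c0, weights, biases)
print(output.shape, hn.shape, cn.shape)
```

The printed shapes match the ones listed above: the output keeps the sequence dimension, and the final states keep one entry per layer.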

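(PS: finally, note that there are two different ways to invoke a layer. For convolution these are nn.Conv2d and F.conv2d, and they take different arguments. Put simply, the first does not need the weight to be passed in, because it creates and initializes its own, while the second accepts a weight that has already been initialized.)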
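The relationship between the two styles can be shown in a few lines: the module owns its weight and bias, and passing those same tensors to the functional form gives the same result. The tensor sizes here are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)

# Module style: the layer creates and initializes its own weight and bias
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
out_module = conv(x)

# Functional style: weight and bias are passed in explicitly
out_functional = F.conv2d(x, conv.weight, conv.bias, padding=1)

same = torch.allclose(out_module, out_functional)
print(same, out_module.shape)
```

This is why the ConvLSTM cell uses F.conv2d: the recurrent base class owns the parameters and hands them to the cell at every step.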

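Update, 2017/11/20
Implementing WGAN-GP requires higher-order gradients, so I had to update to PyTorch2 after all, which I had not wanted to do, and I switched to python3.6 at the same time. The code changes are small; the main adjustment is for the changed coding style of the weight initialization. The main modifications are as follows: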
