06 ResNeXt論文閱讀筆記

阿新 • • 發佈：2018-11-19

1. 提出背景
2. 核心思想
3. 論文核心
4. 組卷積
5. 核心程式碼

論文: Aggregated Residual Transformations for Deep Neural Networks

論文地址: https://arxiv.org/abs/1611.05431

程式碼地址

Keras: https://github.com/titu1994/Keras-ResNeXt
Pytorch: https://github.com/prlz77/ResNeXt.pytorch/tree/R4.0[推薦,用於cifar資料集]

參考部落格:

ResNeXt - Aggregated Residual Transformations for DNN

ResNeXt演算法詳解

ResNext與Xception——對模型的新思考

1. 提出背景

作者提出ResNeXt的主要原因在於:

傳統的要提高模型的準確率，都是通過加深或加寬網路，但是隨著超引數數量的增加(比如 channels數，filter size等等)，網路設計的難度和計算開銷也會增加。

因此本文提出的 ResNeXt結構可以在不增加引數複雜度的前提下

提高準確率
減少超引數數量(得益於子模組的拓撲結構)

2. 核心思想

作者在論文中首先提到VGG，VGG採用 堆疊網路 來實現，之前的 ResNet 也借用了這樣的思想。
之後提到了Inception系列網路，簡單說就是 split-transform-merge 的策略，但是存在一個問題:

網路的超引數設定的針對性比較強，當應用在別的資料集上需要修改許多引數，因此可擴充套件性一般.

作者同時採用 VGG 的 堆疊思想 和 Inception 的 split-transform-merge 的思想，但是 可擴充套件性比較強. 可以認為在增加準確率的同時基本不改變或降低模型的複雜度。

這裡提到一個名詞cardinality,原文的解釋是 the size of the set of transformations,如下圖 Fig1 右邊是 cardinality=32 的樣子:

引數計算

假設在不使用偏置的情況下:

# A block of ResNet
256x1x64 + 64x3x3x64 + 64x1x256 = 69632
# A block of ResNeXt with cardinality
(256x1x4 + 4x4x3x3 + 4x256) x 32 = 70144

兩者引數數量差不多，但是後面作者有更加精妙的實現。

注意:

每個被聚合的拓撲結構都是一樣的(這也是和 Inception 的差別，減輕設計負擔)

附上原文比較核心的一句話，點明瞭增加 cardinality 比增加深度和寬度更有效，這句話的實驗結果在後面有展示：

In particular, a 101-layer ResNeXt is able to achieve better accuracy than ResNet-200 but has only 50% complexity.

Table1 列舉了 ResNet-50 和 ResNeXt-50 的內部結構，另外最後兩行說明二者之間的引數複雜度差別不大。

3. 論文核心

作者要開始講本文提出的新的 block，舉全連線層（Inner product）的例子來講，我們知道全連線層的就是以下這個公式：

再配上這個圖就更容易理解其splitting，transforming和aggregating的過程。

作者將其中的 $w_{i} x_{i}$ 替換乘了更一般的函式，這裡用了一個很形象的詞:Network in Neuron，式子如下：

其中C就是 cardinality
$T_{I}$ 有相同的拓撲結構（本文中就是三個卷積層的堆疊）

然後再看fig 3，這裡作者展示了3種不同不同的 ResNeXt blocks:

fig3.a

就是前面所說的aggregated residual transformations
fig3.b

則採用兩層卷積後 concatenate，再卷積，有點類似 Inception-ResNet，只不過這裡的 paths 都是相同的拓撲結構
fig 3.c

採用了一種更加精妙的實現，Group convolution分組卷積

作者在文中明確說明這三種結構是嚴格等價的，並且用這三個結構做出來的結果一模一樣，在本文中展示的是fig3.c的結果，因為fig3.c的結構比較簡潔而且速度更快.

4. 組卷積

Group convolution 分組卷積，最早在AlexNet中出現，由於當時的硬體資源有限，訓練AlexNet時卷積操作不能全部放在同一個GPU處理，因此作者把feature maps分給多個GPU分別進行處理，最後把多個GPU的結果進行融合。

有趣的是，分組卷積在當時可以說是一種工程上的妥協，因為今天能夠簡單訓練的AlexNet在當時很難訓練, 視訊記憶體不夠，Hinton跟他的學生不得不把網路拆分到兩張GTX590上面訓練了一個禮拜，當然，兩張GPU之間如何通訊是相當複雜的，幸運的是今天tensorflow這些庫幫我們做好了多GPU訓練的通訊問題。就這樣Hinton和他的學生髮明瞭分組卷積. 另他們沒想到的是:

分組卷積的思想影響比較深遠，當前一些輕量級的SOTA（State Of The Art）網路，都用到了分組卷積的操作，以節省計算量。

疑問

如果分組卷積是分在不同GPU上的話，每個GPU的計算量就降低到 1/groups，但如果依然在同一個GPU上計算，最終整體的計算量是否不變？

實際上並不是這樣的，Group convolution本身就大大減少了引數，比如當input_channel=256, output_channel=256,kernel size=3x3:

不做分組卷積的時候，分組卷積的引數為256x256x3x3
當分組卷積的時候，比如說group=2,每個group的input_channel、output_channel=128,引數數量為2x128x128x3x3,為原來的1/2.

最後輸出的feature maps通過concatenate的方式組合，而不是elementwise add. 如果放到兩張GPU上運算，那麼速度就提升了4倍.

5. 核心程式碼

import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init


class ResNeXtBottleneck(nn.Module):
    r"""RexNeXt bottleneck type C
    https://github.com/facebookresearch/ResNeXt/blob/master/models/resnext.lua
    """
    def __init__(self, in_channels, out_channels, stride, cardinality, base_width, widen_factor):
        """

        Args:
            in_channels (int): input channel dimensionality
            out_channels (int): output channel dimensionality
            stride: Replaces pooling layer.
            cardinality: num of convolution groups.
            base_width: base number of channels in each group.
            widen_factor: factor to reduce the input dimensionality before convolution.

        """
        super().__init__()
        self.widel_ratio = out_channels / (widen_factor * 64.)
        self.D = cardinality * int(base_width * self.widel_ratio)

        # 縮減的卷積層
        self.conv_reduce = nn.Conv2d(in_channels=in_channels,
                                     out_channels=self.D,
                                     kernel_size=1,
                                     stride=1,
                                     padding=0,
                                     bias=False)
        self.bn_reduce = nn.BatchNorm2d(self.D)

        # 組卷積
        self.conv_conv = nn.Conv2d(self.D, self.D, 3, stride, 1, groups=cardinality, bias=False)
        self.bn = nn.BatchNorm2d(self.D)

        # 增加的卷積層
        self.conv_expand = nn.Conv2d(self.D, out_channels, 1, 1, 0, bias=False)
        self.bn_expand = nn.BatchNorm2d(out_channels)

        # 短接的層
        self.shortcut = nn.Sequential()
        # 如果是兩個模組拼接，則
        if in_channels != out_channels:
            self.shortcut.add_module(name='shortcut_conv',
                                     module=nn.Conv2d(in_channels,
                                                      out_channels,
                                                      kernel_size=1,
                                                      stride=stride,
                                                      padding=0,
                                                      bias=False))
            self.shortcut.add_module(name='shortcut_bn',
                                     module=nn.BatchNorm2d(out_channels))

    def forward(self, x):
        bottleneck = self.conv_reduce.forward(x)
        bottleneck = F.relu(self.bn_reduce.forward(bottleneck), inplace=True)

        bottleneck = self.conv_conv.forward(bottleneck)
        bottleneck = F.relu(self.bn.forward(bottleneck), inplace=True)

        bottleneck = self.conv_expand.forward(bottleneck)
        bottleneck = self.bn_expand.forward(bottleneck)

        # 如果輸入通道數量和輸出通道數量相等，則為直接短接
        # 如果不相等,短接之前還要做一個卷積操作,將通道數量擴充套件
        residual = self.shortcut.forward(x)
        return F.relu(input=(residual + bottleneck), inplace=True)


class CifarResNeXt(nn.Module):
    def __init__(self, cardinality, depth, nlabels, base_width, widen_factor=4):
        """Constructor

        Args:
            cardinality: number of convolution groups.
            depth: number of layers.
            nlabels: number of classes
            base_width: base number of channels in each group.
            widen_factor: factor to adjust the channel dimensionality

        """
        super().__init__()
        self.cardinality = cardinality
        self.depth = depth
        self.block_depth = (self.depth - 2) // 9
        self.base_width = base_width
        self.widen_factor = widen_factor
        self.nlabels = nlabels
        self.output_size = 64
        self.stages = [64, 64*self.widen_factor, 128*self.widen_factor, 256*self.widen_factor]

        self.conv_1_3x3 = nn.Conv2d(in_channels=3,
                                    out_channels=64,
                                    kernel_size=3,
                                    stride=1,
                                    padding=1,
                                    bias=False)
        self.bn_1 = nn.BatchNorm2d(64)

        self.stage_1 = self.block('stage_1',
                                  in_channels=self.stages[0],
                                  out_channels=self.stages[1],
                                  pool_stride=1)
        self.stage_2 = self.block('stage_2', self.stages[1], self.stages[2], 2)
        self.stage_3 = self.block('stage_3', self.stages[2], self[3], 2)

        self.classifier = nn.Linear(in_features=self.stages[3], out_features=nlabels)

        self.initialize_weights() # 初始化權重

    def initialize_weights(self):
        init.kaiming_normal(self.classifier.weight) # 用kaiming初始化classifier
        for key in self.state_dict():
            if key.split('.')[-1] == 'weight':
                if 'conv' in key:
                    init.kaiming_normal(self.state_dict()[key], mode='fan_out')
                if 'bn' in key:
                    self.state_dict()[key][...] = 1
            elif key.split('.')[-1] == 'bias':
                self.state_dict()[key][...] = 0

    def block(self, name, in_channels, out_channels, pool_stride=2):
        """Stack n bottleneck modules where n is inferred from the depth of the network.

        Args:
            name: string name of the current block.
            in_channels: number of input channels
            out_channels: number of output channels
            pool_stride: factor to reduce the spatial dimensionality in the first bottleneck of the block.

        Returns:
            a Module consisting of n sequential bottlenecks.

        """
        block = nn.Sequential()
        for bottleneck in range(self.block_depth):
            name_ = '%s_bottleneck_%d' % (name, bottleneck)
            if bottleneck == 0:
                block.add_module(name_, module=ResNeXtBottleneck(in_channels,
                                                                 out_channels,
                                                                 stride=pool_stride,
                                                                 cardinality=self.cardinality,
                                                                 base_width=self.base_width,
                                                                 widen_factor=self.widen_factor))
            else:
                block.add_module(name_, module=ResNeXtBottleneck(out_channels,
                                                                 out_channels,
                                                                 1,
                                                                 self.cardinality,
                                                                 self.base_width,
                                                                 self.widen_factor))
        return block

    def forward(self, x):
        x = self.conv_1_3x3.forward(x)
        x = F.relu(self.bn_1.forward(x), inplace=True)
        x = self.stage_1.forward(x)
        x = self.stage_2.forward(x)
        x = self.stage_3.forward(x)
        x = F.avg_pool2d(input=x, kernel_size=8, stride=1)
        x = x.view(-1, self.stages[3])
        return self.classifier(x)

06 ResNeXt論文閱讀筆記

1. 提出背景

2. 核心思想

3. 論文核心

4. 組卷積

5. 核心程式碼

附件列表

06 ResNeXt論文閱讀筆記

ResNeXt論文閱讀筆記.md

2017-06-Deep Network Flow for Multi-Object Tracking-論文閱讀筆記

關鍵字抽取論文閱讀筆記

《The challenge of realistic music generation: modelling raw audio at scale》論文閱讀筆記

《Macro-Micro Adversarial Network for Human Parsing》論文閱讀筆記

論文閱讀筆記《The Contextual Loss for Image Transformationwith Non-Aligned Data》（ECCV2018 oral）

論文閱讀筆記（一）LeNet--Gradient-Based Learning Applied to Document Recognition

論文閱讀筆記（十一）Network In Network

論文閱讀筆記（六）Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

論文閱讀筆記 DeepLabv1：SEMANTIC IMAGE SEGMENTATION WITH DEEP CONVOLUTIONAL NETS AND FULLY CONNECTED CRFS

論文閱讀筆記（九）YOLOv3: An Incremental Improvement

DensePose:Dense Human Pose Estimation In The Wild 論文閱讀筆記

《Self-Protection of Android Systems from Inter-component Communication Attacks》論文閱讀筆記

《System Service Call-oriented Symbolic Execution of Android Framework with Applications to...》論文閱讀筆記

《AppIntent - Analyzing Sensitive Data Transmission in Android for Privacy Leakage Detection》論文閱讀筆記

【論文閱讀筆記3】序列模型入門之LSTM和GRU

ECCV 2018 論文閱讀筆記——Acquisition of Localization Confidence for Accurate Object Detection

【論文閱讀筆記】Deep Learning based Recommender System: A Survey and New Perspectives

論文閱讀筆記：《Contextual String Embeddings for Sequence Labeling》

06 ResNeXt論文閱讀筆記

1. 提出背景

2. 核心思想

3. 論文核心

4. 組卷積

5. 核心程式碼

附件列表

相關推薦