
ResNet && DenseNet (Principles)

This post does not go through the papers in detail; it covers only the main ideas and my own understanding. For the details, please read the papers yourself.

Introduction

When it comes to neural network design, the trend in the past few years has pointed in one direction: deeper. But the question is:

Is learning better networks as easy as stacking more layers?

Let's look at the depth of the winning ImageNet classification networks:

[Figure: Depth in ImageNet]

Can we improve performance simply by stacking more layers to make the network deeper? The answer is no, for two reasons:

  • vanishing/exploding gradients
  • degradation problem

Residual

The idea is actually quite simple:

Instead of hoping each few stacked layers directly fit a desired underlying mapping, we explicitly let these layers fit a residual mapping. Formally, denoting the desired underlying mapping as H(x), we let the stacked nonlinear layers fit another mapping F(x) := H(x) − x. The original mapping is recast into F(x) + x.

The learned F(x) is thus the residual.
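A minimal sketch of such a block, assuming PyTorch (the two-conv layout follows the basic residual design; the class name and sizes here are illustrative, not the paper's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Two stacked 3x3 convs learn the residual F(x); the identity
    shortcut recasts the mapping as H(x) = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        f = F.relu(self.bn1(self.conv1(x)))  # first half of F(x)
        f = self.bn2(self.conv2(f))          # F(x), the residual
        return F.relu(f + x)                 # H(x) = F(x) + x

y = ResidualBlock(64)(torch.randn(1, 64, 32, 32))  # shape preserved: (1, 64, 32, 32)
```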

Shortcut Connections

The idea originates from Highway Networks; the benefit of shortcut connections is that:

a few intermediate layers are directly connected to auxiliary classifiers for addressing vanishing/exploding gradients.
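For intuition, a Highway Networks layer gates the shortcut rather than leaving it as a pure identity; here is a minimal fully-connected sketch (my own naming, not code from either paper). ResNet's identity shortcut is what remains when the gate is dropped:

```python
import torch
import torch.nn as nn

class HighwayLayer(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x: the transform gate T decides
    how much of the input is transformed versus carried through."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)  # H(x), the learned transform
        self.gate = nn.Linear(dim, dim)       # T(x), the transform gate

    def forward(self, x):
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        return t * h + (1.0 - t) * x  # gate saturating at 0 => pure skip
```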

Networks stacked out of such shortcut (residual) blocks, i.e. ResNet, achieve better results as the layers get deeper.

[Figure: Residual]

Comparing the results on ImageNet:

And a table comparison makes it even clearer:

[Table: Result_ImageNet]

DenseNet

The network's structural hallmark can be summed up in one word: Dense. In one sentence:

For each layer, the feature maps of all preceding layers are treated as separate inputs whereas its own feature maps are passed on as inputs to all subsequent layers.

The structure looks like this:

[Figure: DenseNet]

Compared with ResNet, the biggest difference lies in:

Never combine features through summation before they are passed into a layer, instead we provide them all as separate inputs.
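The contrast fits in two lines (a sketch assuming 4-D PyTorch tensors with channels at dim=1):

```python
import torch

x   = torch.randn(1, 64, 32, 32)  # incoming feature maps
f_x = torch.randn(1, 64, 32, 32)  # the current layer's own feature maps

res_out   = x + f_x                     # ResNet: summation, still 64 channels
dense_out = torch.cat([x, f_x], dim=1)  # DenseNet: kept separate, now 128 channels
```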

For this network, the number of connections clearly grows quadratically with depth, so the question is: when the depth is large, does the network become untrainable? The authors put it this way:

Although the number of connections grows quadratically with depth, the topology encourages heavy feature reuse.

Compared with ResNet:

Prior work has shown that there is great redundancy within the feature maps of the individual layers in ResNets. In DenseNets, all layers have direct access to every feature map from all preceding layers, which means that there is no need to re-learn redundant feature maps. Consequently, DenseNet layers are very narrow (on the order of 12 feature maps per layer) and only add a small set of feature maps to the “collective knowledge” of the whole network.
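A minimal dense-block sketch under those numbers, assuming PyTorch (the class and argument names are mine; growth_rate plays the role of the ~12 feature maps per layer):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer reads the concatenation of all preceding feature maps
    and contributes only growth_rate new ones to the collective knowledge."""
    def __init__(self, in_channels, num_layers, growth_rate=12):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                # Inputs grow linearly per layer; outputs stay narrow (k = 12).
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          3, padding=1, bias=False),
            )
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Direct access to every feature map from all preceding layers.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # in_channels + num_layers * growth_rate channels
```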

The network model used for classification on CIFAR-10 and similar datasets is:

[Figure: DenseNet with Cls]

The results:

[Figure: Result_DenseNet]

Conclusion

In fact, whether it is ResNet or DenseNet, the core idea is the Highway Networks idea: the skip connection. Certain inputs are passed, without selection, straight into later layers (the skip), integrating the information flow. This avoids both the loss of information as it propagates between layers and the vanishing-gradient problem (and also suppresses some noise).