【Inception-v1】《Going Deeper with Convolutions》
CVPR-2015
1 Background and Motivation
The authors' work was largely inspired by two earlier works: Network in Network (NIN) and Arora et al. [2].
Larger DNNs (more depth, more width) generally perform better, but this comes with two major drawbacks:
- more prone to overfitting, and hence a greater demand for data (annotation is not cheap; fine-grained classes require expert annotators)
- higher consumption of computational resources
As shown in the figure below.
The fundamental way to address both problems is to make the neural network sparser: if the distribution of a dataset can be represented by a sparse network, a sparse representation can be obtained by analyzing the correlations between activations and clustering highly correlated neurons together.
[2] suggests a layer-by-layer construction in which one should analyze the correlation statistics of the last layer and cluster them into groups of units with high correlation.
This approach also echoes the Hebbian principle (neurons that fire together, wire together). A familiar illustration: ring a bell and then feed a dog; after enough repetitions, the dog starts salivating the moment it hears the bell, because the connection between the neurons that "hear" the bell and the neurons that "control" salivation has been strengthened. Stated precisely, the Hebbian principle says that if two neurons frequently fire (produce action potentials) at the same time, the connection between them is strengthened; otherwise it weakens.
[2] plus the Hebbian principle is exactly the theoretical support for this sparse-structure design.
So the authors introduce a form of clustering (the Inception module), paired with 1x1 convolutions for sparsity, and the results outperform the competition; genuinely impressive.
2 Advantages / Contributions
2.1 Advantages
- 12 times fewer parameters than AlexNet (while being deeper and wider) and significantly more accurate
- outperformed the state of the art in the ILSVRC 2014 classification and detection challenges
2.2 Contributions
our approach yields solid evidence that moving to sparser architectures is a feasible and useful idea in general.
3 Innovation
Design of the Inception module, combined with NIN-style 1x1 convolutions to cut the amount of computation.
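To make the computation saving concrete, here is a back-of-the-envelope multiply count for the 5x5 branch of inception (3a) (28x28x192 input, 32 output channels, 1x1 reduction to 16 channels); this is my own rough estimate using the Table 1 numbers, not a figure quoted from the paper:

```python
# Rough multiply counts for the 5x5 branch of inception (3a):
# 28x28x192 input, 32 output channels, optional 1x1 reduction to 16 channels.
H = W = 28
C_in, C_red, C_out, K = 192, 16, 32, 5

direct = H * W * C_in * K * K * C_out           # 5x5 conv applied directly
reduced = (H * W * C_in * C_red                 # 1x1 reduction first
           + H * W * C_red * K * K * C_out)     # then 5x5 conv on the reduced map

print(f"direct : {direct:,}")    # ~120 million multiplies
print(f"reduced: {reduced:,}")   # ~12.4 million multiplies, roughly 10x fewer
```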
4 Method
Origin of the name "Inception"
"Deep" has two meanings:
- the literal depth of the network
- a deeper level of organization (the form of the Inception module)
For the full architecture diagram, see the last section of this post (GoogleNet).
Input: 224x224
inception3 / inception4 / inception5: stage-by-stage views of the architecture (figures)
In Table 1, the green-highlighted columns add up to the number of output channels, e.g. for inception (3a): 256 = 64 + 128 + 32 + 32.
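As a concrete illustration of the four branches and the channel concatenation, here is a minimal sketch of an Inception module with dimension reduction, assuming PyTorch; the default channel numbers are those of inception (3a) from Table 1 (192 in, 64 + 128 + 32 + 32 = 256 out), and the class and argument names are my own, not from an official implementation:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Sketch of an Inception module with dimension reduction.
    Default channels follow inception (3a): 192 in -> 64+128+32+32 = 256 out."""

    def __init__(self, in_ch=192, c1=64, c3_red=96, c3=128,
                 c5_red=16, c5=32, pool_proj=32):
        super().__init__()
        # branch 1: plain 1x1 convolution
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # branch 2: 1x1 reduction, then 3x3 convolution
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        # branch 3: 1x1 reduction, then 5x5 convolution
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        # branch 4: 3x3 max pooling, then 1x1 projection (the "pool proj" column)
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # the four branches are concatenated along the channel dimension,
        # so the output channels are the sum of the green columns in Table 1
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# inception (3a): a 28x28x192 feature map comes out as 28x28x256
x = torch.randn(1, 192, 28, 28)
print(InceptionModule()(x).shape)  # torch.Size([1, 256, 28, 28])
```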
- train: the losses of the auxiliary classifiers (attached after inception 4a and 4d, to combat the vanishing gradient and provide regularization) were weighted by 0.3
- inference: these auxiliary networks are discarded
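A minimal sketch of how the 0.3 weighting could look during training, assuming PyTorch and a standard cross-entropy loss (the function and argument names are illustrative, not the authors' code):

```python
import torch.nn.functional as F

def googlenet_training_loss(main_logits, aux1_logits, aux2_logits, targets):
    """Total training loss: the two auxiliary classifier losses
    (branches after inception 4a and 4d) are added with weight 0.3.
    At inference time the auxiliary branches are simply discarded."""
    main_loss = F.cross_entropy(main_logits, targets)
    aux_loss = (F.cross_entropy(aux1_logits, targets)
                + F.cross_entropy(aux2_logits, targets))
    return main_loss + 0.3 * aux_loss
```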
5 Dataset
- ILSVRC 2014 Classification Challenge (1,000 classes)
  - training: about 1.2 million images
  - validation: 50,000 images
  - testing: 100,000 images
- ILSVRC 2014 Detection Challenge (200 classes)
6 Experiments
6.1 ILSVRC 2014 Classification Challenge
A nod to the "OG" (the previous winning approaches).
Ensemble of models (the multi-crop evaluation is applied at test time).
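A minimal numpy sketch of the test-time averaging, assuming softmax probabilities have already been computed for every model and every crop (the function name and array shapes are my own illustration; the paper averages over 7 models and up to 144 crops per image):

```python
import numpy as np

def ensemble_predict(probs):
    """probs: (n_models, n_crops, n_classes) softmax probabilities.
    Average over models and crops, then take the top class."""
    return int(np.argmax(probs.mean(axis=(0, 1))))

# toy usage: 7 models, 144 crops, 1000 classes
probs = np.random.dirichlet(np.ones(1000), size=(7, 144))
print(ensemble_predict(probs))
```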
6.2 ILSVRC 2014 Detection Challenge
A nod to the "OG" (the previous winning approaches).
1v1 battle: ensembling (the "team fight") benefits GoogleNet far more than it benefits Deep Insight, but in the single-model ("solo") comparison GoogleNet is behind Deep Insight.
7 Conclusion / Future work
Still, our approach yields solid evidence that moving to sparser architectures is a feasible and useful idea in general.
Q1: What do V/S mean in Section 7? (valid and same)
Q2: What does the "Contextual model" in Table 5 refer to?
Q3: Why are the 1x1 convolutions placed before the 3x3 and 5x5 convolutions, but after the max pooling? (My feeling: the former imitates NIN, while the latter is simply there to reduce computation.)
Q4: What is "pool proj"?
8 GoogleNet
Use of the auxiliary classifiers
- train: the losses of the auxiliary classifiers (after inception 4a and 4d) were weighted by 0.3
- inference: these auxiliary networks are discarded
Role of the auxiliary classifiers
- combat the vanishing gradient (speed up convergence)
- provide regularization (my understanding: regularization is about preventing overfitting, a bit like requiring not just the final answer but also the intermediate steps of a problem to be correct; another way to see it is that high-level features tend to fit complex structure while low-level features tend to fit simple structure, and since the data contains both, attaching auxiliary classifiers to the lower layers keeps the network from leaning only toward complex structure)
The Inception-v3 paper's view of the auxiliary classifiers is as follows:
- "The original motivation was to push useful gradients to the lower layers to make them immediately useful and improve the convergence during training by combating the vanishing gradient problem in very deep networks." Their experiments show, however, that the auxiliary classifiers do not speed up convergence; only near the end of training does the network with the branches reach slightly higher accuracy than the one without.
- The hypothesis that these branches help evolve the low-level features is most likely misplaced. Instead, they argue that the auxiliary classifiers act as regularizers. This is supported by the fact that the main classifier of the network performs better if the side branch is batch-normalized or has a dropout layer.
What the v3 authors mean is that the auxiliary classifiers in GoogleNet do little to combat gradient vanishing (they do not speed up convergence) and behave more like regularizers.