Rethinking the Inception Architecture for Computer Vision: Inception v2 and v3 Explained in Detail

Posted by 阿新 on 2020-10-10

Author: CYL (class of 2018)

Date: 2020.9.3

Venue: CVPR 2016 (arXiv preprint, 2015)

Tags: Inception v2 v3

Paper: "Rethinking the Inception Architecture for Computer Vision"

I. A note up front: corrections are welcome if you spot any mistakes

Background:
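1. The Inception v1 (GoogLeNet) paper never clearly described how much each of its design decisions contributed. Intuitively, the Inception architecture is just a stack of Inception modules, so the network does not look very complicated, yet it is still considerably more complex than networks such as VGG. (Complexity here is not measured by parameter count; it refers to how intricately data flows from layer to layer.) A network that is complex and whose strong performance is not understood is hard to adapt to a new dataset.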

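2. Hard does not mean impossible. Guided by a few general design principles, the authors combined a series of techniques into a new model structure.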

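3. Principle 1: avoid representational bottlenecks. As an image propagates forward through the network, successive layers of filters extract features from it. The higher layers extract increasingly abstract and compact features, so the feature maps generally shrink from input to output, and some important information is inevitably lost along the way. If the size drops too quickly, or the final feature map becomes too small, a representational bottleneck is unavoidable. GoogLeNet follows this principle, and Inception v2 continues to follow it.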
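The sketch below gives a rough illustration of Principle 1. It uses the activation count H×W×C as a crude proxy for how much information a layer can carry, and the paper itself cautions that dimensionality is only a rough estimate of information content. For each sequence it reports the sharpest shrink factor between consecutive layers. Both shape sequences are hypothetical. They are chosen only to contrast aggressive pooling with a gentle "halve the grid, double the channels" schedule.

```python
# Representation size H x W x C used as a rough proxy for information capacity.
def representation_sizes(shapes):
    """Number of activations at each layer for a list of (height, width, channels)."""
    return [h * w * c for h, w, c in shapes]


def sharpest_drop(shapes):
    """Largest factor by which the representation shrinks between consecutive layers."""
    sizes = representation_sizes(shapes)
    return max(prev / nxt for prev, nxt in zip(sizes, sizes[1:]))


# Hypothetical shape sequences, not taken from any real network.
aggressive = [(32, 32, 64), (8, 8, 64), (2, 2, 64)]               # 4x pooling, channels fixed
gentle = [(32, 32, 64), (16, 16, 128), (8, 8, 256), (4, 4, 512)]  # halve H and W, double C

print(sharpest_drop(aggressive), sharpest_drop(gentle))
```

The first sequence compresses abruptly by 16× per step, which is the kind of drop the principle warns against. The second shrinks by at most 2× per step.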

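4. Principle 2: more features lead to faster convergence. "More features" here does not mean more feature maps. It means more mutually independent features, that is, features that are more thoroughly disentangled. For example, decomposing a face into independent features such as the face outline, left eye, right eye, nose, mouth and eyebrows converges faster than a single feature for the whole face. (Cf. the Hebbian principle.)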

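5. Principle 3: the input to a large-kernel convolution can be reduced in dimension first without much loss. For example, a 1×1 convolution can compress the channels before a 3×3 or 5×5 convolution. Strongly correlated data loses little information when its dimension is reduced. Pixels at the same spatial position across different feature maps correspond to the same location in the original input, so they are highly correlated.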
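The sketch below shows why Principle 3 matters in practice. It counts multiply-accumulates for a 5×5 convolution with and without a 1×1 reduction in front of it. The sizes are a 28×28×192 input, 16 reduction channels and 32 output channels. They are borrowed from the 5×5 branch of GoogLeNet's inception (3a) module purely for illustration. Bias terms, activations and padding effects are ignored.

```python
def conv_macs(h, w, c_in, c_out, kh, kw):
    """Multiply-accumulates of a stride-1, 'same'-padded convolution on an h x w grid."""
    return h * w * c_in * c_out * kh * kw


# Illustrative sizes (borrowed from GoogLeNet's inception(3a) 5x5 branch as an assumption).
H, W, C_IN = 28, 28, 192
C_OUT, C_REDUCE = 32, 16

direct = conv_macs(H, W, C_IN, C_OUT, 5, 5)
reduced = conv_macs(H, W, C_IN, C_REDUCE, 1, 1) + conv_macs(H, W, C_REDUCE, C_OUT, 5, 5)
print(f"direct 5x5: {direct:,}  1x1 reduce + 5x5: {reduced:,}  ratio: {direct / reduced:.1f}")
```

With these sizes the reduced path needs roughly a tenth of the multiply-accumulates. The saving is only worthwhile because, as argued above, the strongly correlated channels lose little information when compressed.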

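6. Principle 4: balance the width and depth of the network, that is, increase both together rather than one at the expense of the other.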

7. Commonly used models:
[Figure: overview of commonly used models]

II. Improvements of Inception v2/v3 over v1 (read these against the four principles and compare them with the v1 structure)

1. Factorize large convolution kernels into several small ones (Principle 3)

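Idea: in v1, the cross-channel 1×1 convolution did much of the heavy lifting. It loses little information, greatly reduces computation and adds non-linearity. Is there another technique with a similar effect?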

Strategies:
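1.1: Replace a 5×5 convolution with two stacked 3×3 convolutions, which have the same receptive field. Likewise, a 7×7 convolution can be replaced with three 3×3 convolutions.
[Figure: a 5×5 convolution replaced by two stacked 3×3 convolutions]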
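The two claims in 1.1 (same receptive field, fewer weights) can be checked with a little arithmetic. This minimal sketch assumes stride-1 convolutions. It also assumes the intermediate layers keep the same number of channels as the original layer. That is the setting in which the paper quotes its 28% saving for the 5×5 case.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions with the given square kernels."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


def weights_per_channel_pair(kernel_sizes):
    """Weights per (input, output) channel pair, assuming every layer keeps the same channel count."""
    return sum(k * k for k in kernel_sizes)


for big, small in [(5, [3, 3]), (7, [3, 3, 3])]:
    assert stacked_receptive_field(small) == big  # same receptive field
    saving = 1 - weights_per_channel_pair(small) / weights_per_channel_pair([big])
    print(f"{big}x{big} -> {len(small)} x 3x3: {saving:.0%} fewer weights")
```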
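Questions: is this factorization valid? After splitting into two layers, should a non-linear ReLU activation be inserted between them?
On the first question: intuitively the factorization is valid, and the results confirm it. A rigorous mathematical justification, however, is hard to give.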

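On the second question:
[Figure: control experiment comparing a linear and a ReLU activation after the first of the two 3×3 layers]
The figure shows that a non-linear activation works better than a linear one. The gap is small, but top-1 comparisons are decided by improvements on the order of 1%.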

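1.2: Asymmetric convolutions (spatially separable convolutions). Replace a 3×3 convolution with a 3×1 convolution followed by a 1×3 convolution.
[Figure: a 3×3 convolution replaced by a 3×1 convolution followed by a 1×3 convolution]
Experiments show that asymmetric factorization does not work well in early layers but gives very good results on medium-sized feature maps, with grid sizes between 12 and 20. A possible reason, in my view: in early layers the extracted features are not yet distinct and neighboring pixels are less correlated, so the asymmetric factorization loses more information.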
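An n×1 convolution followed by a 1×n one can stand in for an n×n convolution because any rank-1 n×n kernel is exactly the outer product of a column and a row. The sketch below verifies this in plain Python with a Sobel-style kernel on an arbitrary image. It also computes the fraction of multiplies saved, 1 - 2/n, assuming equal channel counts in both steps. It only demonstrates exact equivalence for rank-1 kernels. A learned factorized layer, especially with a non-linearity between the two steps, is a constrained approximation of a general n×n convolution rather than an exact replacement.

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D cross-correlation, which is what CNN frameworks call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


col = [[1], [2], [1]]   # 3x1 kernel
row = [[1, 0, -1]]      # 1x3 kernel
full = [[c[0] * r for r in row[0]] for c in col]  # their outer product: a rank-1 3x3 kernel

img = [[(3 * i + 7 * j) % 11 for j in range(6)] for i in range(6)]  # arbitrary test image

two_step = conv2d_valid(conv2d_valid(img, col), row)
one_step = conv2d_valid(img, full)
print(two_step == one_step)


def asymmetric_saving(n):
    """Fraction of multiplies saved by n x 1 followed by 1 x n instead of n x n."""
    return 1 - (2 * n) / (n * n)


print(f"3x3: {asymmetric_saving(3):.0%}, 7x7: {asymmetric_saving(7):.0%}")
```

The saving grows with n, which is why the same idea pays off even more for the 7×7 factorization.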

2. Improved auxiliary classifiers (this one does not seem to map to any of the principles above)

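GoogLeNet used two auxiliary classifiers, placed after modules 4a and 4d. Later experiments showed that the lower one contributed little, so v2 and v3 drop it. This paper shares its first author with GoogLeNet, Christian Szegedy, and a researcher willing to point out problems in his own earlier work deserves respect.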

3. Efficient grid-size reduction (shrinking the feature-map width): an effective downsampling technique (Principle 1)

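Halving the size with pooling alone could create a representational bottleneck. To avoid this, the goal is:
d×d×k → (d/2)×(d/2)×2k, that is, the depth doubles as the height and width are halved so that not too much information is lost.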

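Question: this result can be obtained in two straightforward ways:

(1) Convolve with 2k stride-1 filters first, then apply a stride-2 pooling layer.
(2) Pool first, then convolve.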

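Convolving first is very expensive, because the cost is concentrated in the convolution on the larger grid, so it is not worth it. Pooling first loses information during pooling, so increasing the depth afterwards does little to recover it. How can this dilemma be resolved?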

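The authors' solution is to run a stride-2 convolution branch and a stride-2 pooling branch in parallel and concatenate their outputs. In the figure, the left side shows the operations and the right side shows how the feature maps change.
[Figure: left, the parallel grid-reduction module; right, the corresponding feature-map sizes]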
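The cost trade-off described above can be made concrete. The sketch below compares the two naive options with a simplified version of the parallel module: a single stride-2 convolution branch concatenated with a stride-2 pooling branch. The paper's actual module has more convolution branches. The grid size d = 32 and channel count k = 64 are hypothetical, 3×3 kernels are assumed for every convolution, and pooling is treated as costing nothing.

```python
def conv_macs(grid, c_in, c_out, kernel=3, stride=1):
    """Multiply-accumulates of a 'same'-padded square convolution with a (grid // stride)^2 output."""
    out = grid // stride
    return out * out * c_in * c_out * kernel * kernel


def grid_reduction_options(d, k):
    """Map each option to (conv MACs, activations right after the grid is halved).

    Every option turns d x d x k into (d/2) x (d/2) x 2k; pooling is treated as free.
    """
    half = d // 2
    return {
        # 2k stride-1 filters on the full grid, then stride-2 pooling
        "conv_then_pool": (conv_macs(d, k, 2 * k), half * half * 2 * k),
        # stride-2 pooling first (only k channels survive), then expand to 2k
        "pool_then_conv": (conv_macs(half, k, 2 * k), half * half * k),
        # k stride-2 filters in parallel with stride-2 pooling, outputs concatenated to 2k
        "parallel": (conv_macs(d, k, k, stride=2), half * half * 2 * k),
    }


for name, (macs, size) in grid_reduction_options(d=32, k=64).items():
    print(f"{name:15s} MACs={macs:>11,}  activations after halving={size:,}")
```

Convolving first costs four times as much as pooling first, matching the factor the paper gives. Pooling first, however, leaves the representation at half the size of the other two options right after the grid is halved, which is the bottleneck Principle 1 warns about. The parallel version keeps the full (d/2)×(d/2)×2k output while being the cheapest of the three.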

4. Expanded filter banks (Principle 2)

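[Figure: Inception module with expanded filter-bank outputs]
This module is used just before the classifier, on the coarsest 8×8 grid. It splits large features into smaller ones, producing more features.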
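The sketch below adds up the output channels of an expanded-filter-bank block to show how much the module widens the representation. The branch widths come from torchvision's InceptionE module, the block its Inception v3 implementation uses on the 8×8 grid. Treat them as an assumption rather than numbers quoted from the paper. The key feature is that the 1×3 and 3×1 convolutions run in parallel and are concatenated instead of being stacked as in 1.2, so each split doubles that branch's output width.

```python
# Branch output widths follow torchvision's InceptionE (used for the 8x8 stage of
# its Inception v3); treat them as an assumption, not as numbers quoted from the paper.
EXPANDED_BANK = {
    "1x1": [320],
    "1x1 -> {1x3, 3x1} in parallel": [384, 384],
    "1x1 -> 3x3 -> {1x3, 3x1} in parallel": [384, 384],
    "avg pool -> 1x1": [192],
}


def concat_channels(branches):
    """Channels after concatenating every output of every branch."""
    return sum(sum(widths) for widths in branches.values())


GRID = 8
channels = concat_channels(EXPANDED_BANK)
print(channels, GRID * GRID * channels)
```

With these widths the block outputs 2048 channels on an 8×8 grid. This is the kind of high-dimensional representation Principle 2 calls for just before the classifier.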

5. Combining the strategies above into Inception v2 (Principle 4)

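[Table: outline of the Inception v2 architecture]
Notes: "figure 5" refers to the module in which the 5×5 convolution is replaced by two 3×3 convolutions.
"figure 6" refers to the module factorized into asymmetric convolutions.
"figure 7" refers to the module with expanded filter banks.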

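Overall, depth and width are balanced and increased together, which improves both accuracy and computational efficiency. The network is 42 layers deep and costs only about 2.5 times as much computation as GoogLeNet, so it is still far more efficient than VGG.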

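6. Use label smoothing (label smoothing regularization, LSR) on the one-hot training targets.

Label smoothing does not replace the softmax. The classifier still uses a softmax with a cross-entropy loss, but the target distribution mixes the one-hot label with a uniform distribution. This discourages the model from becoming over-confident.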
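Label smoothing changes only the target distribution. The paper replaces the one-hot target δ(k, y) with q'(k) = (1 - ε)·δ(k, y) + ε·u(k), using a uniform u(k) = 1/K and ε = 0.1 on the K = 1000 ImageNet classes. The plain-Python sketch below applies that formula to a hypothetical 5-class example. It then compares the softmax cross-entropy for increasingly confident logits.

```python
import math


def smooth_labels(target, num_classes, eps=0.1):
    """q'(k) = (1 - eps) * [k == target] + eps / K, i.e. a uniform prior u(k) = 1/K."""
    return [(1.0 - eps) * (k == target) + eps / num_classes for k in range(num_classes)]


def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def cross_entropy(q, logits):
    """H(q, p) with p = softmax(logits)."""
    return -sum(qk * lp for qk, lp in zip(q, log_softmax(logits)))


q = smooth_labels(target=2, num_classes=5, eps=0.1)
print([round(x, 3) for x in q])  # [0.02, 0.02, 0.92, 0.02, 0.02]

one_hot = smooth_labels(target=2, num_classes=5, eps=0.0)
for z in (2.0, 5.0, 20.0):  # growing confidence in the correct class
    logits = [0.0, 0.0, z, 0.0, 0.0]
    print(f"z={z:4}: one-hot loss={cross_entropy(one_hot, logits):.4f}  "
          f"smoothed loss={cross_entropy(q, logits):.4f}")
```

With one-hot targets the loss keeps shrinking as the correct logit grows, so the model is pushed toward unbounded confidence. With smoothed targets the loss is minimized where the predicted probability of the correct class equals 0.92, here at z = ln 46 ≈ 3.8, and it grows again beyond that point. This is the over-confidence penalty label smoothing is meant to provide.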

III. Inception v3 emerges from the experimental results

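[Table: results of the Inception v2 variants]
This table lists the strategy choices tried on top of the Inception v2 architecture. Each row includes its own change plus all the changes in the rows above it. The model in the last row is what the paper calls Inception v3.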

IV. Reflections (an open question)

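Starting from v1 and guided by the four principles, the authors made the changes above to arrive at Inception v2 and v3, and the results do show a solid improvement. Can we then take it for granted that, with the overall Inception framework unchanged, small patches to the Inception module will yield better results after a reasonable amount of tuning? Such patches include adding a 7×7 convolution as a fifth branch, stacking more modules to increase depth, or adding residual connections to speed up training and allow deeper networks. (As I understand it, Inception v4 and Inception-ResNet are in fact largely a matter of tuning and strategy selection.) Precisely because these changes worked well in experiments, the models from v1 to v4 all stayed inside the Inception framework.

But what about the main question raised in v1? That question was how a network can exploit the sparsity of convolutions while obtaining dense matrices through some clustering effect, in order to speed up training. The solution proposed at the time was the Inception module, whose parallel convolutions of different kernel sizes might achieve some clustering effect. The authors themselves noted that further research was needed to understand why this structure works so well. Yet several questions were never answered afterwards: whether the module really achieves a clustering effect, whether the accuracy gain is related to such an effect, and why the architecture reaches better accuracy at all. Has model improvement and optimization as a whole abandoned the search for mathematical interpretability, that is, for mathematical cause and effect, in favor of engineering-style tuning? Has the black box of neural networks become even blacker?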
