【DMCP】2020-CVPR-DMCP Differentiable Markov Channel Pruning for Neural Networks - Paper Reading
DMCP
2020-CVPR-DMCP Differentiable Markov Channel Pruning for Neural Networks
- Shaopeng Guo (SenseTime)
- GitHub: 64 stars
- https://github.com/zx55/dmcp
Introduction
propose a novel differentiable channel pruning method named Differentiable Markov Channel Pruning (DMCP) to perform efficient optimal sub-structure searching.
This paper proposes DMCP (Differentiable Markov Channel Pruning) to efficiently search for the optimal sub-structure.
At the same FLOPs, our method outperforms all the other pruning methods both on MobileNetV2 and ResNet, as shown in Figure 1.
With our method, MobileNetV2 has 0.1% accuracy drop with 30% FLOPs reduction and the FLOPs of ResNet-50 is reduced by 44% with only 0.4% drop.
Motivation
Recent works imply that the channel pruning can be regarded as searching optimal sub-structure from unpruned networks.
Channel pruning can be viewed as searching for the optimal sub-structure within the unpruned network (the structure found by pruning matters more than the inherited weights).
However, existing works based on this observation require training and evaluating a large number of structures, which limits their application.
Previous works based on this observation need to train and evaluate many sub-structures, which is expensive.
Conventional channel pruning methods mainly rely on the human-designed paradigm.
Conventional channel pruning mainly relies on hand-designed criteria (channel-importance metrics).
the structure of the pruned model is the key of determining the performance of a pruned model, rather than the inherited “important” weights.
The structure of the pruned network matters more for its performance than the inherited "important" weights.
The optimization of these pruning processes needs to train and evaluate a large number of structures sampled from the unpruned network, so the scalability of these methods is limited.
These (sub-structure-searching) pruning methods have to train and evaluate a large number of networks, which limits their scalability (i.e. their ability to prune networks of different sizes).
A similar problem in neural architecture search (NAS) has been tackled by differentiable method DARTS
A similar problem in NAS has already been solved by the differentiable method DARTS.
P.S. Differences from DARTS:
First, the definition of search space is different. The search space of DARTS is a category of pre-defined operations (convolution, max-pooling, etc.), while in channel pruning, the search space is the number of channels in each layer.
First, the search spaces differ: DARTS searches over a set of pre-defined operations, whereas here the search space is the number of channels in each layer.
Second, the operations in DARTS are independent of each other. But in channel pruning, if a layer has k + 1 channels, it must have at least k channels first, which is a logical implication relationship.
Second, the operations in DARTS are independent of one another (e.g. the candidate operations on an edge between two nodes, such as convolution and pooling, do not affect each other), but in channel pruning, a layer with k+1 channels must first have k channels.
Contribution
Our method makes the channel pruning differentiable by modeling it as a Markov process.
We make channel pruning differentiable by modeling it as a Markov process.
Method
Our method is differentiable and can be directly optimized by gradient descent with respect to standard task loss and budget regularization (e.g. FLOPs constraint).
In DMCP, channel pruning is modeled as a Markov process: each state represents whether the corresponding channel is retained, and transitions between states represent the pruning process.
In the Markov process for each layer, the state \(S_k\) represents that the \(k^{th}\) channel is retained, and the transition from \(S_k\) to \(S_{k+1}\) represents the probability of retaining the \((k+1)^{th}\) channel given that the \(k^{th}\) channel is retained.
Each layer has its own Markov process: state \(S_k\) means the \(k^{th}\) channel is retained, and the transition from \(S_k\) to \(S_{k+1}\) is the probability of retaining the \((k+1)^{th}\) channel.
Note that the start state is always \(S_1\) in our method.
\(S_1\) is the start state, i.e. every layer keeps at least one channel.
Then the marginal probability for state \(S_k\), i.e. the probability of retaining the \(k^{th}\) channel, can be computed as the product of transition probabilities and can also be viewed as a scaling coefficient.
So the marginal probability of state \(S_k\) (retaining the \(k^{th}\) channel) equals the product of all preceding transition probabilities and can be viewed as a scaling coefficient for the \(k^{th}\) channel.
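As a worked equation (using the notation above; the paper's exact symbols may differ), write \(p_i = p(S_{i+1} \mid S_i)\) for the transition probabilities and \(p(w_k)\) for the marginal probability of retaining the \(k^{th}\) channel:
\[
p(w_1) = 1, \qquad p(w_k) = \prod_{i=1}^{k-1} p(S_{i+1} \mid S_i) = \prod_{i=1}^{k-1} p_i \quad (k \ge 2).
\]
Retaining channel \(k\) therefore requires every earlier transition, which encodes the constraint that keeping \(k+1\) channels implies keeping the first \(k\).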
Each scaling coefficient is multiplied to its corresponding channel’s feature map during the network forwarding.
During the forward pass, each channel's feature map is multiplied by that channel's marginal probability (scaling coefficient).
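A minimal PyTorch-style sketch of this forward-pass scaling. The module name `MarkovChannelScaling` and the sigmoid parameterization of the transition probabilities are illustrative assumptions, not the official repo's API:

```python
import torch
import torch.nn as nn

class MarkovChannelScaling(nn.Module):
    """Scale each output channel by its marginal retention probability p(w_k)."""

    def __init__(self, num_channels: int):
        super().__init__()
        # One learnable logit per transition p(S_{k+1} | S_k); sigmoid keeps the
        # probabilities in (0, 1).  (The paper derives them from learnable
        # parameters; the exact parameterization here is an assumption.)
        self.transition_logits = nn.Parameter(torch.zeros(num_channels - 1))

    def marginal_probs(self) -> torch.Tensor:
        p = torch.sigmoid(self.transition_logits)         # p_1 .. p_{N-1}
        one = torch.ones(1, device=p.device)
        # p(w_1) = 1, p(w_k) = prod_{i<k} p_i  (cumulative product)
        return torch.cat([one, torch.cumprod(p, dim=0)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W); broadcast the coefficients over H and W.
        return x * self.marginal_probs().view(1, -1, 1, 1)
```

Because \(p(w_k)\) multiplies the \(k^{th}\) feature map, the task loss back-propagates into every earlier transition logit.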
So the transition probabilities parameterized by learnable parameters can be optimized in an end-to-end manner by gradient descent with respect to task loss together with budget regularization (e.g. FLOPs constraint).
So the transition probabilities of every layer and channel can be optimized end-to-end by gradient descent on the task loss together with the budget loss (e.g. a FLOPs constraint).
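A sketch of how such a budget term can stay differentiable: the expected number of channels a layer keeps is the sum of its marginal probabilities, so an expected-FLOPs estimate is a smooth function of the transition parameters. The function names, the per-layer FLOPs formula, and the hinge-style penalty below are illustrative choices, not necessarily the paper's exact regularizer:

```python
import torch

def expected_conv_flops(marginal_in: torch.Tensor, marginal_out: torch.Tensor,
                        kernel_size: int, out_h: int, out_w: int) -> torch.Tensor:
    """Differentiable FLOPs estimate for one convolution layer."""
    e_in = marginal_in.sum()     # expected number of retained input channels
    e_out = marginal_out.sum()   # expected number of retained output channels
    return e_in * e_out * kernel_size * kernel_size * out_h * out_w

def budget_loss(total_expected_flops: torch.Tensor, target_flops: float) -> torch.Tensor:
    # Penalize only when the expected FLOPs exceed the target budget.
    return torch.relu(total_expected_flops - target_flops) / target_flops
```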
After the optimization, the model within desired budgets can be sampled by the Markov process with learned transition probabilities and will be trained from scratch to achieve high performance.
After optimization (i.e. once the learned transition/marginal probabilities of each layer can yield networks that satisfy the FLOPs constraint), a sub-network is sampled and trained from scratch.
DMCP therefore models the pruning process as a Markov process. Figure 2 shows the pruning of a convolutional layer with 5 channels: \(S_1\) means the first channel is retained, \(S_2\) the second, and so on, while T is the terminal state indicating that pruning is finished. The probabilities p are the transition probabilities, computed from learnable parameters and described in detail below.
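After optimization, sampling one layer's width amounts to running this chain once; the terminal state T in Figure 2 corresponds to stopping the walk. A minimal sketch (function name illustrative):

```python
import torch

def sample_layer_width(transition_probs: torch.Tensor) -> int:
    """Sample how many channels one layer keeps from its learned Markov chain.

    transition_probs = [p(S_2|S_1), p(S_3|S_2), ...].  The walk starts at S_1,
    so at least one channel is always kept; failing a transition means moving
    to the terminal state T, i.e. all remaining channels are pruned.
    """
    kept = 1
    for p in transition_probs:
        if torch.rand(()) < p:   # take the transition: keep one more channel
            kept += 1
        else:                    # move to T: stop pruning here
            break
    return kept
```

Doing this for every layer gives a candidate sub-network; structures within the desired budget are kept and trained from scratch.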
(1) Optimizing the pruning space
Conventional pruning methods compute an "importance" score for each channel to decide whether to keep it. Once pruning is viewed as a structure-search problem, however, models differ only in the number of channels per layer; if each channel is still decided independently, different decisions can produce the same structure and make optimization harder. As shown in Figure 3: in case 1 the last two channels are pruned, and in case 2 the 2nd and 4th channels are pruned, yet both cases yield a 3-channel convolutional layer, so the pruning space is far larger than the number of distinct networks.
Therefore, DMCP always keeps the first k channels, which greatly shrinks the pruning space.
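As a quick check of the reduction: for the 5-channel layer of Figure 2, independent per-channel decisions allow \(2^5 = 32\) different prune masks, but with \(S_1\) always retained the keep-the-first-\(k\) scheme only has to distinguish 5 structures (widths 1 through 5), one per distinct channel count.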
(2) Modeling the pruning process
Here \(p_k\) is the transition probability of the Markov process. In this way, sampling from the optimized Markov process yields the corresponding pruned model.
(3) Learning the transition probabilities
(\(p_k\) is the transition probability, \(p_{w_1}\) is the marginal probability)
(4) Training pipeline
DMCP training has two stages, training the unpruned network and updating the Markov parameters, which alternate during optimization.
Stage 1: training the unpruned network.
In each iteration, two random structures are sampled from the Markov process, together with the largest and smallest structures, to ensure that all parameters of the unpruned network are sufficiently trained. All sampled structures share the unpruned network's weights, so the gradients of every sub-model's task loss on the dataset are accumulated into the unpruned network's parameters.
Stage 2: updating the Markov parameters.
After training the unpruned network, the Markov transition probabilities are coupled to the unpruned network as described above, so the Markov parameters can be updated by gradient descent; the loss is the task loss plus the budget (FLOPs) regularization.
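A rough sketch of one alternating iteration under these assumptions. `model`, `arch`, and the methods used on them (`max_structure`, `min_structure`, `sample`, `forward_subnet`, `forward_scaled`, `budget_loss`) are hypothetical placeholders, not the official repo's API:

```python
def dmcp_train_step(model, arch, images, labels,
                    weight_opt, arch_opt, criterion, lam, target_flops):
    """One DMCP iteration: stage 1 updates the shared weights, stage 2 the Markov parameters."""

    # Stage 1: the largest, the smallest, and two randomly sampled sub-structures,
    # all sharing the unpruned network's weights.
    weight_opt.zero_grad()
    for subnet in [arch.max_structure(), arch.min_structure(),
                   arch.sample(), arch.sample()]:
        loss = criterion(model.forward_subnet(subnet, images), labels)
        loss.backward()                      # gradients accumulate in the shared weights
    weight_opt.step()

    # Stage 2: forward pass with channels scaled by the marginal probabilities p(w_k),
    # then update the Markov parameters with task loss + FLOPs budget regularization.
    arch_opt.zero_grad()
    out = model.forward_scaled(arch, images)
    loss = criterion(out, labels) + lam * arch.budget_loss(target_flops)
    loss.backward()
    arch_opt.step()
```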
Experiments
Conclusion
The proposed method is made differentiable by modeling channel pruning as a Markov process, and thus can be optimized with respect to the task loss by gradient descent.
Summary
Reference
【CVPR 2020 Oral | DMCP: an explanation of the differentiable deep model pruning algorithm】https://zhuanlan.zhihu.com/p/146721840
【Soft Filter Pruning (SFP) algorithm notes】https://blog.csdn.net/u014380165/article/details/81107032