Paper notes: "MobileNetV2: Inverted Residuals and Linear Bottlenecks"

1. Introduction

  • Builds on MobileNetV1 (depthwise separable convolutions), keeps its simplicity, adds no exotic operations, and improves accuracy.
  • Two key ideas: Inverted Residuals and Linear Bottlenecks.
    • Inverted Residuals: adds ResNet-style shortcut connections.
    • Linear Bottlenecks: removes the ReLU6 activation that follows the last, low-channel feature map of the bottleneck, i.e. removes the nonlinearity there.
  • Key claim from the paper: "if the input manifold can be embedded into a significantly lower-dimensional subspace of the activation space then the ReLU transformation preserves the information while introducing the needed complexity into the set of expressible functions." // didn't fully understand this yet
  • MobileNetV2 is built on an inverted residual structure, where the shortcut connections sit between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to extract features and introduce nonlinearity; to preserve representational power, the nonlinear activation is removed from the narrow layers. A rough sketch of the resulting block follows this list.
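A minimal sketch of one such block, assuming a made-up expansion factor t = 6 and 24 input/output channels (the full implementation used in Section 3 is the Block class):

import torch.nn as nn

# Sketch only: one inverted-residual block with a linear bottleneck.
t, c_in, c_out = 6, 24, 24                                     # hypothetical sizes
bottleneck = nn.Sequential(
    nn.Conv2d(c_in, t * c_in, kernel_size=1, bias=False),      # pointwise: expand channels
    nn.BatchNorm2d(t * c_in),
    nn.ReLU6(inplace=True),                                     # nonlinearity in the high-dim space
    nn.Conv2d(t * c_in, t * c_in, kernel_size=3, padding=1,
              groups=t * c_in, bias=False),                     # depthwise 3x3
    nn.BatchNorm2d(t * c_in),
    nn.ReLU6(inplace=True),
    nn.Conv2d(t * c_in, c_out, kernel_size=1, bias=False),      # pointwise: project back down
    nn.BatchNorm2d(c_out),
    # no activation after the projection: this is the "linear" bottleneck
)
# when stride == 1 and c_in == c_out, the block output would be x + bottleneck(x)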

2. Details

2.1 The MobileNetV2 architecture (blue modules denote depthwise separable convolution layers)

2.2 Linear bottlenecks

Compared with MobileNetV1, MobileNetV2 introduces the idea of a linear bottleneck. Since a depthwise convolution cannot change the number of channels, an input with few channels forces it to extract features in a low-dimensional space, which works poorly. V2 therefore places a pointwise convolution before the depthwise convolution to expand the dimensionality; a short shape check follows.
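A quick sketch with hypothetical channel numbers: the depthwise convolution (groups equal to its channel count) leaves the channel count fixed, so V2 expands first with a 1×1 pointwise convolution and only then applies the depthwise convolution in the higher-dimensional space.

import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)                                    # 8-channel input
depthwise = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8)  # one filter per channel
pointwise = nn.Conv2d(8, 48, kernel_size=1)                      # 1x1 conv: 8 -> 48 channels

print(depthwise(x).shape)        # torch.Size([1, 8, 32, 32])  -- channels unchanged
expanded = pointwise(x)          # expand first, as MobileNetV2 does
print(expanded.shape)            # torch.Size([1, 48, 32, 32])
depthwise_hi = nn.Conv2d(48, 48, kernel_size=3, padding=1, groups=48)
print(depthwise_hi(expanded).shape)  # torch.Size([1, 48, 32, 32]) -- depthwise now works in 48 dims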

The authors remove the ReLU activation after the second pointwise convolution, arguing that an activation adds useful nonlinearity in high-dimensional space but destroys information in low-dimensional space.

The figure below shows that ReLU causes heavy information loss on manifolds with few channels; only once the output dimension reaches roughly 15 does ReLU stop discarding much information.
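A rough reproduction of that experiment (my own sketch, assuming the setup described in the paper): embed a low-dimensional point set into n dimensions with a random matrix, apply ReLU, project back with the pseudo-inverse, and measure the reconstruction error; the loss shrinks as n grows.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 1000))         # points living on a 2-D "manifold"

for n in (2, 3, 5, 15, 30):                # dimension of the activation space
    T = rng.standard_normal((n, 2))        # random embedding, 2 -> n dims
    y = np.maximum(T @ x, 0)               # ReLU in the n-dimensional space
    x_rec = np.linalg.pinv(T) @ y          # project back to 2-D
    print('n=%2d  reconstruction MSE=%.4f' % (n, np.mean((x - x_rec) ** 2)))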

2.3 Inverted residuals

  • The thickness of the blue blocks indicates the number of channels
  • Figure (a): classic residual block
    1. Use a pointwise convolution to reduce the number of channels
    2. Apply a 3×3 convolution
    3. Use a pointwise convolution to restore the original number of channels
    4. The shortcut connects the two layers with many channels
    5. Every layer is followed by a ReLU activation
  • Figure (b): inverted residual block (see the sketch after this list)
    1. Use a pointwise convolution to increase the number of channels
    2. Apply a depthwise convolution
    3. Use a pointwise convolution to reduce the channels back to the original number
    4. The shortcut connects the two thin bottleneck layers with few channels
    5. The two shaded blocks have no ReLU activation
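A small sketch of figure (b) with made-up channel counts (24 in, expansion factor 6) shows the narrow → wide → narrow pattern and the shortcut joining the two thin ends:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 24, 32, 32)

expand  = nn.Conv2d(24, 144, kernel_size=1, bias=False)                           # pointwise: 24 -> 144
dwconv  = nn.Conv2d(144, 144, kernel_size=3, padding=1, groups=144, bias=False)   # depthwise 3x3
project = nn.Conv2d(144, 24, kernel_size=1, bias=False)                           # pointwise: 144 -> 24

out = F.relu6(expand(x))
out = F.relu6(dwconv(out))
out = project(out)            # linear bottleneck: no activation here
out = out + x                 # shortcut between the two thin 24-channel ends
print(out.shape)              # torch.Size([1, 24, 32, 32])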

2.4 Network structure

  • The first layer is a standard convolution
  • It is followed by a stack of bottleneck blocks
  • t: expansion factor; c: number of output channels; n: number of repetitions; s: stride (a stride of 2 means only the first block of the repeated group uses stride 2 and the rest use stride 1; blocks with stride 2 have no shortcut connection, as shown in the figure below; a short sketch of this expansion follows the list)
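A one-line sketch of how a row of the table expands into per-block strides (this is exactly what `_make_layers` does in the code below): only the first repetition uses the group's stride.

# hypothetical row of the table: t=6, c=32, n=3, s=2
expansion, out_planes, num_blocks, stride = 6, 32, 3, 2
strides = [stride] + [1] * (num_blocks - 1)
print(strides)   # [2, 1, 1] -- only the first block in the group downsamples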

3. Code practice

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim

class Block(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, in_planes, out_planes, expansion, stride):
        super(Block, self).__init__()
        self.stride = stride
        # expand the number of feature maps by the expansion factor
        planes = expansion * in_planes
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, groups=planes, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)

        # stride 1 and different in/out channel counts: use a 1x1 conv to match the channels
        if stride == 1 and in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_planes))
        # stride 1 and matching in/out channel counts: the shortcut is the identity
        if stride == 1 and in_planes == out_planes:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        # stride 1: add the shortcut
        if self.stride == 1:
            return out + self.shortcut(x)
        # stride 2: return the output directly
        else:
            return out
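A quick sanity check of the block (sizes chosen to match the first stage of the network below): stride 1 keeps the spatial size and applies the shortcut, stride 2 halves the feature map and skips it.

x = torch.randn(2, 32, 32, 32)
print(Block(32, 16, expansion=1, stride=1)(x).shape)   # torch.Size([2, 16, 32, 32])
print(Block(32, 24, expansion=6, stride=2)(x).shape)   # torch.Size([2, 24, 16, 16])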

Build the MobileNetV2 network

class MobileNetV2(nn.Module):
    # (expansion, out_planes, num_blocks, stride)
    cfg = [(1,  16, 1, 1),
           (6,  24, 2, 1), 
           (6,  32, 3, 2),
           (6,  64, 4, 2),
           (6,  96, 3, 1),
           (6, 160, 3, 2),
           (6, 320, 1, 1)]

    def __init__(self, num_classes=10):
        super(MobileNetV2, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.conv2 = nn.Conv2d(320, 1280, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(1280)
        self.linear = nn.Linear(1280, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for expansion, out_planes, num_blocks, stride in self.cfg:
            strides = [stride] + [1]*(num_blocks-1)
            for stride in strides:
                layers.append(Block(in_planes, out_planes, expansion, stride))
                in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.relu(self.bn2(self.conv2(out)))
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
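Before training, a dummy forward pass (made-up batch) confirms that the network maps a CIFAR-10-sized input to 10 logits:

m = MobileNetV2()
dummy = torch.randn(2, 3, 32, 32)                     # fake CIFAR-10 batch
print(m(dummy).shape)                                 # torch.Size([2, 10])
print(sum(p.numel() for p in m.parameters()))         # total number of parameters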

Create the DataLoaders

# Train on the GPU; in Colab this can be enabled under "Runtime" -> "Change runtime type"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,  download=True, transform=transform_train)
testset  = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)

trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)
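One batch from the train loader should match what the network expects:

images, labels = next(iter(trainloader))
print(images.shape, labels.shape)   # torch.Size([128, 3, 32, 32]) torch.Size([128])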

Instantiate the network

# move the network to the GPU
net = MobileNetV2().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

Train the model

for epoch in range(10):  # train for multiple epochs
    for i, (inputs, labels) in enumerate(trainloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # zero the gradients held by the optimizer
        optimizer.zero_grad()
        # forward pass + backward pass + optimizer step
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print running statistics
        if i % 100 == 0:   
            print('Epoch: %d Minibatch: %5d loss: %.3f' %(epoch + 1, i + 1, loss.item()))

print('Finished Training')
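Optionally, the trained weights can be saved so that testing can be repeated without retraining (the file name is just an example):

torch.save(net.state_dict(), './mobilenetv2_cifar10.pth')   # hypothetical path
# later: net.load_state_dict(torch.load('./mobilenetv2_cifar10.pth'))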

Test the model

net.eval()   # evaluation mode: BatchNorm uses its running statistics
correct = 0
total = 0

with torch.no_grad():   # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %.2f %%' % (
    100 * correct / total))
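As an optional follow-up, the same predictions can be broken down per class (class names below follow the standard CIFAR-10 ordering):

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
class_correct = [0] * 10
class_total = [0] * 10

with torch.no_grad():
    for images, labels in testloader:
        images, labels = images.to(device), labels.to(device)
        _, predicted = torch.max(net(images), 1)
        for label, pred in zip(labels, predicted):
            class_total[label.item()] += 1
            class_correct[label.item()] += int(pred.item() == label.item())

for name, c, t in zip(classes, class_correct, class_total):
    print('Accuracy of %5s : %.2f %%' % (name, 100 * c / t))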