Paper Reading: "MobileNetV2: Inverted Residuals and Linear Bottlenecks"
阿新 • Published: 2020-08-07
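1. Introduction
- MobileNetV2 builds on MobileNetV1 (depthwise separable convolutions). It keeps V1's simplicity and improves accuracy without adding any special operators.
- Two key ideas: Inverted Residuals and Linear Bottlenecks
  - Inverted Residuals: add a ResNet-style shortcut connection, placed between the narrow bottleneck layers.
  - Linear Bottlenecks: remove the ReLU6 activation that would follow the last, low-channel feature map of the bottleneck, i.e. drop the non-linearity there.
- Open question: the paper states that "if the input manifold can be embedded into a significantly lower-dimensional subspace of the activation space then the ReLU transformation preserves the information while introducing the needed complexity into the set of expressible functions." (I did not fully understand this point.)
- MobileNetV2 is built on an inverted residual structure in which the skip connections run between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features and provide the non-linearity; to preserve representational power, the non-linear activations in the narrow layers are removed.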
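As a reminder of why the depthwise separable convolution inherited from MobileNetV1 is cheap, here is a rough parameter count. It is only a sketch: biases and BatchNorm are ignored, and the function names and layer sizes are illustrative, not taken from the paper or from this post.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(c_in, c_out, k=3):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Illustrative sizes (assumption): 32 input channels, 64 output channels
std = standard_conv_params(32, 64)    # 3*3*32*64 = 18432
sep = separable_conv_params(32, 64)   # 3*3*32 + 32*64 = 2336
print(std, sep, round(std / sep, 1))
```

For a 3×3 kernel the saving approaches a factor of 9 as the number of output channels grows, because the depthwise term becomes negligible next to the pointwise term.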
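2. Details
2.1 MobileNetV2 architecture (blue blocks denote depthwise separable convolution layers)
2.2 Linear Bottlenecks
Compared with MobileNetV1, MobileNetV2 introduces the idea of a linear bottleneck. A depthwise convolution cannot change the number of channels, so if the input has few channels it can only extract features in a low-dimensional space, which works poorly. V2 therefore adds a pointwise convolution in front of the depthwise convolution to expand the channel dimension.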
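The point about channel counts can be checked directly in PyTorch. The snippet below is a minimal sketch; the 8-channel input and the expansion factor of 6 are arbitrary choices for illustration, not values from the post.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 16, 16)  # a narrow input: 8 channels (illustrative)

# Depthwise 3x3 conv: groups == in_channels, so each channel gets its own filter
# and the number of channels cannot change.
depthwise_narrow = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=8, bias=False)
narrow_out = depthwise_narrow(x)

# MobileNetV2-style: expand with a 1x1 pointwise conv first (factor 6 assumed here),
# then run the depthwise conv in the wider space.
expand = nn.Conv2d(8, 48, kernel_size=1, bias=False)
depthwise_wide = nn.Conv2d(48, 48, kernel_size=3, padding=1, groups=48, bias=False)
wide_out = depthwise_wide(expand(x))

print(narrow_out.shape)             # torch.Size([1, 8, 16, 16])
print(wide_out.shape)               # torch.Size([1, 48, 16, 16])
print(depthwise_wide.weight.shape)  # torch.Size([48, 1, 3, 3])
```

The weight shape shows that each depthwise filter sees exactly one input channel, so any mixing or widening of channels has to come from the pointwise layers.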
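The authors remove the ReLU after the second pointwise convolution. Their argument is that an activation function adds useful non-linearity in a high-dimensional space but destroys features in a low-dimensional one.
The figure below illustrates this: ReLU causes substantial information loss on manifolds embedded in few channels, and only once the output dimension reaches about 15 does ReLU stop losing much information.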
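The effect can be probed with a small numerical experiment. This is not the paper's figure, only a proxy under stated assumptions: 2-D points on the unit circle are embedded into n dimensions with a random matrix `T`, ReLU is applied, and we count how many points still have at least two active (positive) units, which is what is needed to solve for the original 2-D point exactly. The function name `recoverable_fraction` and the choice of metric are my own.

```python
import numpy as np


def recoverable_fraction(n, num_points=1000, seed=0):
    """Fraction of 2-D points that remain uniquely recoverable after ReLU(T x)."""
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((n, 2))  # random embedding of 2-D into n-D (assumption)
    angles = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    points = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # points on the unit circle
    active = (points @ T.T) > 0  # units whose value ReLU passes through unchanged
    # With at least two active units (rows of a random T are almost surely not
    # parallel) the 2-D point can be solved for exactly; otherwise information is lost.
    return float((active.sum(axis=1) >= 2).mean())


for n in (2, 3, 5, 15, 30):
    print(n, recoverable_fraction(n))
```

With only two output dimensions at most about half of the circle can survive, while with a few dozen dimensions essentially every point keeps enough active units, which matches the intuition behind applying ReLU only in the expanded space.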
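2.3 Inverted residuals
- The thickness of each blue block indicates its number of channels
- Figure (a): residual block
  - A pointwise convolution reduces the number of channels
  - A 3×3 convolution is applied
  - Another pointwise convolution restores the original number of channels
  - The skip connection joins the two layers with many channels
  - Every layer is followed by a ReLU activation
- Figure (b): inverted residual block
  - A pointwise convolution increases the number of channels
  - A depthwise convolution is applied
  - Another pointwise convolution reduces the channels back to the original number
  - The skip connection joins the two bottleneck layers with few channels
  - The two shaded blocks have no ReLU activation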
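The cost of the inverted residual block in figure (b) can be counted layer by layer. The sketch below compares that count with the closed form given in the paper, h · w · d' · t · (d' + k² + d''), for a stride-1 block; the feature-map size and channel numbers are illustrative assumptions, and BatchNorm and activations are ignored.

```python
def bottleneck_madds(h, w, d_in, d_out, t, k=3):
    """Multiply-adds of one inverted residual block, counted layer by layer (stride 1)."""
    hidden = t * d_in
    expand = h * w * d_in * hidden      # 1x1 pointwise expansion
    depthwise = h * w * hidden * k * k  # k x k depthwise conv
    project = h * w * hidden * d_out    # 1x1 linear projection
    return expand + depthwise + project


def bottleneck_madds_closed_form(h, w, d_in, d_out, t, k=3):
    """Closed form from the paper: h * w * d' * t * (d' + k^2 + d'')."""
    return h * w * d_in * t * (d_in + k * k + d_out)


# Illustrative sizes (assumption): 56x56 feature map, 24 -> 24 channels, t = 6
layerwise = bottleneck_madds(56, 56, 24, 24, t=6)
closed = bottleneck_madds_closed_form(56, 56, 24, 24, t=6)
print(layerwise, closed)
```

The depthwise term contributes only the k² part of the sum, so most of the cost sits in the two pointwise convolutions even though the block is wide in the middle.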
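2.4 Network structure
- The first layer is a standard convolution
- It is followed by the bottleneck blocks
- t: expansion factor; c: number of output channels; n: number of repeats; s: stride. (If s is 2, only the first block of the repeated group uses stride 2 and the remaining blocks use stride 1; blocks with stride 2 have no skip connection, as shown in the figure below.)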
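To make the (t, c, n, s) notation concrete, the sketch below expands each row into its individual blocks and tracks the spatial size. It uses the CIFAR-10 configuration from the code in section 3 of this post and assumes a 32×32 input with 32 channels after the first convolution; the helper name `expand_cfg` is hypothetical.

```python
# (t, c, n, s) rows as used in the CIFAR-10 code in section 3 of this post
CFG = [(1, 16, 1, 1), (6, 24, 2, 1), (6, 32, 3, 2), (6, 64, 4, 2),
       (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1)]


def expand_cfg(cfg, in_channels=32, size=32):
    """List (in_channels, out_channels, t, stride, output_size) for every block."""
    blocks = []
    for t, c, n, s in cfg:
        for i in range(n):
            stride = s if i == 0 else 1      # only the first block of a group uses s
            size = (size - 1) // stride + 1  # output size of a 3x3 conv with padding 1
            blocks.append((in_channels, c, t, stride, size))
            in_channels = c
    return blocks


blocks = expand_cfg(CFG)
for block in blocks:
    print(block)
```

With this configuration the feature map shrinks from 32×32 to 4×4 by the last block, which is why the model code later pools with a 4×4 window before the classifier.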
3. Code Exercise
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
class Block(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, in_planes, out_planes, expansion, stride):
        super(Block, self).__init__()
        self.stride = stride
        # expansion widens the number of feature maps inside the block
        planes = expansion * in_planes
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, groups=planes, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)
        # stride 1 with different in/out channels: a 1x1 conv on the shortcut matches the channel count
        if stride == 1 and in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_planes))
        # stride 1 with equal in/out channels: the shortcut is the identity
        if stride == 1 and in_planes == out_planes:
            self.shortcut = nn.Sequential()

    def forward(self, x):
        # note: plain ReLU is used here, whereas the paper uses ReLU6
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        # linear bottleneck: no activation after the projection
        out = self.bn3(self.conv3(out))
        # stride 1: add the shortcut
        if self.stride == 1:
            return out + self.shortcut(x)
        # stride 2: return the output directly
        else:
            return out
Build the MobileNetV2 network
class MobileNetV2(nn.Module):
    # (expansion, out_planes, num_blocks, stride)
    cfg = [(1, 16, 1, 1),
           (6, 24, 2, 1),  # stride 1 here (2 in the paper) to suit 32x32 CIFAR-10 inputs
           (6, 32, 3, 2),
           (6, 64, 4, 2),
           (6, 96, 3, 1),
           (6, 160, 3, 2),
           (6, 320, 1, 1)]

    def __init__(self, num_classes=10):
        super(MobileNetV2, self).__init__()
        # first conv also uses stride 1 (2 in the paper) for CIFAR-10
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.conv2 = nn.Conv2d(320, 1280, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(1280)
        self.linear = nn.Linear(1280, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for expansion, out_planes, num_blocks, stride in self.cfg:
            # only the first block of each group uses the configured stride
            strides = [stride] + [1]*(num_blocks-1)
            for stride in strides:
                layers.append(Block(in_planes, out_planes, expansion, stride))
                in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.relu(self.bn2(self.conv2(out)))
        # a 32x32 input is 4x4 at this point, so this pools down to 1x1
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
Create the DataLoaders
# Train on the GPU; this can be enabled from the menu "Runtime" -> "Change runtime type"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)
Instantiate the network
# Move the network to the GPU
net = MobileNetV2().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
Train the model
for epoch in range(10):  # train for several epochs
    net.train()  # make sure BatchNorm is in training mode
    for i, (inputs, labels) in enumerate(trainloader):
        inputs = inputs.to(device)
        labels = labels.to(device)
        # reset the optimizer's gradients
        optimizer.zero_grad()
        # forward pass + backward pass + optimizer step
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print progress
        if i % 100 == 0:
            print('Epoch: %d Minibatch: %5d loss: %.3f' %(epoch + 1, i + 1, loss.item()))
print('Finished Training')
Test the model
net.eval()  # use BatchNorm running statistics during evaluation
correct = 0
total = 0
with torch.no_grad():  # gradients are not needed for evaluation
    for data in testloader:
        images, labels = data
        images, labels = images.to(device), labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %.2f %%' % (
    100 * correct / total))