
Building an SSD Object Detection Framework by Hand


Reference: Single Shot Multibox Detection (SSD)

The code for this article is available on GitHub: https://github.com/XinetAI/CVX/blob/master/app/detection/ssd.py

For training the SSD, see: https://github.com/XinetAI/CVX/blob/master/目標檢測/訓練SSD.ipynb

Although Li Mu's tutorial explains SSD very well, most of its code is written in a functional style. In this article I wrap the basic components of SSD into classes so that an SSD can be assembled like building blocks: you can swap the base network for any convolutional module you like, and you can use the remaining components just like objects such as nn.Dense!


First, load the required packages:

%matplotlib inline
import d2lzh as d2l
from mxnet import autograd, contrib, gluon, image, init, nd
from mxnet.gluon import loss as gloss, nn
import time

Basic Components

Single shot multibox detection (SSD)1 consists mainly of a base network block followed by several multi-scale feature blocks connected in series. The base network block extracts features from the raw image, so a commonly used deep convolutional neural network is usually chosen for it. Roughly speaking, SSD can be divided into four kinds of components: the base network, the class prediction layer, the bounding box prediction layer, and the height-and-width halving block.

The design of the class prediction layer and the bounding box prediction layer follows the same idea as replacing fully connected layers with global average pooling.

Class Prediction Layer

Let's implement the class prediction layer:

class ClassPredictor(nn.Block):
    def __init__(self, num_anchors, num_classes, **kwargs):
        super().__init__(**kwargs)
        self.num_classes = num_classes  # number of object classes
        self.num_anchors = num_anchors  # number of anchor boxes per pixel
        # class prediction layer
        self.cls_predictor = nn.Conv2D(
            self.num_anchors * (self.num_classes + 1), kernel_size=3, padding=1)

    def forward(self, Y):
        cls_preds = self.cls_predictor(Y)
        return cls_preds

For a feature map at a given scale, ClassPredictor produces a prediction map of the same height and width with num_anchors * (num_classes + 1) channels (the + 1 accounts for the background class).

Below we simulate feature maps at two different scales to test it:

Y = nd.zeros((2, 8, 20, 20))   # a batch of 8 feature maps of size 20 x 20
cls = ClassPredictor(5, 10)    # instantiate
cls.initialize()               # initialize the parameters
cls_preds = cls(Y)             # class prediction maps

Y1 = nd.zeros((2, 16, 10, 10))  # a batch of 16 feature maps of size 10 x 10
cls1 = ClassPredictor(5, 10)    # instantiate
cls1.initialize()               # initialize the parameters
cls_preds1 = cls1(Y1)           # class prediction maps at the second scale

cls_preds.shape, cls_preds1.shape
((2, 55, 20, 20), (2, 55, 10, 10))
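
As a quick sanity check (my own addition, not in the original): each prediction map should have num_anchors * (num_classes + 1) channels, i.e. 5 * (10 + 1) = 55, which matches the shapes above.

assert cls_preds.shape[1] == 5 * (10 + 1)   # 55 channels at the 20 x 20 scale
assert cls_preds1.shape[1] == 5 * (10 + 1)  # 55 channels at the 10 x 10 scale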

We reshape them into a uniform format and concatenate the multi-scale predictions so that later computation is simpler.

def flatten_pred(pred):  # move the channel dimension to the end, then flatten
    return pred.transpose((0, 2, 3, 1)).flatten()

def concat_preds(preds):  # concatenate predictions from different scales
    return nd.concat(*[flatten_pred(p) for p in preds], dim=1)
concat_preds([cls_preds, cls_preds1]).shape  # concatenate the class predictions of both scales
(2, 27500)
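
The 27500 columns can be checked by hand: after flattening, each scale contributes height * width * channels elements per example, so the concatenated width should be 20 * 20 * 55 + 10 * 10 * 55 = 22000 + 5500 = 27500. A small verification (added here for illustration):

assert concat_preds([cls_preds, cls_preds1]).shape == (2, 20 * 20 * 55 + 10 * 10 * 55)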

Bounding Box Prediction Layer

This is likewise implemented as a class:

class BBoxPredictor(nn.Block):
    def __init__(self, num_anchors, **kwargs):
        super().__init__(**kwargs)
        self.num_anchors = num_anchors
        # bounding box prediction layer
        self.bbox_predictor = nn.Conv2D(
            self.num_anchors * 4, kernel_size=3, padding=1)

    def forward(self, Y):
        bbox_preds = self.bbox_predictor(Y)
        return bbox_preds

Test it:

Y = nd.zeros((2, 8, 20, 20))  # a batch of 8 feature maps of size 20 x 20
bbox = BBoxPredictor(10)      # instantiate
bbox.initialize()             # initialize the parameters
bbox_preds = bbox(Y)          # bounding box prediction maps
bbox_preds.shape
(2, 40, 20, 20)

For a feature map at a given scale, BBoxPredictor produces a prediction map of the same height and width with num_anchors x 4 channels for the bounding box offsets.
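
A corresponding sanity check (my addition): with 10 anchors per pixel the prediction map has 10 * 4 = 40 channels, matching the shape above.

assert bbox_preds.shape == (2, 10 * 4, 20, 20)  # num_anchors * 4 offset channels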

Height-and-Width Halving Block

class DownSampleBlock(nn.Block):
    def __init__(self, num_channels, **kwargs):
        '''
        Block that halves the height and width
        '''
        super().__init__(**kwargs)
        self.block = nn.Sequential()
        with self.block.name_scope():
            for _ in range(2):
                self.block.add(nn.Conv2D(num_channels, kernel_size=3, padding=1),
                               nn.BatchNorm(in_channels=num_channels),
                               nn.Activation('relu'))
            self.block.add(nn.MaxPool2D(2))

    def forward(self, X):
        return self.block(X)

Test it:

Y = nd.zeros((2, 8, 20, 20))  # a batch of 8 feature maps of size 20 x 20
down_sample = DownSampleBlock(10)
down_sample.initialize()
down_sample(Y).shape
(2, 10, 10, 10)
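
To make the effect explicit, a small check (added for illustration): the block keeps the batch size, changes the number of channels to the requested 10, and halves the 20 x 20 spatial size.

assert down_sample(Y).shape == (2, 10, 20 // 2, 20 // 2)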

Base Network

For brevity, we design only a simple base network here:

class BaseNet(nn.Block):
    def __init__(self, **kwargs):
        '''
        Base network
        '''
        super().__init__(**kwargs)
        self.block = nn.Sequential()
        with self.block.name_scope():
            for num_filters in [16, 32, 64]:
                self.block.add(DownSampleBlock(num_filters))

    def forward(self, X):
        return self.block(X)

Test it:

Y = nd.zeros((2, 8, 512, 512))  # a batch of 8 feature maps of size 512 x 512
base_net = BaseNet()
base_net.initialize()
base_net(Y).shape
(2, 64, 64, 64)
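
Since the base network stacks three halving blocks, a 512 x 512 input shrinks by a factor of 2 ** 3 = 8 to 64 x 64, with 64 output channels. A quick check (my addition):

assert base_net(Y).shape == (2, 64, 512 // 2 ** 3, 512 // 2 ** 3)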

Anchor Box Generation

AnchorY couples a block with anchor generation: it first runs the block, then calls contrib.ndarray.MultiBoxPrior on the resulting feature map, and returns both the feature map and the anchors.

class AnchorY(nn.Block):
    def __init__(self, block, size, ratio, **kwargs):
        super().__init__(**kwargs)
        self.block = block
        self._size = size
        self._ratio = ratio

    def forward(self, X):
        Y = self.block(X)
        anchors = contrib.ndarray.MultiBoxPrior(
            Y, sizes=self._size, ratios=self._ratio)
        return Y, anchors

Test it:

block = BaseNet()
anchor_gen = AnchorY(block, .4, .7)
anchor_gen.initialize()
X = nd.zeros((2, 8, 256, 256))
Y, anchors = anchor_gen(X)
Y.shape, anchors.shape
((2, 64, 32, 32), (1, 1024, 4))
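
The anchor count follows from MultiBoxPrior: with one size and one ratio it generates 1 + 1 - 1 = 1 anchor per pixel, and the base network turns the 256 x 256 input into a 32 x 32 feature map, hence 32 * 32 = 1024 anchors. A small check (added here):

assert Y.shape[2:] == (256 // 2 ** 3, 256 // 2 ** 3)   # 32 x 32 feature map
assert anchors.shape == (1, 32 * 32 * (1 + 1 - 1), 4)  # one anchor per pixel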

Assembling the SSD

Finally we assemble five scales: block i is followed by a class predictor, a bounding box predictor, and an anchor generator, and the predictions of all scales are concatenated.

class TinySSD(nn.Block):
    def __init__(self, sizes, ratios, num_classes, **kwargs):
        super().__init__(**kwargs)
        sizes, ratios, self.num_classes = sizes, ratios, num_classes
        self.num_anchors = len(sizes[0]) + len(ratios[0]) - 1
        for i in range(5):
            # equivalent to the assignment self.blk_i = self.block(i)
            setattr(self, 'blk_%d' % i, self.block(i))
            setattr(self, 'cls_%d' % i, ClassPredictor(self.num_anchors,
                                                       self.num_classes))
            setattr(self, 'bbox_%d' % i, BBoxPredictor(self.num_anchors))
            setattr(self, 'anchor_%d' % i, AnchorY(
                getattr(self, 'blk_%d' % i), sizes[i], ratios[i]))

    def block(self, i):
        if i == 0:
            blk = BaseNet()
        elif i == 4:
            blk = nn.GlobalMaxPool2D()
        else:
            blk = DownSampleBlock(128)
        return blk

    def forward(self, X):
        anchors, cls_preds, bbox_preds = [None] * 5, [None] * 5, [None] * 5
        for i in range(5):
            # getattr(self, 'anchor_%d' % i) accesses self.anchor_i
            Y, anchors[i] = getattr(self, 'anchor_%d' % i)(X)
            cls_preds[i] = getattr(self, 'cls_%d' % i)(Y)
            bbox_preds[i] = getattr(self, 'bbox_%d' % i)(Y)
            X = Y
        # a 0 in reshape keeps that dimension (the batch size) unchanged
        cls_preds = concat_preds(cls_preds).reshape(
            (0, -1, self.num_classes + 1))
        return nd.concat(*anchors, dim=1), cls_preds, concat_preds(bbox_preds)

Test code:

sizes = [[0.2, 0.272], [0.37, 0.447], [0.54, 0.619], [0.71, 0.79],
         [0.88, 0.961]]
ratios = [[1, 2, 0.5]] * 5
num_classes = 1

X = nd.zeros((32, 3, 256, 256))
net = TinySSD(sizes, ratios, num_classes)
net.initialize()
anchors, cls_preds, bbox_preds = net(X)

print('output anchors:', anchors.shape)
print('output class preds:', cls_preds.shape)
print('output bbox preds:', bbox_preds.shape)
output anchors: (1, 5444, 4)
output class preds: (32, 5444, 2)
output bbox preds: (32, 21776)
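
The 5444 anchors can be verified by hand (my own arithmetic, assuming a 256 x 256 input): each pixel gets len(sizes[i]) + len(ratios[i]) - 1 = 2 + 3 - 1 = 4 anchors, and the five feature maps have spatial sizes 32, 16, 8, 4 and 1 (the last from global max pooling), so the total is 4 * (32² + 16² + 8² + 4² + 1²) = 4 * 1361 = 5444.

num_anchors_total = 4 * (32 ** 2 + 16 ** 2 + 8 ** 2 + 4 ** 2 + 1 ** 2)  # 5444
assert anchors.shape == (1, num_anchors_total, 4)
assert cls_preds.shape == (32, num_anchors_total, num_classes + 1)
assert bbox_preds.shape == (32, num_anchors_total * 4)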

Network structure:

net
TinySSD(
  (blk_0): BaseNet(
    (block): Sequential(
      (0): DownSampleBlock(
        (block): Sequential(
          (0): Conv2D(3 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=16)
          (2): Activation(relu)
          (3): Conv2D(16 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=16)
          (5): Activation(relu)
          (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
        )
      )
      (1): DownSampleBlock(
        (block): Sequential(
          (0): Conv2D(16 -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=32)
          (2): Activation(relu)
          (3): Conv2D(32 -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=32)
          (5): Activation(relu)
          (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
        )
      )
      (2): DownSampleBlock(
        (block): Sequential(
          (0): Conv2D(32 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
          (2): Activation(relu)
          (3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
          (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
          (5): Activation(relu)
          (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
        )
      )
    )
  )
  (cls_0): ClassPredictor(
    (cls_predictor): Conv2D(64 -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (bbox_0): BBoxPredictor(
    (bbox_predictor): Conv2D(64 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (anchor_0): AnchorY(
    (block): BaseNet(
      (block): Sequential(
        (0): DownSampleBlock(
          (block): Sequential(
            (0): Conv2D(3 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=16)
            (2): Activation(relu)
            (3): Conv2D(16 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=16)
            (5): Activation(relu)
            (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
          )
        )
        (1): DownSampleBlock(
          (block): Sequential(
            (0): Conv2D(16 -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=32)
            (2): Activation(relu)
            (3): Conv2D(32 -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=32)
            (5): Activation(relu)
            (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
          )
        )
        (2): DownSampleBlock(
          (block): Sequential(
            (0): Conv2D(32 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
            (2): Activation(relu)
            (3): Conv2D(64 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=64)
            (5): Activation(relu)
            (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
          )
        )
      )
    )
  )
  (blk_1): DownSampleBlock(
    (block): Sequential(
      (0): Conv2D(64 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (2): Activation(relu)
      (3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (5): Activation(relu)
      (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    )
  )
  (cls_1): ClassPredictor(
    (cls_predictor): Conv2D(128 -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (bbox_1): BBoxPredictor(
    (bbox_predictor): Conv2D(128 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (anchor_1): AnchorY(
    (block): DownSampleBlock(
      (block): Sequential(
        (0): Conv2D(64 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
        (2): Activation(relu)
        (3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
        (5): Activation(relu)
        (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
      )
    )
  )
  (blk_2): DownSampleBlock(
    (block): Sequential(
      (0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (2): Activation(relu)
      (3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (5): Activation(relu)
      (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    )
  )
  (cls_2): ClassPredictor(
    (cls_predictor): Conv2D(128 -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (bbox_2): BBoxPredictor(
    (bbox_predictor): Conv2D(128 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (anchor_2): AnchorY(
    (block): DownSampleBlock(
      (block): Sequential(
        (0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
        (2): Activation(relu)
        (3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
        (5): Activation(relu)
        (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
      )
    )
  )
  (blk_3): DownSampleBlock(
    (block): Sequential(
      (0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (2): Activation(relu)
      (3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
      (5): Activation(relu)
      (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
    )
  )
  (cls_3): ClassPredictor(
    (cls_predictor): Conv2D(128 -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (bbox_3): BBoxPredictor(
    (bbox_predictor): Conv2D(128 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (anchor_3): AnchorY(
    (block): DownSampleBlock(
      (block): Sequential(
        (0): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
        (2): Activation(relu)
        (3): Conv2D(128 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=128)
        (5): Activation(relu)
        (6): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False)
      )
    )
  )
  (blk_4): GlobalMaxPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True)
  (cls_4): ClassPredictor(
    (cls_predictor): Conv2D(128 -> 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (bbox_4): BBoxPredictor(
    (bbox_predictor): Conv2D(128 -> 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
  (anchor_4): AnchorY(
    (block): GlobalMaxPool2D(size=(1, 1), stride=(1, 1), padding=(0, 0), ceil_mode=True)
  )
)

You can now use this network for object detection.
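
As a minimal inference sketch (adapted from the d2l SSD chapter and assuming the network has already been trained, e.g. with the notebook linked above; here net only has its random initialization), convert the class predictions to probabilities and let MultiBoxDetection perform non-maximum suppression:

def predict(X):
    # X: an image batch of shape (batch_size, 3, 256, 256)
    anchors, cls_preds, bbox_preds = net(X)
    # softmax over the class dimension, then move classes to axis 1
    cls_probs = cls_preds.softmax().transpose((0, 2, 1))
    output = contrib.ndarray.MultiBoxDetection(cls_probs, bbox_preds, anchors)
    # drop rows whose predicted class id is -1 (background / suppressed by NMS)
    idx = [i for i, row in enumerate(output[0]) if row[0].asscalar() != -1]
    return output[0, idx]  # each row: [class_id, confidence, xmin, ymin, xmax, ymax]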


  1. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single Shot MultiBox Detector. In European Conference on Computer Vision (pp. 21-37). Springer, Cham.
