Faster R-CNN: Study and Implementation

阿新 • Published: 2019-03-31


Paper
Translation of the paper

Faster R-CNN consists of two main parts:

an RPN (Region Proposal Network), which generates high-quality region proposals;
Fast R-CNN, which uses those region proposals to perform detection.

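In the paper, the authors compare the RPN to an "attention" mechanism for the neural network: it tells the network where to look. To make this easier to follow, the key points of the paper are briefly summarized below.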

RPN

Input: an image of any size
Output: a set of rectangular object proposals, each with an objectness score

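To generate region proposals, a small network is slid over the feature map x output by the last convolutional layer of the base network. This small network consists of a \(3\times 3\) convolution conv1 followed by a pair of sibling (parallel) \(1\times 1\) convolutions, reg and cls. conv1 uses padding=1 and stride=1, so it does not change the spatial size of the feature map. reg is the box-regression layer, which encodes the box coordinates; cls is the box-classification layer, which encodes the probability that each proposal is an object. For details, see my blog post: My Object Detection Notes. The paper calls the \(k\) reference boxes (parameterized proposals) of different scales and aspect ratios anchors. Each anchor is centered on the sliding window.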
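The shapes produced by this small network can be tracked with a few lines of plain Python. This is only a sketch: the function names are hypothetical, and the defaults (k = 9 anchors, a 512-channel conv1, and a 50 x 50 feature map) are illustrative values in line with the paper's VGG16 setup, with cls emitting two scores and reg four offsets per anchor.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one axis."""
    return (size - kernel + 2 * padding) // stride + 1


def rpn_head_shapes(feat_h, feat_w, k=9, mid_channels=512):
    """Return the (channels, height, width) produced by each RPN layer.

    k and mid_channels are illustrative defaults, not values read from
    any library; the function itself is a hypothetical helper.
    """
    # conv1: 3x3, stride 1, padding 1 -> spatial size is preserved
    h = conv_out_size(feat_h, kernel=3, stride=1, padding=1)
    w = conv_out_size(feat_w, kernel=3, stride=1, padding=1)
    return {
        "conv1": (mid_channels, h, w),
        "cls": (2 * k, h, w),  # two scores (object / not object) per anchor
        "reg": (4 * k, h, w),  # four box offsets per anchor
    }


shapes = rpn_head_shapes(50, 50)
print(shapes)
```

Because both 1 x 1 convolutions act on every spatial position, the number of proposals scales with the feature-map area times k.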

To make anchors more concrete, the following Python code illustrates what they mean.

Anchors

First, use the API introduced in "Using the COCO Dataset" to fetch an image from the COCO dataset together with its annotations.

Start by importing the required packages:

import cv2
from matplotlib import pyplot as plt
import numpy as np

# load the COCO-related API
import sys
sys.path.append(r'D:\API\cocoapi\PythonAPI')
from pycocotools.dataset import Loader
%matplotlib inline

Use Loader to load the val2017 dataset and select the images that contain 'cat', 'dog', and 'person':

dataType = 'val2017'
root = 'E:/Data/coco'
catNms = ['cat', 'dog', 'person']
annType = 'annotations_trainval2017'
loader = Loader(dataType, catNms, root, annType)

Output:

Loading json in memory ...
used time: 0.762376 s
Loading json in memory ...
creating index...
index created!
used time: 0.401951 s

As the timings show, Loader loads the data quickly. To inspect loader in more detail, print some related information:

print(f'Total number of images: {len(loader)}')
for i, ann in enumerate(loader.images):
    h, w = ann['height'], ann['width']  # height first, then width
    print(f'Height and width of image {i+1}: {h, w}')

Output:

Total number of images: 2
Height and width of image 1: (612, 612)
Height and width of image 2: (500, 333)

The following uses the first image as an example to explore anchors. Visualize it first:

img, labels = loader[0]
plt.imshow(img);

Output:

[Figure: the first image selected from COCO val2017]

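To make the feature map a little larger, the image can be resized to (800, 800, 3). Note that cv2.resize takes the target size as (width, height):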

img = cv2.resize(img, (800, 800))
print(img.shape)

Output:

(800, 800, 3)

The rest of the code is written with MXNet. MXNet expects channels first, so the image has to be converted from (h, w, 3) to (3, h, w):

img = img.transpose(2, 0, 1)  # HWC -> CHW
print(img.shape)

Output:

(3, 800, 800)

Since a convolutional neural network takes four-dimensional input, a batch axis also has to be added:

img = np.expand_dims(img, 0)
print(img.shape)

Output:

(1, 3, 800, 800)
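On a square image the axis order of the transpose cannot be read off the shape alone, so a tiny non-square array makes the layout change easier to check. The sketch below uses only NumPy, and the array is made-up data rather than a real image: transpose(2, 0, 1) keeps rows as height, whereas transpose(2, 1, 0) also swaps rows and columns.

```python
import numpy as np

# A tiny fake "image" with height 2, width 3 and 3 channels (HWC layout).
img = np.arange(2 * 3 * 3).reshape(2, 3, 3)

chw = img.transpose(2, 0, 1)    # (C, H, W): rows stay rows
cwh = img.transpose(2, 1, 0)    # (C, W, H): rows and columns are swapped too
batch = np.expand_dims(chw, 0)  # (1, C, H, W): add the batch axis

print(chw.shape, cwh.shape, batch.shape)

# The pixel at row 1, column 2, channel 0 of the original image:
assert chw[0, 1, 2] == img[1, 2, 0]  # same (row, column) in CHW
assert cwh[0, 2, 1] == img[1, 2, 0]  # (column, row) in CWH
```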

To stay consistent with the paper, we also use the VGG16 network (loading the weights from gluoncv):

from gluoncv.model_zoo import vgg16
net = vgg16(pretrained=True)  # load the pretrained weights

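Consider only the part of the network up to the last convolutional layer (dropping the pooling layer after it), and look at the output of each convolutional layer: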

from mxnet import nd
imgs = nd.array(img)  # convert to an MXNet NDArray
x = imgs
for layer in net.features[:29]:
    x = layer(x)
    if "conv" in layer.name:
        print(layer.name, x.shape)  # output shape of this convolutional layer

Result:

vgg0_conv0 (1, 64, 800, 800)
vgg0_conv1 (1, 64, 800, 800)
vgg0_conv2 (1, 128, 400, 400)
vgg0_conv3 (1, 128, 400, 400)
vgg0_conv4 (1, 256, 200, 200)
vgg0_conv5 (1, 256, 200, 200)
vgg0_conv6 (1, 256, 200, 200)
vgg0_conv7 (1, 512, 100, 100)
vgg0_conv8 (1, 512, 100, 100)
vgg0_conv9 (1, 512, 100, 100)
vgg0_conv10 (1, 512, 50, 50)
vgg0_conv11 (1, 512, 50, 50)
vgg0_conv12 (1, 512, 50, 50)

This shows that the original (800, 800) image becomes a (50, 50) feature map, 16 times smaller along each side.

Receptive field

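The factor 16 above is not specific to the size (800, 800). It applies to images of any size, because 16 is the receptive field of a single pixel of the feature map, in the sense of the size recurrence derived below.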

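How is the size of the receptive field computed? Recalling how a convolution is evaluated, we find that computing the receptive field is exactly the inverse of computing the convolution output size (see the receptive field computation in ¹).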

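Let \(F_k, S_k, P_k\) denote the kernel height (or width), the stride, and the padding of layer \(k\), and let \(i_k\) denote the height (or width) of the output feature map of layer \(k\). The following recurrence then follows directly: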

\[ i_{k} = \left\lfloor \frac{i_{k-1}-F_{k}+2P_{k}}{S_{k}} \right\rfloor + 1 \]

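where \(k \in \{1, 2, \cdots\}\) and \(i_0\) is the height or width of the original image. Setting \(t_k = \frac{F_k - 1}{2} - P_k\) and assuming the division is exact (so the floor can be dropped), this can be rewritten as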

\[ (i_{k-1} - 1) = (i_{k} - 1) S_k + 2t_k \]
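As a quick check, the size recurrence can be applied layer by layer to a VGG16-style stack without running the network. The layer list below is an assumption spelled out by hand (3 x 3 convolutions with stride 1 and padding 1, 2 x 2 max-pooling with stride 2 and no padding, up to the last convolution), and trace_sizes is a hypothetical helper.

```python
# (name, F, S, P) for each layer up to the last convolution of VGG16.
CONV = ("conv", 3, 1, 1)
POOL = ("pool", 2, 2, 0)
VGG16_LAYERS = ([CONV] * 2 + [POOL] + [CONV] * 2 + [POOL]
                + [CONV] * 3 + [POOL] + [CONV] * 3 + [POOL] + [CONV] * 3)


def trace_sizes(i0, layers):
    """Apply i_k = floor((i_{k-1} - F_k + 2 P_k) / S_k) + 1 layer by layer."""
    sizes = [i0]
    for _, F, S, P in layers:
        sizes.append((sizes[-1] - F + 2 * P) // S + 1)
    return sizes


sizes = trace_sizes(800, VGG16_LAYERS)
print(sizes[-1])  # 50

# Check the rearranged relation (i_{k-1} - 1) = (i_k - 1) S_k + 2 t_k.
# It is exact here because none of the divisions above leaves a remainder.
for k, (_, F, S, P) in enumerate(VGG16_LAYERS, start=1):
    t = (F - 1) / 2 - P
    assert sizes[k - 1] - 1 == (sizes[k] - 1) * S + 2 * t
```

For input sizes that are not multiples of 16 the floor becomes active, and the rearranged relation then holds only approximately.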

To work back to the receptive field, apply this relation recursively for layers \(k = L, L-1, \ldots, 1\), with \(t_k = \frac{F_k -1}{2} - P_k\) as before. This gives

\[ i_0 = (i_L - 1)\alpha_L + \beta_L \]

where \(\alpha_L = \prod_{p=1}^{L}S_p\) and

\[ \beta_L = 1 + 2\sum_{p=1}^L (\prod_{q=1}^{p-1}S_q) t_p \]

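Every convolution in VGG16 is configured with kernel_size=(3, 3), padding=(1, 1), so \(t_j = 0\) for the convolutional layers. Only the four pooling layers have \(S_j = 2\) (with \(F_j = 2\), \(P_j = 0\), hence \(t_j = \frac{1}{2}\)). Therefore \(\alpha_L = 2^4 = 16\) and \(\beta_L = 1 + (1 + 2 + 4 + 8) = 16\), so \(i_0 = 16\, i_L\): a single feature-map pixel (\(i_L = 1\)) corresponds to 16 input pixels along each side.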
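The closed-form expressions for \(\alpha_L\) and \(\beta_L\) can also be evaluated numerically. The sketch below assumes the pooling layers are folded in as 2 x 2, stride-2, padding-0 layers (so their \(t\) is 1/2); alpha_beta is a hypothetical helper, not part of any library.

```python
def alpha_beta(layers):
    """Return (alpha_L, beta_L) for a list of (F, S, P) layers."""
    alpha = 1   # running product S_1 * ... * S_{p-1}
    beta = 1.0  # the leading 1 in beta_L
    for F, S, P in layers:
        t = (F - 1) / 2 - P
        beta += 2 * alpha * t  # add 2 * (prod_{q<p} S_q) * t_p
        alpha *= S
    return alpha, beta


conv, pool = (3, 1, 1), (2, 2, 0)
vgg16 = ([conv] * 2 + [pool] + [conv] * 2 + [pool]
         + [conv] * 3 + [pool] + [conv] * 3 + [pool] + [conv] * 3)

alpha, beta = alpha_beta(vgg16)
print(alpha, beta)  # 16 16.0

# i_0 = (i_L - 1) * alpha_L + beta_L recovers the input size from i_L = 50.
i0 = (50 - 1) * alpha + beta
print(i0)  # 800.0
```

The convolutions contribute nothing to the sum because their \(t\) is zero, so only the four pooling layers matter here.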

Computing the anchors

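In the implementation, the size of the receptive field is denoted by base_size. How are the anchor boxes generated? To simplify the computation, first define a Box class: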

import numpy as np


class Box:
    '''
    corner: (xmin,ymin,xmax,ymax)
    '''

    def __init__(self, corner):
        self._corner = corner

    @property
    def corner(self):
        return self._corner

    @corner.setter
    def corner(self, new_corner):
        self._corner = new_corner

    @property
    def w(self):
        '''
        Width of the bbox (inclusive pixel convention, hence the +1)
        '''
        return self.corner[2] - self.corner[0] + 1

    @property
    def h(self):
        '''
        Height of the bbox (inclusive pixel convention, hence the +1)
        '''
        return self.corner[3] - self.corner[1] + 1

    @property
    def area(self):
        '''
        Area of the bbox
        '''
        return self.w * self.h

    @property
    def whctrs(self):
        '''
        Center coordinates of the bbox
        '''
        xctr = self.corner[0] + (self.w - 1) * .5
        yctr = self.corner[1] + (self.h - 1) * .5
        return xctr, yctr

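    def __and__(self, other):
        '''
        Operator &: area of the intersection of two boxes
        '''
        U = np.array([self.corner, other.corner])
        xmin, ymin, xmax, ymax = np.split(U, 4, axis=1)
        # Use the same inclusive (+1) convention as w and h, and clamp at 0
        # so that disjoint boxes give an empty intersection.
        w = max(0, xmax.min() - xmin.max() + 1)
        h = max(0, ymax.min() - ymin.max() + 1)
        return w * h

    def __or__(self, other):
        '''
        Operator |: area of the union of two boxes
        '''
        I = self & other
        return self.area + other.area - I

    def IoU(self, other):
        '''
        Compute the IoU
        '''
        I = self & other
        U = self | other
        return I / U

The Box class implements the intersection and union of bboxes as well as the IoU computation. Here is an example:

bbox = [0, 0, 15, 15]  # bounding box
bbox1 = [5, 5, 12, 12] # bounding box
A = Box(bbox)  # a bbox instance
B = Box(bbox1) # a bbox instance

Now the intersection, union, and IoU of A and B, together with the center, height, width, and area of A, can be printed:

print('Intersection of A and B:', A & B)
print('Union of A and B:', A | B)
print('IoU of A and B:', A.IoU(B))
print('Center, height, width, and area of A:', A.whctrs, A.h, A.w, A.area)

Output:

Intersection of A and B: 64
Union of A and B: 256
IoU of A and B: 0.25
Center, height, width, and area of A: (7.5, 7.5) 16 16 256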
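With base_size and the box conventions in place, anchor generation itself can be sketched as follows. This is a simplified illustration rather than the reference implementation: it assumes the common configuration of 3 aspect ratios (0.5, 1, 2) and 3 scales (8, 16, 32) around base_size = 16, which yields the paper's k = 9 anchors per position, keeps floating-point coordinates without rounding, and uses hypothetical function names.

```python
import math


def generate_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return len(ratios) * len(scales) anchors as (xmin, ymin, xmax, ymax).

    Uses the inclusive pixel convention w = xmax - xmin + 1.
    ratio is height / width; scale multiplies base_size.
    """
    ctr = (base_size - 1) / 2  # center of the base box [0, 0, 15, 15]
    anchors = []
    for ratio in ratios:
        for scale in scales:
            area = (base_size * scale) ** 2
            w = math.sqrt(area / ratio)
            h = w * ratio
            anchors.append((ctr - (w - 1) / 2, ctr - (h - 1) / 2,
                            ctr + (w - 1) / 2, ctr + (h - 1) / 2))
    return anchors


def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Copy the base anchors to every cell of a feat_h x feat_w feature map."""
    shifted = []
    for row in range(feat_h):
        for col in range(feat_w):
            dx, dy = col * stride, row * stride
            shifted.extend((x0 + dx, y0 + dy, x1 + dx, y1 + dy)
                           for x0, y0, x1, y1 in anchors)
    return shifted


base = generate_anchors()
all_anchors = shift_anchors(base, 50, 50)
print(len(base), len(all_anchors))  # 9 22500
print(base[3])                      # (-56.0, -56.0, 71.0, 71.0)
```

For the (50, 50) feature map used earlier this gives 50 x 50 x 9 = 22500 candidate boxes, many of which extend beyond the image border and would need to be clipped or ignored in practice.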

¹ Lenc K, Vedaldi A. R-CNN minus R. British Machine Vision Conference (BMVC), 2015.
