RPN Network Code Walkthrough

阿新 • Published: 2018-12-01

1. Preface

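In object detection, Faster RCNN is about as well known as it gets. It contains a sub-network, the RPN (Region Proposal Network), that generates candidate regions on the feature map. But how does this network actually work? There are plenty of explanations online, most of them hazy, so it is more direct to read the code, and that is what this post does -_-||.
First, look at the structure of the RPN in Faster RCNN. The RPN attaches to the feature output of the classification backbone through a single convolution layer, rpn_conv/3x3, followed by two convolution layers, rpn_cls_score and rpn_bbox_pred, which produce the foreground/background classification and the box predictions respectively. The Python layer AnchorTargetLayer then produces the anchor-based classification labels and box regression targets. After that, ROI Proposal generates the candidate ROI regions, and ROI Pooling normalizes them to the same size for later processing. The overall structure is shown in the figure below:
[Figure: structure of the RPN in Faster RCNN (image not included)]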

The figure above gives an intuitive but coarse picture of the RPN; what exactly happens inside is still unclear. So let's go through its code, starting with what each file in the RPN module does.
(1) generate_anchors.py
Generates a regular grid of multi-scale, multi-aspect anchor boxes, starting from the base anchor [0,0,15,15] and varying the aspect ratio and the scale.
(2) proposal_layer.py
Converts RPN outputs (per-anchor scores and bbox regression estimates) into object proposals.
(3) anchor_target_layer.py
Generates training targets/labels for each anchor. Classification labels are 1 (object), 0 (not object) or -1 (ignore).
Bbox regression targets are specified when the classification label is > 0.
(4) proposal_target_layer.py
Generates training targets/labels for each object proposal: classification labels $0, \dots, K$ (background 0 or object class $1, \dots, K$), and bbox regression targets in the case that the label is > 0.
(5) generate.py
Generates object detection proposals from an imdb using an RPN.
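
For orientation, here is a minimal sketch of how the 9 default anchors can be derived from the base anchor [0,0,15,15]. It is a simplified re-derivation, not the code in generate_anchors.py: the function name make_base_anchors and the aspect ratios (0.5, 1, 2) are assumptions made for illustration, and the scales (8, 16, 32) are the defaults that appear in the layer code.

```python
def make_base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Return a list of [x1, y1, x2, y2] anchors centred on the base box."""
    # The base anchor [0, 0, base_size - 1, base_size - 1] has this centre.
    ctr = 0.5 * (base_size - 1)
    area = float(base_size * base_size)
    anchors = []
    for ratio in ratios:                      # ratio = height / width
        w = round((area / ratio) ** 0.5)      # width that keeps the area roughly fixed
        h = round(w * ratio)
        for scale in scales:                  # enlarge each shape by every scale
            sw, sh = w * scale, h * scale
            anchors.append([ctr - 0.5 * (sw - 1), ctr - 0.5 * (sh - 1),
                            ctr + 0.5 * (sw - 1), ctr + 0.5 * (sh - 1)])
    return anchors


anchors = make_base_anchors()
for box in anchors:
    print(box)
```

With three ratios and three scales this gives 3 x 3 = 9 anchors, all sharing the centre (7.5, 7.5) of the base box.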
With a rough picture of the RPN structure and of the files in the RPN module, the next step is to read the implementation and see what it actually does.

2. The RPN part

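This part uses anchor_target_layer.py and generate_anchors.py. generate_anchors.py produces the anchors the model needs and also contains a few helper functions; it is not the focus here and is not covered in detail. The main file to read is anchor_target_layer.py.
First, the layer's setup function: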

def setup(self, bottom, top):
    layer_params = yaml.load(self.param_str_)
    anchor_scales = layer_params.get('scales', (8, 16, 32)) # anchor scales
    self._anchors = generate_anchors(scales=np.array(anchor_scales)) # generate the 9 default anchors
    self._num_anchors = self._anchors.shape[0]
    self._feat_stride = layer_params['feat_stride']

    # allow boxes to sit over the edge by a small amount
    # when set to 0, any anchor that crosses the image boundary, even slightly, is discarded
    self._allowed_border = layer_params.get('allowed_border', 0)

    height, width = bottom[0].data.shape[-2:]
    if DEBUG:
        print 'AnchorTargetLayer: height', height, 'width', width

    A = self._num_anchors
    # labels: object / not-object classification
    top[0].reshape(1, 1, A * height, width)
    # bbox_targets
    top[1].reshape(1, A * 4, height, width)
    # bbox_inside_weights
    top[2].reshape(1, A * 4, height, width)
    # bbox_outside_weights
    top[3].reshape(1, A * 4, height, width)

Next comes the core forward function. It first generates every anchor that has to be evaluated over the feature map:

# 1. Generate proposals from bbox deltas and shifted anchors
# number of shifts along x equals the feature-map width
shift_x = np.arange(0, width) * self._feat_stride
# number of shifts along y equals the feature-map height
shift_y = np.arange(0, height) * self._feat_stride
# after meshgrid, shift_x and shift_y are both 2-D arrays of shape (height, width);
# pairing corresponding elements gives the offset of each cell in the image, measured
# from the 9 anchors at the top-left corner, so there are width*height offsets in total.
# Adding every offset to the 9 base anchors yields all the anchors,
# width*height*9 of them, which are stored in all_anchors
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose() # shape (width*height, 4)
# add A anchors (1, A, 4) to
# cell K shifts (K, 1, 4) to get
# shift anchors (K, A, 4)
# reshape to (K*A, 4) shifted anchors
A = self._num_anchors
K = shifts.shape[0] # K = width*height
# from the 9 base anchors, produce K*A anchors, which is the total number of anchors
all_anchors = (self._anchors.reshape((1, A, 4)) +
               shifts.reshape((1, K, 4)).transpose((1, 0, 2)))
all_anchors = all_anchors.reshape((K * A, 4))
total_anchors = int(K * A) # total number of anchors
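
To make the broadcasting step concrete, here is a toy version on a 2x3 feature map with only two base anchors. The sizes and the two base boxes are made up for illustration; `shifts.reshape((K, 1, 4))` is used as an equivalent of the reshape-then-transpose in the listing above.

```python
import numpy as np

feat_stride = 16
height, width = 2, 3                        # toy feature-map size
base = np.array([[0, 0, 15, 15],            # two toy base anchors (A = 2)
                 [-8, -8, 23, 23]])

shift_x, shift_y = np.meshgrid(np.arange(width) * feat_stride,
                               np.arange(height) * feat_stride)
# One row [dx, dy, dx, dy] per feature-map cell, in row-major (h, w) order.
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()

A, K = base.shape[0], shifts.shape[0]
# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every anchor at every cell.
all_anchors = (base.reshape((1, A, 4)) +
               shifts.reshape((K, 1, 4))).reshape((K * A, 4))

print(shift_x.shape, shifts.shape, all_anchors.shape)
print(all_anchors[:4])
```

The rows of all_anchors come out ordered by (h, w, a) with a varying fastest, which is the ordering the later reshape of the outputs relies on.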

Some of these anchors inevitably extend past the image boundary, so they have to be removed:

# only keep anchors inside the image; these are the valid anchors, the rest are dropped
inds_inside = np.where(
    (all_anchors[:, 0] >= -self._allowed_border) &
    (all_anchors[:, 1] >= -self._allowed_border) &
    (all_anchors[:, 2] < im_info[1] + self._allowed_border) &  # width
    (all_anchors[:, 3] < im_info[0] + self._allowed_border)    # height
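    )[0]
# keep only inside anchors (the `anchors` array used in the following steps)
anchors = all_anchors[inds_inside, :]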

Initialize the labels of the usable anchors; the meaning of each label value is given in the comment below:

# label: 1 is positive, 0 is negative, -1 is dont care
# labels of the anchors inside the image (object or not); the length is the number of anchors that passed the filter
labels = np.empty((len(inds_inside), ), dtype=np.float32)
labels.fill(-1)

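With the anchors generated, the next step is to relate each anchor to the ground-truth (gt) boxes. The measure is their overlap (bbox_overlaps returns the IoU of every anchor/gt pair), and each anchor's object/not-object label is set from it.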
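
As a rough sketch of the overlap measure for a single pair of boxes: the helper below assumes inclusive pixel coordinates (hence the +1 in widths and heights). The name iou is made up for illustration; the bbox_overlaps used in the listing is a compiled routine that fills the whole anchors-by-gt matrix at once.

```python
def iou(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] boxes with inclusive pixel coordinates."""
    iw = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]) + 1
    ih = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0                           # the boxes do not intersect
    area_a = (box_a[2] - box_a[0] + 1) * (box_a[3] - box_a[1] + 1)
    area_b = (box_b[2] - box_b[0] + 1) * (box_b[3] - box_b[1] + 1)
    inter = iw * ih
    return inter / float(area_a + area_b - inter)


same = iou([0, 0, 9, 9], [0, 0, 9, 9])
half = iou([0, 0, 9, 9], [5, 0, 14, 9])      # two 10x10 boxes sharing a 5x10 strip
apart = iou([0, 0, 9, 9], [20, 20, 29, 29])
print(same, half, apart)
```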

# overlaps between the anchors and the gt boxes
# overlaps (ex, gt): 2-D array of shape (number of anchors, number of gt boxes)
overlaps = bbox_overlaps(
    np.ascontiguousarray(anchors, dtype=np.float),
    np.ascontiguousarray(gt_boxes, dtype=np.float))
argmax_overlaps = overlaps.argmax(axis=1) # for each anchor, index of the gt it overlaps most
max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps] # for each anchor, that largest overlap
gt_argmax_overlaps = overlaps.argmax(axis=0) # for each gt, index of the anchor that overlaps it most
gt_max_overlaps = overlaps[gt_argmax_overlaps,
                           np.arange(overlaps.shape[1])] # for each gt, that largest overlap
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]

# anchors whose overlap is below the threshold 0.3 are labeled 0
if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels first so that positive labels can clobber them
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0

# fg label: for each gt, anchor with highest overlap; the anchor that best overlaps each gt is labeled 1
labels[gt_argmax_overlaps] = 1

# fg label: above threshold IOU; anchors whose overlap with a gt is at least 0.7 are also labeled 1
labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1

if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
    # assign bg labels last so that negative labels can clobber positives
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
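
A small worked example of the labeling rules may help. The overlap matrix below is invented, the thresholds 0.3 and 0.7 are the ones quoted in the comments, and the order of assignments corresponds to the case where RPN_CLOBBER_POSITIVES is off.

```python
import numpy as np

NEG, POS = 0.3, 0.7                         # thresholds quoted in the comments
overlaps = np.array([[0.10, 0.00],          # rows: anchors, columns: gt boxes
                     [0.75, 0.20],
                     [0.50, 0.60],
                     [0.40, 0.10]])

labels = np.full(overlaps.shape[0], -1, dtype=np.float32)   # start as "ignore"
max_overlaps = overlaps.max(axis=1)          # best overlap of each anchor
gt_max = overlaps.max(axis=0)                # best overlap of each gt
best_for_gt = np.where(overlaps == gt_max)[0]

labels[max_overlaps < NEG] = 0               # background first
labels[best_for_gt] = 1                      # best anchor of every gt
labels[max_overlaps >= POS] = 1              # any anchor with a high overlap

print(labels)
```

Anchor 2 becomes positive although its best overlap (0.60) is below 0.7, because it is the best match for the second gt box; anchor 3 (0.40) falls between the two thresholds and stays at -1.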

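The paper says 256 anchors are sampled at random from all anchors, 128 foreground and 128 background. Note that anchors labeled -1 count as neither foreground nor background.
In the two code fragments below, the first picks 128 of the foreground anchors and the second picks 128 of the background anchors. If there are fewer than 128 foreground anchors, all of them are kept and the shortfall is made up from the background. This is the same way Fast RCNN samples ROIs.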

# subsample positive labels if we have too many
# keep 128 of the anchors labeled 1 and set the label of the rest to -1
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE) # cap on the number of foreground anchors
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) > num_fg:
    disable_inds = npr.choice(
        fg_inds, size=(len(fg_inds) - num_fg), replace=False)
    labels[disable_inds] = -1

# subsample negative labels if we have too many
# num_bg is not fixed at 128 but is 256 minus the number of anchors labeled 1, so when there are too few 1s the batch is filled up with 0s
num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
    disable_inds = npr.choice(
        bg_inds, size=(len(bg_inds) - num_bg), replace=False)
    labels[disable_inds] = -1
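
The effect of the two subsampling steps can be checked with a small stand-alone sketch. It uses plain Python lists and the standard random module instead of numpy; the function name subsample, the seed, and the defaults 256 and 0.5 (the values implied by the 256/128 figures in the text) are assumptions for illustration.

```python
import random


def subsample(labels, batch_size=256, fg_fraction=0.5, seed=0):
    """Return a copy of labels with surplus 1s and 0s switched to -1."""
    rng = random.Random(seed)
    labels = list(labels)
    num_fg = int(fg_fraction * batch_size)
    fg = [i for i, v in enumerate(labels) if v == 1]
    if len(fg) > num_fg:                     # too many positives: disable the surplus
        for i in rng.sample(fg, len(fg) - num_fg):
            labels[i] = -1
    num_bg = batch_size - labels.count(1)    # background fills whatever is left
    bg = [i for i, v in enumerate(labels) if v == 0]
    if len(bg) > num_bg:
        for i in rng.sample(bg, len(bg) - num_bg):
            labels[i] = -1
    return labels


few_fg = subsample([1] * 20 + [0] * 1000 + [-1] * 50)
many_fg = subsample([1] * 300 + [0] * 1000)
print(few_fg.count(1), few_fg.count(0))      # 20 positives, 236 negatives
print(many_fg.count(1), many_fg.count(0))    # 128 positives, 128 negatives
```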

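The paper defines the RPN loss as:
$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$
This loss is much like the one in Fast RCNN. smoothL1 is computed separately for each coordinate, so $p_i^*$ and $N_{reg}$ have to be turned into 4-dimensional vectors rather than the single numbers that appear in the paper.
bbox_inside_weights plays the role of $p_i^*$ and bbox_outside_weights the role of $N_{reg}$. The paper says $p_i^*$ is 1 when the anchor is foreground and 0 when it is background. For label -1 the code also sets it to 0; these anchors presumably take no part in the later computation, so the value does not matter.
$N_{reg}$ is the normalization, i.e. taking the mean, and it counts all anchors labeled 0 or 1. Since 256 anchors are selected for training, the value stored in bbox_outside_weights is in practice $\frac{1}{256}$ when the mini-batch is full.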
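
How the two weight arrays enter the regression loss can be sketched for a single anchor as follows. This is an illustration under assumptions: the function names are made up, sigma=1 is chosen for simplicity, and the form outside * smoothL1(inside * (pred - target)) is the usual weighted smooth L1 formulation rather than a quote of the loss layer's code.

```python
def smooth_l1(x, sigma=1.0):
    """Smooth L1: quadratic near zero, linear further out."""
    s2 = sigma * sigma
    ax = abs(x)
    return 0.5 * s2 * x * x if ax < 1.0 / s2 else ax - 0.5 / s2


def weighted_reg_loss(pred, target, inside_w, outside_w, sigma=1.0):
    """Sum over the 4 coordinates of outside_w * smoothL1(inside_w * (pred - target))."""
    return sum(wo * smooth_l1(wi * (p - t), sigma)
               for p, t, wi, wo in zip(pred, target, inside_w, outside_w))


pred, target = [0.5, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.0]
norm = [1.0 / 256] * 4                       # uniform outside weight for a full batch

pos_loss = weighted_reg_loss(pred, target, [1.0] * 4, norm)   # positive anchor
neg_loss = weighted_reg_loss(pred, target, [0.0] * 4, norm)   # background anchor
print(pos_loss, neg_loss)
```

The background anchor contributes exactly 0 because its inside weights zero out the difference before the loss is applied, which is the behaviour of $p_i^*$ described above.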

bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32) # one row per anchor that survived the filter
bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :]) # regression offsets from each anchor to its gt box

bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)

bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
# normalize the per-example weights
if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
    # uniform weighting of examples (given non-uniform sampling)
    num_examples = np.sum(labels >= 0)
    positive_weights = np.ones((1, 4)) * 1.0 / num_examples
    negative_weights = np.ones((1, 4)) * 1.0 / num_examples
else:
    assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
            (cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
    positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
                        np.sum(labels == 1))
    negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
                                np.sum(labels == 0))
bbox_outside_weights[labels == 1, :] = positive_weights
bbox_outside_weights[labels == 0, :] = negative_weights

The results for the kept anchors are then mapped back onto the full set of anchors:

# map up to original set of anchors
# maps data of length len(inds_inside) back to length total_anchors, where total_anchors = (width*height)*9
labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
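
The _unmap helper itself is not shown in this post. A minimal sketch of what such a scatter-back step could look like, written from the way it is called above (the name unmap and the body are assumptions, not the original source):

```python
import numpy as np


def unmap(data, count, inds, fill=0):
    """Scatter `data` (one row per kept anchor) back into an array of `count` rows."""
    ret = np.full((count,) + data.shape[1:], fill, dtype=np.float32)
    ret[inds] = data                         # rows not listed in inds keep the fill value
    return ret


inds_inside = np.array([1, 3])               # suppose only anchors 1 and 3 were inside
labels = unmap(np.array([1.0, 0.0], dtype=np.float32), 5, inds_inside, fill=-1)
targets = unmap(np.ones((2, 4), dtype=np.float32), 5, inds_inside, fill=0)
print(labels)
print(targets.shape)
```

Anchors that were filtered out come back with label -1 and all-zero targets and weights, so they are ignored by the losses.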

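Note that the RPN is trained on 256 anchors, 128 positive and 128 negative, yet the output of anchor_target_layer is not just the labels and coordinate transforms of those 256 anchors but of all anchors. The _unmap function shows this clearly. So how does training end up using only these 256? Of this layer's 4 outputs, rpn_labels goes to the rpn_loss_cls layer and the other 3 go to rpn_loss_bbox. The label is the $p_i^*$ in the first half of the loss function (the classification loss). That is a log loss: labels of -1 cannot enter the log computation, while the remaining 0s and 1s are computed directly, so this half is restricted to the 256. The second half of the loss is the bbox coordinate loss, where $p_i^*$ is bbox_inside_weights. The paper says it is activated only for positive anchors: the coordinate loss is computed only for positive anchors, where $p_i^*$ is 1, and it is 0 in every other case. So only those 256 anchors actually change the loss value; all the others contribute 0.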

bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)

This code reflects the same idea, so it too restricts training to the 256.

Finally, the dimensions are rearranged and the layer's 4 outputs are set:

# labels
labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
labels = labels.reshape((1, 1, A * height, width))
top[0].reshape(*labels.shape)
top[0].data[...] = labels

# bbox_targets
bbox_targets = bbox_targets \
    .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
top[1].reshape(*bbox_targets.shape)
top[1].data[...] = bbox_targets

# bbox_inside_weights
bbox_inside_weights = bbox_inside_weights \
    .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
assert bbox_inside_weights.shape[2] == height
assert bbox_inside_weights.shape[3] == width
top[2].reshape(*bbox_inside_weights.shape)
top[2].data[...] = bbox_inside_weights

# bbox_outside_weights
bbox_outside_weights = bbox_outside_weights \
    .reshape((1, height, width, A * 4)).transpose(0, 3, 1, 2)
assert bbox_outside_weights.shape[2] == height
assert bbox_outside_weights.shape[3] == width
top[3].reshape(*bbox_outside_weights.shape)
top[3].data[...] = bbox_outside_weights
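
The reshape/transpose sequence is easy to get wrong, so here is a toy check of the label layout. Each flat entry encodes its own (h, w, a) position, which makes it possible to see where it lands; the sizes are made up for illustration.

```python
import numpy as np

H, W, A = 2, 3, 2
# Flat per-anchor values in (h, w, a) order, each encoding its position as 100*h + 10*w + a.
flat = np.array([100 * h + 10 * w + a
                 for h in range(H) for w in range(W) for a in range(A)])

blob = flat.reshape((1, H, W, A)).transpose(0, 3, 1, 2)   # (1, A, H, W)
packed = blob.reshape((1, 1, A * H, W))                   # layout used for the labels output

print(blob.shape, packed.shape)
print(blob[0, 1, 1, 2], packed[0, 0, 1 * H + 1, 2])       # both are anchor a=1 at h=1, w=2
```

After the transpose the anchor index becomes the channel axis, so blob[0, a, h, w] holds the value for anchor a at cell (h, w); the final reshape stacks the A maps of height H on top of each other.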

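That completes the flow that turns the feature map and the anchors into anchor labels and box targets. The next step is to compute the RPN loss from this layer's outputs.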

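**PS:** Notice that this layer does not implement backpropagation. Why is no gradient passed to the network? Because the layer's input rpn_cls_score only supplies its height and width and nothing more, so there is no gradient to pass back.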

3. The ROI Proposal part

3.1 ProposalLayer

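This layer has 3 inputs: the fg/bg anchor classifier output rpn_cls_prob_reshape, the corresponding bbox regression offsets $[dx(A), dy(A), dw(A), dh(A)]$ in rpn_bbox_pred, and im_info. It also takes the parameter feat_stride=16.
First, im_info. An image of arbitrary size is resized to a fixed $M \times N$ before it enters Faster RCNN, and $im\_info=[M, N, scale\_factor]$ stores everything about that rescaling. The image then passes through the Conv Layers, where 4 pooling steps reduce it to $(M/16) \times (N/16)$; $feat\_stride=16$ records this. All of these values exist so that proposals can be mapped back to the original image.
First, the layer's setup function: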

def setup(self, bottom, top):
    # parse the layer parameter string, which must be valid YAML
    layer_params = yaml.load(self.param_str_)

    self._feat_stride = layer_params['feat_stride']
    anchor_scales = layer_params.get('scales', (8, 16, 32))
    self._anchors = generate_anchors(scales=np.array(anchor_scales)) # generate the 9 default anchors
    self._num_anchors = self._anchors.shape[0]

    if DEBUG:
        print 'feat_stride: {}'.format(self._feat_stride)
        print 'anchors:'
        print self._anchors

    # rois blob: holds R regions of interest, each is a 5-tuple
    # (n, x1, y1, x2, y2) specifying an image batch index n and a
    # rectangle (x1, y1, x2, y2)
    top[0].reshape(1, 5)

    # scores blob: holds scores for R regions of interest
    if len(top) > 1:
        top[1].reshape(1, 1, 1, 1)

Before the forward pass, a few configuration values are loaded:

cfg_key = str(self.phase) # either 'TRAIN' or 'TEST'; the NMS input and output counts differ between the two
# Number of top scoring boxes to keep before apply NMS to RPN proposals
pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N # 12000 (TRAIN)
# Number of top scoring boxes to keep after applying NMS to RPN proposals
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N # 2000 (TRAIN)
# NMS threshold used on RPN proposals
nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH # 0.7
# Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
min_size      = cfg[cfg_key].RPN_MIN_SIZE # 16

# the first set of _num_anchors channels are bg probs
# the second set are the fg probs, which we want
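# the first 9 channels are background, the last 9 are foreground
scores = bottom[0].data[:, self._num_anchors:, :, :] # predicted class probabilities (the conv output has 18 channels)
bbox_deltas = bottom[1].data # predicted box offsets
im_info = bottom[2].data[0, :] # image information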

Now the proposals themselves.
step1: generate the anchors again and apply bbox_deltas to get the predicted boxes

# 1. Generate proposals from bbox deltas and shifted anchors
height, width = scores.shape[-2:]

if DEBUG:
    print 'score map size: {}'.format(scores.shape)

# Enumerate all shifts (same as in anchor_target_layer)
shift_x = np.arange(0, width) * self._feat_stride
shift_y = np.arange(0, height) * self._feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
                    shift_x.ravel(), shift_y.ravel())).transpose()

# Enumerate all shifted anchors:
#
# add A anchors (1, A, 4) to
# cell K shifts (K, 1, 4) to get
# shift anchors (K, A, 4)
# reshape to (K*A, 4) shifted anchors
A = self._num_anchors
K = shifts.shape[0]
anchors = self._anchors.reshape((1, A, 4)) + \
                  shifts.reshape((1, K, 4)).transpose((1, 0, 2))
anchors = anchors.reshape((K * A, 4))

# Transpose and reshape predicted bbox transformations to get them
# into the same order as the anchors:
#
# bbox deltas will be (1, 4 * A, H, W) format
# transpose to (1, H, W, 4 * A)
# reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a)
# in slowest to fastest order
bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4))

# Same story for the scores:
#
# scores are (1, A, H, W) format
# transpose to (1, H, W, A)
# reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a)
scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1))

# Convert anchors into proposals via bbox transformations
# use bbox_deltas to correct the anchors and obtain the predicted proposal positions (see the formulas in the paper):
# x and y get a linear transform, w and h go through exp
proposals = bbox_transform_inv(anchors, bbox_deltas)
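
For a single box, the decoding step can be sketched like this. It follows the rule stated in the comment (linear for x and y, exp for w and h) and assumes the inclusive +1 width convention; apply_deltas is a made-up name, and the real bbox_transform_inv works on whole arrays.

```python
import math


def apply_deltas(anchor, deltas):
    """Decode one [x1, y1, x2, y2] anchor with predicted (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = anchor
    dx, dy, dw, dh = deltas
    w, h = x2 - x1 + 1.0, y2 - y1 + 1.0
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    pcx, pcy = dx * w + cx, dy * h + cy               # shift the centre linearly
    pw, ph = math.exp(dw) * w, math.exp(dh) * h       # rescale the size exponentially
    return [pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw, pcy + 0.5 * ph]


unchanged = apply_deltas([0, 0, 15, 15], [0, 0, 0, 0])
shifted = apply_deltas([0, 0, 15, 15], [0.5, 0, 0, 0])          # half a width to the right
widened = apply_deltas([0, 0, 15, 15], [0, 0, math.log(2), 0])  # twice as wide
print(unchanged, shifted, widened)
```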

step2: clip the predicted boxes so that they lie inside the image

# 2. clip predicted boxes to image
# clamp the predicted boxes to the image boundary
proposals = clip_boxes(proposals, im_info[:2])
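
Clipping amounts to clamping every coordinate into the image. A one-box sketch, assuming im_info[:2] is (height, width) as described earlier; clip_box is a hypothetical helper name.

```python
def clip_box(box, im_shape):
    """Clamp an [x1, y1, x2, y2] box to an image of shape (height, width)."""
    h, w = im_shape
    x1, y1, x2, y2 = box
    return [max(min(x1, w - 1), 0), max(min(y1, h - 1), 0),
            max(min(x2, w - 1), 0), max(min(y2, h - 1), 0)]


inside = clip_box([-10, 5, 700, 300], (600, 800))
outside = clip_box([-10, -4, 900, 650], (600, 800))
print(inside, outside)
```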

step3: remove small predicted boxes; the threshold is 16

# 3. remove predicted boxes with either height or width < threshold
# (NOTE: convert min_size to input image scale stored in im_info[2])
# drop boxes whose width or height is below 16 (the feature map went through 4 poolings, i.e. a stride of 16)
keep = _filter_boxes(proposals, min_size * im_info[2])
proposals = proposals[keep, :]
scores = scores[keep]
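
The size filter can be sketched as below. The scale factor 1.6 stands in for im_info[2] and is an invented value; filter_small is a hypothetical name, and widths and heights again use the inclusive +1 convention.

```python
def filter_small(boxes, min_size):
    """Return indices of boxes whose width and height are both >= min_size."""
    keep = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 - x1 + 1 >= min_size and y2 - y1 + 1 >= min_size:
            keep.append(i)
    return keep


boxes = [[0, 0, 24, 24],     # 25 x 25: too small once the threshold is scaled
         [0, 0, 25, 25],     # 26 x 26: kept
         [0, 0, 100, 10]]    # wide enough but only 11 high: dropped
keep = filter_small(boxes, 16 * 1.6)   # threshold expressed at the resized image scale
print(keep)
```

Multiplying min_size by the scale factor keeps the threshold at 16 pixels of the original image even though the boxes live in the resized image.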

step4: sort the predicted boxes by score and send the top N to NMS

# 4. sort all (proposal, score) pairs by score from highest to lowest
# 5. take top pre_nms_topN (e.g. 6000); NMS is applied afterwards, see the settings above
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
    order = order[:pre_nms_topN]
proposals = proposals[order, :] # coordinates of the top pre_nms_topN boxes
scores = scores[order] # scores of the top pre_nms_topN boxes

step5: apply NMS and keep the top N

# 6. apply nms (e.g. threshold = 0.7)
# 7. take after_nms_topN (e.g. 300)
# 8. return the top proposals (-> RoIs top)
keep = nms(np.hstack((proposals, scores)), nms_thresh)
if post_nms_topN > 0:
    keep = keep[:post_nms_topN]
proposals = proposals[keep, :] # keep the top post_nms_topN boxes after NMS
scores = scores[keep]
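
Greedy NMS itself is short enough to sketch in pure Python. This is an illustration, not the implementation behind the nms call above; the helper names are made up and boxes are rows of [x1, y1, x2, y2, score].

```python
def _iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2, ...] with inclusive coordinates."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(area_a + area_b - inter)


def greedy_nms(dets, thresh):
    """Return indices of kept rows, highest score first."""
    order = sorted(range(len(dets)), key=lambda i: dets[i][4], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                  # highest-scoring box still in play
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [j for j in order if _iou(dets[best], dets[j]) <= thresh]
    return keep


dets = [[0, 0, 9, 9, 0.9],       # kept
        [1, 0, 10, 9, 0.8],      # IoU with the first box is 90/110, so it is suppressed
        [20, 20, 29, 29, 0.7]]   # far away, kept
keep = greedy_nms(dets, 0.7)
print(keep)
```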

step6: output the results

# Output rois blob
# Our RPN implementation only supports a single input image, so all
# batch inds are 0
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
top[0].reshape(*(blob.shape))
top[0].data[...] = blob

# [Optional] output scores blob
if len(top) > 1:
    top[1].reshape(*(scores.shape))
    top[1].data[...] = scores

3.2 ProposalTargetLayer

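This layer matches the boxes predicted by the RPN to their classes. It also limits the boxes used per training step (only 32 foreground boxes are processed each time, 1/4 of the total); see the _sample_rois function. First, it reads the number of classes and initializes the shapes of the output blobs: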

def setup(self, bottom, top):
    layer_params = yaml.load(self.param_str_)
    self._num_classes = layer_params['num_classes']

    # sampled rois (0, x1, y1, x2, y2)
    top[0].reshape(1, 5)
    # labels
    top[1].reshape(1, 1)
    # bbox_targets
    top[2].reshape(1, self._num_classes * 4)
    # bbox_inside_weights
    top[3].reshape(1, self._num_classes * 4)
    # bbox_outside_weights
    top[4].reshape(1, self._num_classes * 4)

The forward function:

def forward(self, bottom, top):
    # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
    # (i.e., rpn.proposal_layer.ProposalLayer), or any other source
    all_rois = bottom[0].data # boxes predicted by the RPN, shape [N,5]
    # GT boxes (x1, y1, x2, y2, label)
    # TODO(rbg): it's annoying that sometimes I have extra info before
    # and other times after box coordinates -- normalize to one format
    gt_boxes = bottom[1].data # GT information, shape [M,5]

    # Include ground-truth boxes in the set of candidate rois
    # adding the ground-truth boxes to the boxes to classify increases the number of positive samples
    # all_rois then has shape [N+M,5]: the RPN boxes followed by the ground-truth boxes
    zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
    all_rois = np.vstack(
        (all_rois, np.hstack((zeros, gt_boxes[:, :-1])))
    ) # a 0 is prepended to each ground-truth box so it lines up with the N RPN boxes; the gt boxes go at the end

    # Sanity check: single batch only
    assert np.all(all_rois[:, 0] == 0), \
        'Only single item batches are supported'

    num_images = 1
    rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images # cfg.TRAIN.BATCH_SIZE is 128
    # cfg.TRAIN.FG_FRACTION is 0.25, so one training step uses at most 32 foreground boxes
    fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)

    # Sample rois with classification labels and bounding box regression
    # targets
    # _sample_rois picks the boxes used for classification training and computes their class and coordinate ground truth, plus the bbox_inside_weights needed for the box loss
    labels, rois, bbox_targets, bbox_inside_weights = _sample_rois(
        all_rois, gt_boxes, fg_rois_per_image,
        rois_per_image, self._num_classes)

    if DEBUG:
        print 'num fg: {}'.format((labels > 0).sum())
        print 'num bg: {}'.format((labels == 0).sum())
        self._count += 1
        self._fg_num += (labels > 0).sum()
        self._bg_num += (labels == 0).sum()
        print 'num fg avg: {}'.format(self._fg_num / self._count)
        print 'num bg avg: {}'.format(self._bg_num / self._count)
        print 'ratio: {:.3f}'.format(float(self._fg_num) / float(self._bg_num))

    # sampled rois: all boxes kept after sampling
    top[0].reshape(*rois.shape)
    top[0].data[...] = rois

    # classification labels of the sampled boxes
    top[1].reshape(*labels.shape)
    top[1].data[...] = labels

    # bbox_targets: regression offsets between the boxes and the GT
    top[2].reshape(*bbox_targets.shape)
    top[2].data[...] = bbox_targets

    # bbox_inside_weights
    top[3].reshape(*bbox_inside_weights.shape)
    top[3].data[...] = bbox_inside_weights

    # bbox_outside_weights
    top[4].reshape(*bbox_inside_weights.shape)
    top[4].data[...] = np.array(bbox_inside_weights > 0).astype(np.float32)

Sample the predicted boxes, compute their regression offsets, and look up each box's class from the GT:

def _sample_rois(all_rois, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):
    """Generate a random sample of RoIs comprising foreground and background
    examples.
    """
    # overlaps: (rois x gt_boxes)
    # compute the overlap between every roi and every ground-truth box
    # only coordinates are used: columns 1 to 4 of each roi (column 0 is the prepended 0) and columns 0 to 3 of each ground-truth box
    overlaps = bbox_overlaps(
        np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
        np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))
    gt_assignment = overlaps.argmax(axis=1) # index of the best-matching gt_box for each roi, shape [len(all_rois),]
    max_overlaps = overlaps.max(axis=1) # largest overlap with any gt_box for each roi, shape [len(all_rois),]
    labels = gt_boxes[gt_assignment, 4] # class each roi is assigned to, shape [len(all_rois),]

    # Select foreground RoIs as those with >= FG_THRESH overlap
    # indices of the rois that actually count as foreground
    fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
    # Guard against the case when an image has fewer than fg_rois_per_image
    # foreground RoIs; this is the number of foreground boxes to keep
    # (cast to int because np.round returns a float, which cannot be used as a size or slice index)
    fg_rois_per_this_image = int(min(fg_rois_per_image, fg_inds.size))
    # Sample foreground regions without replacement
    # i.e. randomly drop some foreground boxes if there are too many
    if fg_inds.size > 0:
        fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False)

    # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
    # these are the background rois (overlap with the gt boxes between 0 and 0.5 in this configuration)
    bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
    # Compute number of background RoIs to take from this image (guarding
    # against there being fewer than desired)
    bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image # 128 minus the foreground count (128-32 when there are 32)
    bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size) # same procedure as for fg
    # Sample background regions without replacement
    if bg_inds.size > 0:
        bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False)

    # The indices that we're selecting (both fg and bg)
    keep_inds = np.append(fg_inds, bg_inds) # indices of the boxes finally kept
    # Select sampled values from various arrays:
    labels = labels[keep_inds]  # labels of the kept boxes
    # Clamp labels for the background RoIs to 0
    labels[fg_rois_per_this_image:] = 0 # set the class of the background boxes to 0
    rois = all_rois[keep_inds] # the kept rois

    # class ground truth and coordinate-transform ground truth of the kept boxes, i.e. their regression offsets
    bbox_target_data = _compute_targets(
        rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)

    # ground-truth box regression values and bbox_inside_weights used when computing the loss
    bbox_targets, bbox_inside_weights = \
        _get_bbox_regression_labels(bbox_target_data, num_classes)

    return labels, rois, bbox_targets, bbox_inside_weights

Computing the box regression offsets:

def _compute_targets(ex_rois, gt_rois, labels):
    """Compute bounding-box regression targets for an image."""

    assert ex_rois.shape[0] == gt_rois.shape[0]
    assert ex_rois.shape[1] == 4
    assert gt_rois.shape[1] == 4

    targets = bbox_transform(ex_rois, gt_rois) # offsets between the boxes and the gt
    if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED: # whether to normalize
        # Optionally normalize targets by a precomputed mean and stdev
        targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
                / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
    # stack the offsets to the right of the labels (horizontal concatenation)
    return np.hstack(
            (labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
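
The offsets that bbox_transform is expected to return can be sketched for one pair of boxes. This is the standard encoding (centre shift divided by the box size, log of the size ratio), written with the inclusive +1 convention; encode is a made-up name and the real function operates on arrays.

```python
import math


def encode(ex, gt):
    """Regression targets (dx, dy, dw, dh) that move box `ex` onto box `gt`."""
    ew, eh = ex[2] - ex[0] + 1.0, ex[3] - ex[1] + 1.0
    ecx, ecy = ex[0] + 0.5 * ew, ex[1] + 0.5 * eh
    gw, gh = gt[2] - gt[0] + 1.0, gt[3] - gt[1] + 1.0
    gcx, gcy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return [(gcx - ecx) / ew, (gcy - ecy) / eh,
            math.log(gw / ew), math.log(gh / eh)]


shift_only = encode([0, 0, 15, 15], [8, 0, 23, 15])    # same size, moved 8 px right
wider = encode([0, 0, 15, 15], [0, 0, 31, 15])         # twice as wide
print(shift_only, wider)
```

A box that already matches its gt gets all-zero targets, which is why targets are only meaningful for rows whose label is greater than 0.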

Arranging the data into the required format:

def _get_bbox_regression_labels(bbox_target_data, num_classes):
    """Bounding-box regression targets (bbox_target_data) are stored in a
    compact form N x (class, tx, ty, tw, th)

    This function expands those targets into the 4-of-4*K representation used
    by the network (i.e. only one class has non-zero targets).

    Returns:
        bbox_target (ndarray): N x 4K blob of regression targets
        bbox_inside_weights (ndarray): N x 4K blob of loss weights
    """

    clss = bbox_target_data[:, 0]  # class assigned to each box from its overlap with the gt
    # regression targets, placed in the slot of the assigned class
    bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    # initialize bbox_inside_weights to all zeros
    bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
    inds = np.where(clss > 0)[0] # non-background boxes
    for ind in inds:
        cls = clss[ind]
        start = int(4 * cls) # start of the coordinate slot of this class (cast to int: cls is stored as a float)
        end = start + 4  # end of the coordinate slot of this class
        bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]  # write the regression targets into the slot of this class
        # set the bbox_inside_weights entries in the slot of this class to 1
        bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS # (1.0, 1.0, 1.0, 1.0)
    return bbox_targets, bbox_inside_weights
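
A tiny example makes the 4-of-4*K expansion visible. The sketch below re-implements the idea with a hypothetical expand_targets function, three classes, and invented target values; the weights are simply set to 1.0 instead of being read from cfg.

```python
import numpy as np


def expand_targets(bbox_target_data, num_classes):
    """Expand N x (class, tx, ty, tw, th) into N x 4K targets and weights."""
    clss = bbox_target_data[:, 0].astype(int)
    targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
    weights = np.zeros_like(targets)
    for ind in np.where(clss > 0)[0]:        # background rows stay all zero
        start = 4 * clss[ind]
        targets[ind, start:start + 4] = bbox_target_data[ind, 1:]
        weights[ind, start:start + 4] = 1.0
    return targets, weights


data = np.array([[2, 0.1, 0.2, 0.3, 0.4],    # foreground RoI assigned to class 2
                 [0, 0.0, 0.0, 0.0, 0.0]],   # background RoI
                dtype=np.float32)
targets, weights = expand_targets(data, num_classes=3)
print(targets)
print(weights)
```

Only the four columns belonging to class 2 are filled for the first RoI, so the regression loss touches the box predicted for the assigned class and nothing else.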

4. ROI Pooling

For this part, see the post:
關於ROI Pooling Layer的解讀 (a walkthrough of the ROI Pooling Layer)
