TensorFlow Object Detection API source code reading notes: RPN
Update:
It is recommended to first read 從程式設計實現角度學習Faster R-CNN (the Zhihu article, "Learning Faster R-CNN from an implementation perspective"), which is more intuitive. The source code here is highly abstracted, so these notes may look somewhat messy.
These two pieces in faster_rcnn_meta_arch.py correspond to the 3x3 and 1x1 convolutions inside the RPN described in the Zhihu article:
rpn_box_predictor_features = slim.conv2d(rpn_features_to_crop, ...)
self._first_stage_box_predictor = box_predictor.ConvolutionalBoxPredictor
The AnchorTargetCreator in the Zhihu article, which uses IoU to select 256 of the 20000+ candidate anchors for classification and box regression (computing the RPN loss), corresponds to:
target_assigner.batch_assign_targets;
self._first_stage_sampler = sampler.BalancedPositiveNegativeSampler, which acts on first_stage_minibatch_size;
here the ~20000 is determined by the spatial size of the feature map fed to the RPN and the number of anchor types per location, and 256 corresponds to first_stage_minibatch_size (see protos/faster_rcnn.proto; a quick sanity check of these numbers follows at the end of this Update section);
in short, all of this happens inside def _loss_rpn. (proposal=2000)
The ProposalCreator in the Zhihu article (inside the RPN, select a certain number of the tens of thousands of anchors by score, e.g. 12000/6000, adjust their sizes and positions, run NMS, and keep the top-scoring 2000/300 as RoIs) corresponds to:
def _postprocess_rpn
first_stage_max_proposals=300
The ProposalTargetCreator in the Zhihu article, which selects a subset (e.g. 128) of the 2000/300 candidates to pool for training Fast R-CNN, corresponds to:
hard_example_miner is not used
_unpad_proposals_and_sample_box_classifier_batch
second_stage_batch_size
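A back-of-the-envelope check of where these numbers come from (the ~600x1000 input image size and stride 16 are assumptions for illustration):
feature_h, feature_w = 600 // 16, 1000 // 16          # ~37 x 62 feature map at stride 16
anchors_per_location = 9                              # 3 scales x 3 aspect ratios
print(feature_h * feature_w * anchors_per_location)   # 20646, i.e. the "20000+" candidate anchors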
Old:
'''RPN overview. Note: in this analysis many terms follow the wording of the original paper, which differs from the names used in the code.
'''
FasterRCNNFeatureExtractor.extract_proposal_features actually calls
FasterRCNNResnetV1FeatureExtractor._extract_proposal_features,
which produces the first stage RPN features used as the input to the RPN.
_extract_rpn_feature_map of class FasterRCNNMetaArch(model.DetectionModel):
calls the above feature extractor's _extract_proposal_features and returns
rpn_box_predictor_features: A 4-D float32 tensor with shape
[batch, height, width, depth] to be used for predicting proposal boxes
and corresponding objectness scores. '''This is the intermediate layer obtained from the sliding window.'''
rpn_features_to_crop: A 4-D float32 tensor with shape
[batch, height, width, depth] representing image features to crop using
the proposal boxes. '''This is simply the feature map produced by the feature extractor earlier.'''
anchors: A BoxList representing anchors (for the RPN) in
absolute coordinates.
'''Here grid_anchor_generator.GridAnchorGenerator is used to generate 9 anchor boxes per location (3 different scales and 3 aspect ratios). See the detailed analysis below.
'''
anchors = self._first_stage_anchor_generator.generate(
    [(feature_map_shape[1], feature_map_shape[2])])
'''The sliding window is applied to the conv feature map to produce the intermediate layer. first_stage_box_predictor_kernel_size: Kernel size to use for the convolution op just prior to RPN box predictions.
'''
with slim.arg_scope(self._first_stage_box_predictor_arg_scope):
kernel_size = self._first_stage_box_predictor_kernel_size
rpn_box_predictor_features = slim.conv2d(
rpn_features_to_crop,
self._first_stage_box_predictor_depth,
kernel_size=[kernel_size, kernel_size],
rate=self._first_stage_atrous_rate,
activation_fn=tf.nn.relu6)
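A toy shape check of this 3x3 "sliding window" convolution (the 38x50 feature map size and depths below are made-up illustration values, not the library defaults):
import tensorflow as tf
rpn_features_to_crop = tf.zeros([1, 38, 50, 1024])        # hypothetical backbone feature map
rpn_box_predictor_features = tf.keras.layers.Conv2D(
    512,                    # first_stage_box_predictor_depth
    kernel_size=3,          # first_stage_box_predictor_kernel_size
    padding='same',
    activation=tf.nn.relu6)(rpn_features_to_crop)
print(rpn_box_predictor_features.shape)   # (1, 38, 50, 512): spatial resolution is preserved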
'''According to the paper, the next step is for the intermediate layer to enter the cls and reg layers.
'''
def _predict_rpn_proposals(self, rpn_box_predictor_features):
which goes into
self._first_stage_box_predictor.predict
self._first_stage_box_predictor = box_predictor.ConvolutionalBoxPredictor
'''Box predictors are classes that take a high level image feature map as input and produce two predictions, (1) a tensor encoding box locations, and (2) a tensor encoding classes for each box. class ConvolutionalBoxPredictor(BoxPredictor) is examined in detail below.
'''
'''One step further and we reach the loss. This predict function can return a single prediction_dict covering both stages, which then goes into the loss.
'''
def predict(self, preprocessed_inputs)
def loss(self, prediction_dict, scope=None):
'''loss calls the first-stage loss computation. Examined in detail below.
'''
def _loss_rpn
'''Anchor generation
object_detection/anchor_generators/grid_anchor_generator.py
'''
def _generate  # called through the generate function of the parent class core/anchor_generator.py
grid_height, grid_width = feature_map_shape_list[0]
# Multidimensional analog of numpy.meshgrid
scales_grid, aspect_ratios_grid = ops.meshgrid(self._scales,
self._aspect_ratios)
scales_grid = tf.reshape(scales_grid, [-1])
aspect_ratios_grid = tf.reshape(aspect_ratios_grid, [-1])
return tile_anchors(grid_height,
grid_width,
scales_grid,
aspect_ratios_grid,
self._base_anchor_size,
self._anchor_stride,
self._anchor_offset)
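A quick NumPy illustration of the meshgrid step (illustrative values): every (scale, aspect_ratio) pair becomes one anchor shape per grid point, e.g. 3 scales x 3 ratios = 9 anchors per location.
import numpy as np
scales = [0.5, 1.0, 2.0]
aspect_ratios = [0.5, 1.0, 2.0]
s, r = np.meshgrid(scales, aspect_ratios)
print(list(zip(s.ravel(), r.ravel())))   # 9 (scale, aspect_ratio) combinations
print(s.size)                            # 9 anchors per grid point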
'''Verify by hand using the test script.
base_anchor_size = [10, 10]#default=[256, 256]
anchor_stride = [19, 19]#default=[16, 16]
anchor_offset = [0, 0]
scales = [0.5, 1.0, 2.0]
aspect_ratios = [1.0]
exp_anchor_corners = [[-2.5, -2.5, 2.5, 2.5], [-5., -5., 5., 5.],
[-10., -10., 10., 10.], [-2.5, 16.5, 2.5, 21.5],
[-5., 14., 5, 24], [-10., 9., 10, 29],
[16.5, -2.5, 21.5, 2.5], [14., -5., 24, 5],
[9., -10., 29, 10], [16.5, 16.5, 21.5, 21.5],
[14., 14., 24, 24], [9., 9., 29, 29]]
feature_map_shape_list=[(2, 2)] #asks for anchors that correspond
to a 2x2 layer
grid_height, grid_width = 2,2
scales_grid, aspect_ratios_grid omitted; three combinations, so the whole feature map yields 2*2*3 = 12 anchors in total.
The anchor height and width are determined by scales, aspect_ratio, and base_anchor_size; straightforward.
The anchor centers are determined by range(grid), anchor_stride, and anchor_offset, where grid means grid_height and grid_width. For example, the first center is obviously 0 and the second is 19. Once you see that the grid is simply the set of points on the feature map at which anchors are generated, it all becomes simple. Question to think about: how are parameters such as base_anchor_size and anchor_stride configured?
How do the anchor centers line up with the sliding-window centers? The answer is that an anchor set is generated at every cell of the input feature map:
feature_map_shape = tf.shape(rpn_features_to_crop)
anchors = self._first_stage_anchor_generator.generate(
[(feature_map_shape[1], feature_map_shape[2])])
From this we can see that the anchor stride should be 1 x 16 = 16 (the feature extractor downsamples by 16, so each feature-map grid point corresponds to a 16x16 patch of the original image). As expected. The basic idea: a feature-map cell mapped back to the original image has a large receptive field of a single shape, so k anchor boxes of different sizes and shapes are introduced at each grid point to better enclose objects in the original image. This idea is further refined and extended in papers such as YOLOv2 and SSD.
'''
def tile_anchors
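A small NumPy sketch (not the library code) that reproduces exp_anchor_corners from the test above, assuming the [ymin, xmin, ymax, xmax] corner convention and ordering the output grid-point-major to match the expected list:
import numpy as np

base_anchor_size = np.array([10.0, 10.0])   # [height, width]
anchor_stride = np.array([19.0, 19.0])
anchor_offset = np.array([0.0, 0.0])
scales = np.array([0.5, 1.0, 2.0])
aspect_ratios = np.array([1.0])
grid_height, grid_width = 2, 2

# All (scale, aspect_ratio) combinations -> anchor heights and widths.
scales_grid, ratios_grid = np.meshgrid(scales, aspect_ratios)
scales_grid, ratios_grid = scales_grid.ravel(), ratios_grid.ravel()
heights = scales_grid / np.sqrt(ratios_grid) * base_anchor_size[0]
widths = scales_grid * np.sqrt(ratios_grid) * base_anchor_size[1]

anchors = []
for gy in range(grid_height):
    for gx in range(grid_width):
        ycenter = gy * anchor_stride[0] + anchor_offset[0]
        xcenter = gx * anchor_stride[1] + anchor_offset[1]
        for h, w in zip(heights, widths):
            anchors.append([ycenter - h / 2, xcenter - w / 2,
                            ycenter + h / 2, xcenter + w / 2])

print(np.array(anchors))   # 2*2*3 = 12 anchors, matching exp_anchor_corners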
"""生成locations and classes
object_detection/core/box_predictor.py.
It operates on the intermediate layer produced by the sliding window; the output is, for every anchor, a tensor encoding box locations and a tensor encoding classes for each box.
Additional convolutional layers may be inserted.
Location learning is nothing special either, just a convolution:
box_encodings = slim.conv2d(
net, num_predictions_per_location * self._box_code_size,
[self._kernel_size, self._kernel_size],
scope='BoxEncodingPredictor')
num_predictions_per_location is the number of anchors per location. Class learning is similar:
class_predictions_with_background = slim.conv2d(
net, num_predictions_per_location * num_class_slots,
[self._kernel_size, self._kernel_size], scope='ClassPredictor',
biases_initializer=tf.constant_initializer(
self._class_prediction_bias_init))
Question: what exactly is the box location learned here? It is simply the 'rpn_box_encodings' obtained when the predict function calls _predict_rpn_proposals.
self._first_stage_box_predictor = box_predictor.ConvolutionalBoxPredictor
It is later used to compute the loss. What is its relation to the anchor's own location? It is actually the coordinate offset between the predicted box and the anchor box, as described in the paper.
"""
class ConvolutionalBoxPredictor(BoxPredictor)
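A minimal sketch of the resulting output channel counts for the RPN case (the 38x50x512 input is a made-up illustration; assuming num_class_slots = 2 for object vs. background, box_code_size = 4, and the 1x1 kernels mentioned in the Update section):
import tensorflow as tf
num_predictions_per_location = 9   # anchors per grid point
box_code_size = 4
num_class_slots = 2
net = tf.zeros([1, 38, 50, 512])   # intermediate layer from the 3x3 conv
box_encodings = tf.keras.layers.Conv2D(
    num_predictions_per_location * box_code_size, 1)(net)
class_predictions_with_background = tf.keras.layers.Conv2D(
    num_predictions_per_location * num_class_slots, 1)(net)
print(box_encodings.shape)                        # (1, 38, 50, 36)
print(class_predictions_with_background.shape)    # (1, 38, 50, 18)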
"""cls, reg loss
object_detection/meta_architectures/faster_rcnn_meta_arch.py
Here the loss is computed directly from rpn_box_encodings. Presumably batch_reg_targets is the offset between the anchor box and the ground truth. Checking the target_assigner.batch_assign_targets code:
def assign(self, anchors, groundtruth_boxes, groundtruth_labels=None,
**params):
reg_targets = self._create_regression_targets(anchors,
groundtruth_boxes,
match)
_create_regression_targets
matched_reg_targets = self._box_coder.encode(matched_gt_boxes,
matched_anchors)
box_coders/faster_rcnn_box_coder.py
tx = (xcenter - xcenter_a) / wa
ty = (ycenter - ycenter_a) / ha
tw = tf.log(w / wa)
th = tf.log(h / ha)
Following the trail we find it; consistent with the paper.
"""
def _loss_rpn
(batch_cls_targets, batch_cls_weights, batch_reg_targets,
batch_reg_weights, _) = target_assigner.batch_assign_targets(
self._proposal_target_assigner, box_list.BoxList(anchors),
groundtruth_boxlists, len(groundtruth_boxlists)*[None])
batch_cls_targets = tf.squeeze(batch_cls_targets, axis=2)
localization_losses = self._first_stage_localization_loss(
rpn_box_encodings, batch_reg_targets, weights=sampled_reg_indices)
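The first-stage localization loss compares rpn_box_encodings against batch_reg_targets with a smooth-L1 (Huber-style) penalty; a minimal NumPy sketch of smooth-L1 (the library version additionally multiplies by the per-anchor weights, i.e. sampled_reg_indices):
import numpy as np

def smooth_l1(pred, target):
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)

print(smooth_l1(np.array([0.1, 2.0]), np.array([0.0, 0.0])))   # [0.005 1.5]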
Next, let's see how the loss formula is concretely implemented.
"""The ground-truth label is 1 if the anchor is positive, and is 0 if the anchor is negative.
An anchor is labeled as positive if:
(a) the anchor is the one with highest IoU overlap with a ground-truth box
(b) the anchor has an IoU overlap with a ground-truth box higher than 0.7
Negative labels are assigned to anchors with IoU lower than 0.3 for all ground-truth
boxes.
50%/50% ratio of positive/negative anchors in a minibatch.
"""
Based on the earlier analysis, the corresponding code should be
(batch_cls_targets, batch_cls_weights, batch_reg_targets,
batch_reg_weights, _) = target_assigner.batch_assign_targets(
self._proposal_target_assigner, box_list.BoxList(anchors),
groundtruth_boxlists, len(groundtruth_boxlists)*[None])
The target_assigner object called here is constructed like this:
self._proposal_target_assigner = target_assigner.create_target_assigner(
'FasterRCNN', 'proposal')
Going into the create_target_assigner function in core/target_assigner.py:
elif reference == 'FasterRCNN' and stage == 'proposal':
similarity_calc = sim_calc.IouSimilarity()
matcher = argmax_matcher.ArgMaxMatcher(matched_threshold=0.7,
unmatched_threshold=0.3,
force_match_for_each_row=True)
box_coder = faster_rcnn_box_coder.FasterRcnnBoxCoder(
scale_factors=[10.0, 10.0, 5.0, 5.0])
The concrete implementation is in:
from object_detection.matchers import argmax_matcher
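A simplified NumPy sketch of the matching rule (illustration only, not the library implementation; the real ArgMaxMatcher handles ties and the forced matches more carefully): each anchor is matched to the ground-truth box with the highest IoU, thresholded at 0.7/0.3, and force_match_for_each_row guarantees every ground-truth box keeps at least one positive anchor.
import numpy as np

def argmax_match(iou, matched_thr=0.7, unmatched_thr=0.3):
    # iou: [num_gt, num_anchors]
    matches = iou.argmax(axis=0)                 # best ground-truth box per anchor
    best_iou = iou.max(axis=0)
    labels = np.full(iou.shape[1], -1)           # -1: ignored (between the two thresholds)
    labels[best_iou >= matched_thr] = 1          # positive
    labels[best_iou < unmatched_thr] = 0         # negative
    forced = iou.argmax(axis=1)                  # force_match_for_each_row
    matches[forced] = np.arange(iou.shape[0])
    labels[forced] = 1
    return matches, labels

iou = np.array([[0.8, 0.2, 0.55],
                [0.1, 0.25, 0.6]])
print(argmax_match(iou))
# anchor 0 -> gt 0 (positive), anchor 1 -> negative,
# anchor 2 -> gt 1 (forced positive even though 0.6 < 0.7)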