Understanding the ROI Pooling Layer

阿新 • Published: 2018-11-02

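1. How It Works

A typical object detection architecture can usually be divided into two stages:
(1) Region proposal: given an input image, find all locations where objects might be. The output of this stage is a list of bounding boxes for the possible object locations. These are usually called region proposals or regions of interest (ROIs).
(2) Final classification: for each region proposal from the previous stage, decide whether it belongs to one of the target classes or to the background.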

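This architecture has several problems:
(1) Generating a large number of region proposals causes performance problems and makes real-time object detection hard to achieve.
(2) Processing speed is suboptimal.
(3) End-to-end training is not possible.
These problems are the underlying motivation for ROI pooling.
The ROI pooling layer significantly speeds up both training and testing, and can also improve detection accuracy. The layer takes two inputs: (1) a fixed-size feature map obtained from a deep network with several convolution and pooling layers; (2) an N*5 matrix listing all ROIs, where N is the number of ROIs. The first column is the image index, and the remaining four columns are the coordinates of the ROI's top-left and bottom-right corners.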

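ROI pooling works as follows:
(1) Using the input image, map each ROI onto the corresponding location of the feature map; the mapping follows the factor by which the network downscales the image.
(2) Divide the mapped region of the feature map into sections according to the output coordinates of ROI pooling; the number of sections equals the output dimensions (pooled_w*pooled_h).
(3) Apply max pooling to each section.
This yields fixed-size feature maps from boxes of different sizes. Notably, the size of the output feature maps depends neither on the size of the ROI nor on the size of the convolutional feature map, but only on the layer's pooled_h and pooled_w settings. The biggest benefit of ROI pooling is a large gain in processing speed, since the convolutional feature map is computed once per image and reused for every ROI. Because the output dimensions are the same whatever the size of the input feature map, the idea is similar to that of SPP-Net.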
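
The three steps can be sketched in a few lines of C++. This is a minimal single-channel sketch, not the Caffe layer itself: the function RoiMaxPool and its parameter names are hypothetical, the feature map is assumed to be a row-major std::vector<float>, and the section boundaries follow the floor/ceil rule used in the Caffe source shown in section 3.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper (not a Caffe API): max-pools one ROI of a single-channel,
// row-major feature map of size height x width into pooled_h x pooled_w values.
// (x1, y1) and (x2, y2) are the ROI corners in input-image coordinates.
std::vector<float> RoiMaxPool(const std::vector<float>& feat,
                              int height, int width,
                              float x1, float y1, float x2, float y2,
                              float spatial_scale,
                              int pooled_h, int pooled_w) {
  // Step 1: project the ROI onto the feature map.
  const int start_w = static_cast<int>(std::lround(x1 * spatial_scale));
  const int start_h = static_cast<int>(std::lround(y1 * spatial_scale));
  const int end_w = static_cast<int>(std::lround(x2 * spatial_scale));
  const int end_h = static_cast<int>(std::lround(y2 * spatial_scale));
  const int roi_h = std::max(end_h - start_h + 1, 1);
  const int roi_w = std::max(end_w - start_w + 1, 1);
  const float bin_h = static_cast<float>(roi_h) / pooled_h;
  const float bin_w = static_cast<float>(roi_w) / pooled_w;

  std::vector<float> out(pooled_h * pooled_w, 0.0f);
  for (int ph = 0; ph < pooled_h; ++ph) {
    for (int pw = 0; pw < pooled_w; ++pw) {
      // Step 2: bounds of this section, clipped to the feature map.
      int hs = static_cast<int>(std::floor(ph * bin_h)) + start_h;
      int he = static_cast<int>(std::ceil((ph + 1) * bin_h)) + start_h;
      int ws = static_cast<int>(std::floor(pw * bin_w)) + start_w;
      int we = static_cast<int>(std::ceil((pw + 1) * bin_w)) + start_w;
      hs = std::min(std::max(hs, 0), height);
      he = std::min(std::max(he, 0), height);
      ws = std::min(std::max(ws, 0), width);
      we = std::min(std::max(we, 0), width);
      if (he <= hs || we <= ws) continue;  // an empty section keeps the value 0

      // Step 3: max over the section.
      float best = -std::numeric_limits<float>::max();
      for (int h = hs; h < he; ++h) {
        for (int w = ws; w < we; ++w) {
          best = std::max(best, feat[h * width + w]);
        }
      }
      out[ph * pooled_w + pw] = best;
    }
  }
  return out;
}
```

For a 4*4 feature map holding the values 0 to 15 in row-major order and an ROI covering the whole map, RoiMaxPool(feat, 4, 4, 0, 0, 3, 3, 1.0f, 2, 2) returns {5, 7, 13, 15}: the output is 2*2 regardless of the ROI size.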

2. ROI Pooling Illustrated

Consider an $8 \times 8$ feature map, a single ROI, and an ROI pooling output size of $2 \times 2$.

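(1) The fixed-size input feature map:
[Figure: the input feature map]

(2) The region proposal after projection (top-left and bottom-right coordinates): (0, 3), (7, 8).
[Figure: the ROI projected onto the feature map]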

(3) Divide it into $2 \times 2$ sections, because the output size is $2 \times 2$.

(4) Apply max pooling to each section:
[Figure: the max-pooled $2 \times 2$ output]
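
The section boundaries for this example can be worked out with a short sketch. It assumes, as the $8 \times 8$ map implies, that the bottom-right corner (7, 8) is exclusive, so the ROI spans 7 columns and 5 rows; the Caffe code in section 3 instead treats both corners as inclusive. The helper SplitBins and the struct Bin are hypothetical names; the floor/ceil rule is the one used in the Caffe source in section 3.

```cpp
#include <cmath>
#include <vector>

// Half-open interval [start, end), relative to the ROI origin.
struct Bin {
  int start;
  int end;
};

// Hypothetical helper: splits roi_len cells into `pooled` sections using
// start = floor(p * bin_size) and end = ceil((p + 1) * bin_size).
std::vector<Bin> SplitBins(int roi_len, int pooled) {
  const float bin_size = static_cast<float>(roi_len) / pooled;
  std::vector<Bin> bins;
  for (int p = 0; p < pooled; ++p) {
    Bin b;
    b.start = static_cast<int>(std::floor(p * bin_size));
    b.end = static_cast<int>(std::ceil((p + 1) * bin_size));
    bins.push_back(b);
  }
  return bins;
}
```

SplitBins(7, 2) gives the column ranges [0, 4) and [3, 7), and SplitBins(5, 2) gives the row ranges [0, 3) and [2, 5). When the ROI size is not divisible by the output size, adjacent sections produced by this rule overlap by one cell, so a cell on the shared border can contribute to two outputs.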

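ROI pooling in summary:
(1) It is used for object detection tasks.
(2) It lets us reuse the feature map computed by the CNN.
(3) It can significantly speed up both training and testing.
(4) It allows object detection systems to be trained end to end.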

3. Usage and Implementation in Caffe

In a Caffe prototxt, the ROI Pooling layer is defined like this:

layer {
  name: "roi_pool5"
  type: "ROIPooling"
  bottom: "conv5_3"
  bottom: "rois"
  top: "pool5"
  roi_pooling_param {
    pooled_w: 7
    pooled_h: 7
    spatial_scale: 0.0625 # 1/16
  }
}
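
To make the role of spatial_scale concrete, the sketch below projects one ROI row [batch_index x1 y1 x2 y2] from image coordinates onto the feature map by multiplying by the scale and rounding, which is the same arithmetic Forward_cpu performs in the source code that follows. The struct FeatureRoi and the function ProjectRoi are hypothetical names, and the ROI values in the usage note are illustrative numbers chosen for this example, not taken from any dataset.

```cpp
#include <cmath>

// ROI corners on the feature map, plus the image it belongs to.
struct FeatureRoi {
  int batch_index;
  int start_w;
  int start_h;
  int end_w;
  int end_h;
};

// Hypothetical helper: roi_row points at 5 values [batch_index x1 y1 x2 y2]
// given in input-image coordinates.
FeatureRoi ProjectRoi(const float* roi_row, float spatial_scale) {
  FeatureRoi r;
  r.batch_index = static_cast<int>(roi_row[0]);
  // std::lround rounds to the nearest integer, halves away from zero.
  r.start_w = static_cast<int>(std::lround(roi_row[1] * spatial_scale));
  r.start_h = static_cast<int>(std::lround(roi_row[2] * spatial_scale));
  r.end_w = static_cast<int>(std::lround(roi_row[3] * spatial_scale));
  r.end_h = static_cast<int>(std::lround(roi_row[4] * spatial_scale));
  return r;
}
```

With spatial_scale = 0.0625 as in the prototxt above, the row {0, 0, 48, 112, 128} projects to the corners (0, 3) and (7, 8) on the feature map of image 0.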

The corresponding source code follows, with the necessary comments added:

template <typename Dtype>
void ROIPoolingLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  ROIPoolingParameter roi_pool_param = this->layer_param_.roi_pooling_param();
  CHECK_GT(roi_pool_param.pooled_h(), 0)
      << "pooled_h must be > 0";
  CHECK_GT(roi_pool_param.pooled_w(), 0)
      << "pooled_w must be > 0";
  pooled_height_ = roi_pool_param.pooled_h();  // height of the pooled output
  pooled_width_ = roi_pool_param.pooled_w();  // width of the pooled output
  spatial_scale_ = roi_pool_param.spatial_scale();  // scale from image coordinates to feature-map coordinates
  LOG(INFO) << "Spatial scale: " << spatial_scale_;
}

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  channels_ = bottom[0]->channels();
  height_ = bottom[0]->height();
  width_ = bottom[0]->width();
  // Output shape: num_rois x channels x pooled_height x pooled_width
  top[0]->Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
  max_idx_.Reshape(bottom[1]->num(), channels_, pooled_height_,
      pooled_width_);
}

template <typename Dtype>
void ROIPoolingLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();  // convolutional feature map data
  const Dtype* bottom_rois = bottom[1]->cpu_data();  // ROI data, 5 values per ROI
  // Number of ROIs
  int num_rois = bottom[1]->num();
  int batch_size = bottom[0]->num();  // number of images in the batch
  int top_count = top[0]->count();  // total number of output elements
  Dtype* top_data = top[0]->mutable_cpu_data();  // output buffer, initialized on the next line
  caffe_set(top_count, Dtype(-FLT_MAX), top_data);
  int* argmax_data = max_idx_.mutable_cpu_data();
  caffe_set(top_count, -1, argmax_data);

  // For each ROI R = [batch_index x1 y1 x2 y2]: max pool over R
  for (int n = 0; n < num_rois; ++n) {  // iterate over the ROIs
    int roi_batch_ind = bottom_rois[0];  // index of the image in the batch that this ROI belongs to
    int roi_start_w = round(bottom_rois[1] * spatial_scale_);  // scale image coordinates down to feature-map coordinates
    int roi_start_h = round(bottom_rois[2] * spatial_scale_);
    int roi_end_w = round(bottom_rois[3] * spatial_scale_);
    int roi_end_h = round(bottom_rois[4] * spatial_scale_);
    CHECK_GE(roi_batch_ind, 0);
    CHECK_LT(roi_batch_ind, batch_size);

    int roi_height = max(roi_end_h - roi_start_h + 1, 1);  // ROI height and width on the feature map, at least 1
    int roi_width = max(roi_end_w - roi_start_w + 1, 1);
    const Dtype bin_size_h = static_cast<Dtype>(roi_height)  // size of one pooling bin: ROI extent / pooled extent
                             / static_cast<Dtype>(pooled_height_);
    const Dtype bin_size_w = static_cast<Dtype>(roi_width)
                             / static_cast<Dtype>(pooled_width_);

    const Dtype* batch_data = bottom_data + bottom[0]->offset(roi_batch_ind);  // feature map of the image this ROI belongs to

    // For each output position of this ROI, take the max over the matching region of the feature map
    for (int c = 0; c < channels_; ++c) {
      for (int ph = 0; ph < pooled_height_; ++ph) {
        for (int pw = 0; pw < pooled_width_; ++pw) {
          // Compute pooling region for this output unit:
          //  start (included) = floor(ph * roi_height / pooled_height_)
          //  end (excluded) = ceil((ph + 1) * roi_height / pooled_height_)
          int hstart = static_cast<int>(floor(static_cast<Dtype>(ph)  // bin boundaries relative to the ROI origin
                                              * bin_size_h));
          int wstart = static_cast<int>(floor(static_cast<Dtype>(pw)
                                              * bin_size_w));
          int hend = static_cast<int>(ceil(static_cast<Dtype>(ph + 1)
                                           * bin_size_h));
          int wend = static_cast<int>(ceil(static_cast<Dtype>(pw + 1)
                                           * bin_size_w));

          hstart = min(max(hstart + roi_start_h, 0), height_);  // shift by the ROI origin and clip to the feature map
          hend = min(max(hend + roi_start_h, 0), height_);
          wstart = min(max(wstart + roi_start_w, 0), width_);
          wend = min(max(wend + roi_start_w, 0), width_);

          bool is_empty = (hend <= hstart) || (wend <= wstart);

          const int pool_index = ph * pooled_width_ + pw;
          if (is_empty) {
            top_data[pool_index] = 0;
            argmax_data[pool_index] = -1;
          }

          for (int h = hstart; h < hend; ++h) {  // max over the bin, recording the argmax
            for (int w = wstart; w < wend; ++w) {
              const int index = h * width_ + w;
              if (batch_data[index] > top_data[pool_index]) {
                top_data[pool_index] = batch_data[index];
                argmax_data[pool_index] = index;
              }
            }
          }
        }
      }
      // Increment all data pointers by one channel
      batch_data += bottom[0]->offset(0, 1);  // next channel of the feature map
      top_data += top[0]->offset(0, 1);  // next channel of the pooled output
      argmax_data += max_idx_.offset(0, 1);
    }
    // Increment ROI data pointer
    bottom_rois += bottom[1]->offset(1);  // next ROI
  }
}

4. References

ROI Pooling層詳解
