NIPS-2015

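NIPS, in full the Conference and Workshop on Neural Information Processing Systems, is an international conference on machine learning and computational neuroscience. It is held every December and is organized by the NIPS Foundation. NIPS is a top-tier venue in machine learning; in the China Computer Federation (CCF) ranking of international academic conferences, it is a Class A conference in artificial intelligence.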

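Selective Search (SS) is slow. EdgeBoxes reaches about 0.2 seconds per image, which is roughly as long as the detection step itself. An obvious idea is to implement these proposal algorithms on the GPU, but such a re-implementation ignores the down-stream detection network and therefore misses important opportunities for sharing computation.

The related-work section first surveys object proposal methods, then deep networks for object detection (mainly R-CNN, Fast R-CNN and OverFeat). I find its summaries of R-CNN and OverFeat particularly sharp: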

R-CNN mainly plays as a classifier, and it does not predict object bounds (except for refining by bounding box regression).

In the OverFeat method, a fully-connected layer is trained to predict the box coordinates for the localization task that assumes a single object.

4.1 RPN

Note: RPN is class-agnostic; see 【R-FCN】《R-FCN: Object Detection via Region-based Fully Convolutional Networks》

4.1.1 Anchors

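The RPN slides over the last shared convolutional layer: ZF has 5 shareable conv layers (256-d features) and VGG has 13 (512-d features).

In the 2k outputs, the 2 stands for object / not object and k is the number of anchors at each 3*3 sliding-window position; in the 4k outputs, the 4 stands for the bounding-box coordinates.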
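As a quick sanity check on these output sizes, here is a minimal sketch. The function name `rpn_output_sizes` and the 40 x 60 feature map are assumptions made for illustration (chosen to give about 2400 locations); they are not taken from the paper's code.

```python
def rpn_output_sizes(feat_h, feat_w, k=9):
    """Return (cls outputs per location, reg outputs per location, total anchors)."""
    cls_per_loc = 2 * k                   # object / not-object score for each anchor
    reg_per_loc = 4 * k                   # 4 box coordinates for each anchor
    total_anchors = feat_h * feat_w * k   # k anchors at every sliding-window position
    return cls_per_loc, reg_per_loc, total_anchors


# Hypothetical 40 x 60 feature map, i.e. 2400 sliding-window positions
print(rpn_output_sizes(40, 60))  # (18, 36, 21600)
```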

The effect of the ratios and scales is as follows:

Translation-Invariant anchors

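Compared with MultiBox, the anchors in Faster R-CNN are computed convolutionally, so they are translation-invariant, and they need fewer parameters: the output layer has (4+2) * k * dimension parameters, which is about $2.8 \times 10^4$ for k = 9 and the 512-d VGG features. Fewer parameters means less risk of overfitting on small datasets like PASCAL VOC.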
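The parameter count quoted above is simple arithmetic. A small sketch, with `rpn_output_layer_params` as a made-up helper name:

```python
def rpn_output_layer_params(k=9, dim=512):
    """Weights in the RPN output layer: (4 box coords + 2 scores) per anchor."""
    return (4 + 2) * k * dim


print(rpn_output_layer_params())         # VGG, 512-d: 27648, about 2.8e4
print(rpn_output_layer_params(dim=256))  # ZF, 256-d: 13824
```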

Multi-Scale Anchors as Regression References

Unlike an image pyramid or a filter pyramid, the authors use a pyramid of anchors (different scales and ratios). This is more cost-efficient because it only relies on images and feature maps of a single scale and uses filters (sliding windows on the feature map) of a single size.

4.1.2 Loss Function

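Each anchor gets a binary label, object or not. An anchor is positive if its IoU with some ground-truth box is above 0.7, or if it has the highest IoU with a ground-truth box; it is negative if its IoU is below 0.3 for all ground-truth boxes. The remaining anchors do not contribute to training.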
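The labelling rule can be sketched in a few lines. This is a minimal illustration, not the authors' code: boxes are assumed to be (x1, y1, x2, y2) tuples, the helper names `iou` and `label_anchors` are made up for this note, and the thresholds are the 0.7 / 0.3 values given in the Faster R-CNN paper.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return one label per anchor: 1 = positive, 0 = negative, -1 = ignored."""
    if not anchors or not gt_boxes:
        return [0] * len(anchors)  # no ground truth: every anchor is background
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row)
        if best > pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # neither positive nor negative: unused in training
    # The anchor(s) with the highest IoU for each ground-truth box are also positive
    for j in range(len(gt_boxes)):
        col = [row[j] for row in overlaps]
        top = max(col)
        if top > 0:
            for i, v in enumerate(col):
                if v == top:
                    labels[i] = 1
    return labels


anchors = [(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30), (5, 0, 15, 10)]
print(label_anchors(anchors, [(0, 0, 10, 10)]))  # [1, 1, 0, -1]
```

The second rule matters when no anchor clears the 0.7 bar: the best-matching anchor is still kept as a positive, so every ground-truth box has at least one.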

The loss function is:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)$$

$i$: index of the $i$-th anchor in a mini-batch
$p_i$: predicted probability of anchor $i$ being an object
$p_i^*$: 1 if the anchor is positive, 0 if the anchor is negative
$t_i$: the 4 parameterized coordinates of the predicted bounding box
$t_i^*$: those of the ground-truth box associated with a positive anchor
$L_{cls}$: log loss
$L_{reg}$: Smooth L1 loss; the factor $p_i^*$ in front means the regression loss is activated only for positive anchors

The two terms are normalized by $N_{cls}$ and $N_{reg}$ (this normalization is not required and could be simplified), and $\lambda$ is the balancing parameter.

$N_{cls}$ is set to the mini-batch size, e.g. 256
$N_{reg}$ is set to the number of anchor locations (~2400)
$\lambda$ is set to 10, so the two loss terms are weighted roughly equally
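To make the normalization concrete, here is a minimal pure-Python sketch of the RPN loss under the settings listed above. The function names `smooth_l1` and `rpn_loss` and the plain-list inputs are assumptions for illustration; a real implementation would operate on tensors.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rpn_loss(p, p_star, t, t_star, n_cls=256, n_reg=2400, lam=10.0):
    """p: predicted objectness per anchor; p_star: 0/1 labels;
    t, t_star: 4-tuples of predicted / target box parameters."""
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    cls_sum, reg_sum = 0.0, 0.0
    for pi, ps, ti, ts in zip(p, p_star, t, t_star):
        # Log loss over the two classes (object vs. not object)
        cls_sum += -(ps * math.log(pi + eps) + (1 - ps) * math.log(1 - pi + eps))
        # Regression term is multiplied by p_star, so only positives count
        reg_sum += ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
    return cls_sum / n_cls + lam * reg_sum / n_reg


zero = (0.0, 0.0, 0.0, 0.0)
print(rpn_loss([0.5], [1], [(2.0, 0.0, 0.0, 0.5)], [zero], n_cls=1, n_reg=1, lam=1.0))
```

With the default settings the two weights are 1/256 ≈ 0.0039 and 10/2400 ≈ 0.0042, which is why the two terms end up roughly balanced.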

The effect of $\lambda$ is as follows; the results are insensitive to it:

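The concrete $t_i$ and $t_i^*$ are:

$$t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a, \quad t_w = \log(w/w_a), \quad t_h = \log(h/h_a)$$
$$t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a, \quad t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)$$

x and y are the center coordinates of the box, and w and h are its width and height
$x$, $x_a$, $x^*$ refer to the predicted box, the anchor box and the ground-truth box respectively; the same notation applies to y, w and h

This can be thought of as bounding-box regression from an anchor box to a nearby ground-truth box. Put plainly, the loss compares (the offset of the predicted box from the anchor) with (the offset of the ground-truth box from the anchor).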
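The parameterization relative to an anchor can be written as a pair of encode / decode helpers. This is a sketch assuming boxes in (x_center, y_center, w, h) form; the names `encode` and `decode` are chosen for this note.

```python
import math


def encode(box, anchor):
    """Offsets (t_x, t_y, t_w, t_h) of `box` relative to `anchor`."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(t, anchor):
    """Inverse of encode: recover the box from its offsets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))


anchor = (100.0, 100.0, 100.0, 100.0)
gt = (110.0, 100.0, 200.0, 100.0)
t_star = encode(gt, anchor)      # regression target for this anchor
print(t_star)
print(decode(t_star, anchor))    # recovers gt up to floating-point error
```

The same `encode` is applied to the predicted box and to the ground-truth box, and the Smooth L1 loss is taken between the two results.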

Note: the bbox regression here differs from that of Fast R-CNN and SPPnet.

In Fast R-CNN and SPPnet, bbox regression is performed on features pooled from arbitrarily sized RoIs, and the regression weights are shared by all region sizes.
In Faster R-CNN, the bbox regression is per scale and per aspect ratio: to account for varying sizes, a set of k bounding-box regressors are learned. Each regressor is responsible for one scale and one aspect ratio, and the k regressors do not share weights.

4.1.3 Training RPNs

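Optimizing the loss over all anchors is possible, but this will bias towards negative samples as they dominate. Instead, 256 anchors are randomly sampled per image with a positive-to-negative ratio of up to 1:1; if there are fewer than 128 positive anchors, the mini-batch is padded with negative ones.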
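The sampling rule can be sketched as follows. This is an illustrative version, not the reference implementation: `sample_anchors` is a made-up name, labels are assumed to be 1 (positive), 0 (negative) or -1 (ignored), and the fixed default seed is only there to make the example reproducible.

```python
import random


def sample_anchors(labels, batch_size=256, rng=None):
    """Pick up to batch_size anchor indices, at most half of them positive."""
    if rng is None:
        rng = random.Random(0)  # assumption: fixed seed for a repeatable demo
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_pos = min(len(pos), batch_size // 2)
    # Pad with negatives when there are fewer than batch_size / 2 positives
    n_neg = min(len(neg), batch_size - n_pos)
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


labels = [1] * 10 + [0] * 1000 + [-1] * 50
pos_idx, neg_idx = sample_anchors(labels)
print(len(pos_idx), len(neg_idx))  # 10 246
```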

We randomly initialize all new layers by drawing weights from a zero-mean Gaussian distribution with standard deviation 0.01.

4.2 Sharing Features for RPN and Fast R-CNN

Both RPN and Fast R-CNN, trained independently, will modify their convolutional layers in different ways. We therefore need to develop a technique that allows for sharing convolutional layers between the two networks, rather than learning two separate networks.

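Three training schemes

Alternating training (the one used in the paper)
Approximate joint training (gives slightly better results than alternating training)
Non-approximate joint training

The authors use alternating training, specifically 4-step alternating training:

Step 1: train the RPN (initialized from ImageNet; RPN and Fast R-CNN do not share parameters)
Step 2: train Fast R-CNN (initialized from ImageNet, using the proposals generated by the RPN in place of those from SS; still no sharing)
Step 3: starting from the parameters trained in the previous step, fine-tune the RPN (conv layers now shared)
Step 4: using the proposals from the retrained RPN, fine-tune the unique layers of Fast R-CNN, i.e. the head (conv layers shared)

Why not run steps one to four a second time, and then again?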
A similar alternating training can be run for more iterations, but we have observed negligible improvements.

4.3 Implementation Details

Training and testing both use a single scale: images are rescaled so that the shorter side is s = 600 pixels
Image pyramid: trades off accuracy and speed (not used)
Anchors: scales $128^2$, $256^2$, $512^2$; ratios $1:1$, $2:1$, $1:2$; see Table 1, where the red text marks the default anchors (2:1) and the listed values are the results after bbox regression
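The nine default anchors at one sliding-window position can be generated as below. This is a sketch under two assumptions that the notes do not spell out: a ratio is taken to mean height / width, and each anchor keeps the area given by its scale. `make_anchors` is a hypothetical helper name.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(1.0, 2.0, 0.5)):
    """Return 9 anchors (x1, y1, x2, y2) centered at (cx, cy)."""
    anchors = []
    for s in scales:
        area = float(s * s)
        for r in ratios:             # assumption: r = height / width
            w = math.sqrt(area / r)  # keeps w * h equal to the scale's area
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


for box in make_anchors(300.0, 300.0):
    print([round(v, 1) for v in box])
```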

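During training, anchors that cross the image boundary are discarded; at test time, they are clipped to the image
RPN proposals overlap heavily, so non-maximum suppression (NMS) is applied with an IoU threshold of 0.7. NMS does not harm the ultimate detection accuracy, but it reduces the number of proposals. The paper trains with the top-2000 proposals. Why is the NMS overlap threshold set to 0.7?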
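The boundary handling and the NMS step can be sketched together. This is a plain greedy NMS written for illustration, not the implementation used in the paper; boxes are assumed to be (x1, y1, x2, y2) tuples and all function names are made up for this note.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def inside_image(box, img_w, img_h):
    """Training: keep only anchors that lie fully inside the image."""
    return box[0] >= 0 and box[1] >= 0 and box[2] <= img_w and box[3] <= img_h


def clip_to_image(box, img_w, img_h):
    """Testing: clip a box to the image instead of discarding it."""
    return (min(max(box[0], 0), img_w), min(max(box[1], 0), img_h),
            min(max(box[2], 0), img_w), min(max(box[3], 0), img_h))


def nms(boxes, scores, iou_thr=0.7, top_n=None):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
            if top_n is not None and len(keep) == top_n:
                break
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: the second box overlaps the first at IoU 0.9
```

Passing `top_n=2000` (training) or a smaller value such as 300 (testing) keeps only the highest-ranked survivors.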

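Consider three boxes of equal area: $1:1$, $2:1$ ($\sqrt2 : \sqrt2/2$) and $1:2$ ($\sqrt2/2 : \sqrt2$). If the ground truth is exactly the size of the 1:1 box, its overlap with the $2:1$ and $1:2$ boxes covers $\sqrt2/2 \approx 0.707$ of the box area (the strict IoU, intersection over union, is lower, about 0.55). The same object would then be matched by differently shaped boxes, which makes learning harder, so the threshold is set to 0.7 to mitigate this as far as possible (this is only one possible interpretation).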
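A quick numeric check of this configuration, assuming the boxes share the same center (the helper `centered_overlap` is made up for this note): the intersection area is $\sqrt2/2 \approx 0.707$, while intersection over union comes out lower, at about 0.55.

```python
import math


def centered_overlap(w1, h1, w2, h2):
    """Intersection area and IoU of two boxes that share the same center."""
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter, inter / union


s = math.sqrt(2)
inter, iou_value = centered_overlap(1.0, 1.0, s, s / 2)  # 1:1 box vs. 2:1 box
print(round(inter, 3), round(iou_value, 3))  # 0.707 0.547
```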

5 Experiments

5.1 Ablation Experiments

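1, 2 and 3 compared: 3 is better, and the fewer proposals also reduce the region-wise fully-connected layers’ cost (see Table 5)
3 vs. 4: sharing is better
3 vs. 6: RPN + Fast R-CNN beats SS + Fast R-CNN; the train-time and test-time proposals differ
4 vs. 8: NMS has little effect
7 and 11 differ only slightly, while 9 and 11 differ markedly: ranking by the cls scores matters
6 vs. 12: reg matters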

5.2 VOC 07/12 Results

5.3 Speed (ms)

5.4 recall-to-IoU

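Dropping the number of RPN proposals from 2000 to 300 gives about the same result

5.5 Head-to-head (one-stage OverFeat)

5.6 Results on COCO

With VGG replaced by ResNet, plus ensembling, this won the COCO 2015 object detection competition

Architecture diagram

Note: the reshape is there for the softmax operation, which requires the first dimension to be the number of classes. With 2 classes (object or not) the classifier is class-agnostic; with, say, the 20+1 classes of the VOC dataset, it is class-specific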

【Faster RCNN】《Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks》

Contents

1 Motivation

2 Innovation

3 Advantages

4 Methods

4.1 RPN

4.1.1 Anchors

4.1.2 Loss Function

4.1.3 Training RPNs

4.2 Sharing Features for RPN and Fast R-CNN

4.3 Implementation Details

5 Experiments

5.1 Ablation Experiments

5.2 VOC 07/12 Results

5.3 Speed (ms)

5.4 recall-to-IoU

5.5 Head-to-head (one-stage OverFeat)

5.6 Results on COCO

Architecture diagram
