[CVPR 2016] Weakly Supervised Deep Detection Networks: Paper Notes
Weakly Supervised Deep Detection Networks, Hakan Bilen, Andrea Vedaldi
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Bilen_Weakly_Supervised_Deep_CVPR_2016_paper.pdf
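Highlights
- Frames weakly supervised detection as a proposal ranking problem: comparing the class scores of all proposals yields a reasonably accurate ranking, which is consistent with how detection evaluation metrics are computed.
Related Work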
The MIL strategy results in a non-convex optimization problem; in practice, solvers tend to get stuck in local optima, such that the quality of the solution strongly depends on the initialization. Prior work has focused on:
- Developing various initialization strategies [19, 5, 32, 4].
  - [19] propose a self-paced learning strategy.
  - [5] initialize object locations based on the objectness score.
  - [4] propose a multi-fold split of the training data to escape local optima.
- Regularizing the optimization problem [31, 1].
  - [31] apply Nesterov's smoothing technique to the latent SVM formulation.
  - [1] propose a smoothed version of MIL that softly labels object instances instead of choosing the highest scoring ones.
- Another line of research in weakly supervised detection (WSD) is based on the idea of identifying the similarity between image parts.
  - [31] propose a discriminative graph-based algorithm that selects a subset of windows such that each window is connected to its nearest neighbors in positive images.
  - [32] extend this method to discover multiple co-occurring part configurations.
  - [36] propose an iterative technique that applies latent semantic clustering via probabilistic Latent Semantic Analysis (pLSA).
  - [2] propose a formulation that jointly learns a discriminative model and enforces the similarity of the selected object regions via a discriminative convex clustering algorithm.
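Method
The method is simple and easy to follow. It has three main parts, followed by two steps that combine their outputs:
- Feed the feature maps and the region proposals into a spatial pyramid pooling (SPP) layer to extract a feature vector for each region, then pass it through two fc layers.
- Classification: the fc output goes through a softmax over classes, which gives the class distribution of each region.
- Detection: the fc output also goes through a softmax, but the normalization runs over the scores of all regions instead of over classes. Comparing regions against each other finds the regions that carry the most information about a given class.
- The score of region r for class c is the product of the outputs of the last two parts (classification and detection).
- The image-level score for a class is the sum of the scores of all regions for that class.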
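The two streams and their combination can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `wsddn_scores`, the random logits, and the shapes (5 regions, 3 classes) are assumptions made for the example, and the SPP and fc layers are left out.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def wsddn_scores(cls_logits, det_logits):
    """Combine the two streams.

    cls_logits, det_logits: arrays of shape (num_regions, num_classes),
    standing in for the outputs of the two fc branches (hypothetical inputs).
    """
    cls = softmax(cls_logits, axis=1)  # classification: normalize over classes
    det = softmax(det_logits, axis=0)  # detection: normalize over regions
    region_scores = cls * det          # score of region r for class c
    image_scores = region_scores.sum(axis=0)  # image-level score per class
    return region_scores, image_scores

# Toy example with random logits (assumed values, not real network outputs).
rng = np.random.default_rng(0)
cls_logits = rng.normal(size=(5, 3))
det_logits = rng.normal(size=(5, 3))
region_scores, image_scores = wsddn_scores(cls_logits, det_logits)
```

Because the detection softmax sums to 1 over regions for each class, every image-level score is a weighted average of that class's per-region classification probabilities, so it stays in (0, 1).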
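The training loss function is as follows.
The last term is a regularization term (the spatial regulariser). I changed it slightly according to my own understanding, since the paper's notation seems a bit off. Its purpose is to enforce smoothness of the solution by pulling features closer together, i.e. proposals close to the correct solution should also receive high scores.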
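Since the loss is not written out above, here is a hedged NumPy sketch of its two data terms: a per-class binary log loss on the image-level scores, and a spatial regulariser that pulls the fc features of boxes overlapping the top-scoring box towards that box's features. The function names, the labels in {-1, +1}, the IoU threshold of 0.6, and the weighting by the squared top score are assumptions based on my reading of the paper, not a verified copy of its equations. Weight decay is omitted.

```python
import numpy as np

def image_level_loss(image_scores, labels):
    """Binary log loss summed over classes.

    image_scores: (C,) values in (0, 1); labels: (C,) values in {-1, +1} (assumed).
    """
    image_scores = np.asarray(image_scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Equals the score for a positive label and (1 - score) for a negative one.
    p = labels * (image_scores - 0.5) + 0.5
    return -np.log(p).sum()

def iou(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def spatial_regulariser(features, scores_c, boxes, iou_thresh=0.6):
    """Penalty for one positive class in one image (assumed form).

    features: (R, D) fc features; scores_c: (R,) region scores for the class;
    boxes: (R, 4) proposals. Only the top-scoring box and its neighbours count.
    """
    top = int(np.argmax(scores_c))
    near = iou(boxes[top], boxes) > iou_thresh
    near[top] = False  # the top box has zero distance to itself
    diffs = features[near] - features[top]
    return 0.5 * scores_c[top] ** 2 * (diffs ** 2).sum()

# Toy data (assumed): box 1 overlaps the top box 0, box 2 is far away.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 60, 60]], dtype=float)
features = np.array([[1.0, 0.0], [0.0, 0.0], [5.0, 5.0]])
scores_c = np.array([0.6, 0.3, 0.1])
reg = spatial_regulariser(features, scores_c, boxes)
loss = image_level_loss([0.9, 0.2], [1, -1])
```

In this sketch only box 1 contributes to the penalty. Box 2 is ignored no matter how different its features are, which matches the note that only the top-scoring box and its surroundings are constrained.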
Experimental Results
The paper reports four models that differ in their base network: S (VGG-F), M (VGG-M-1024), L (VGG-VD16), and Ens. (an ensemble of the first three).
- Ablation:
  - Object proposal
    - Baseline mAP: Selective Search S 31.1%, M 30.9%, L 24.3%, Ens. 33.3%
    - Edge Box: +0 to 1.2%
    - Edge Box + Edge Box Score: +1.8 to 5.9%
  - Spatial regulariser (compared with Edge Box + Edge Box Score): mAP +1.2 to 4.4%
- VOC2007
  - mAP on test: S +2.9%, M +3.3%, L +3.2%, Ens. +7.7% compared with [36] + context
  - CorLoc on trainval: S +5.7%, M +7.6%, L +5%, Ens. +9.5% compared with [36]
  - Classification AP on test: S +7.9% compared with VGG-F, M +6.5% compared with VGG-M-1024, L +0.4% compared with VGG-VD16, Ens. -0.3% compared with VGG-VD16
- VOC2010
  - mAP on test: +8.8% compared with [4]
  - CorLoc on trainval: +4.5% compared with [4]
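CorLoc is not defined in these notes. Under the commonly used definition (an assumption here, as are the function names and the toy boxes), it is the fraction of positive images for a class in which the top-scoring box overlaps some ground-truth box of that class with an IoU of at least 0.5. A minimal sketch:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def corloc(top_boxes, gt_boxes_per_image, thresh=0.5):
    """Fraction of positive images whose top-scoring box hits a ground-truth box.

    top_boxes: one predicted box per positive image of the class.
    gt_boxes_per_image: for each image, the list of ground-truth boxes of the class.
    """
    hits = [
        any(box_iou(pred, gt) >= thresh for gt in gts)
        for pred, gts in zip(top_boxes, gt_boxes_per_image)
    ]
    return sum(hits) / len(hits)

# Toy example (assumed boxes): the first prediction overlaps its ground truth,
# the second one misses completely.
top_boxes = [(0, 0, 10, 10), (0, 0, 10, 10)]
gt_boxes = [[(0, 0, 10, 12)], [(20, 20, 30, 30)]]
score = corloc(top_boxes, gt_boxes)
```

Unlike mAP, this only looks at a single box per image and class, so it measures localization on the training images rather than full detection quality.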
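Drawbacks
An obvious drawback of this paper is that it only considers the case where an object of a given class appears once in an image (the regulariser only constrains the top-scoring box and the boxes around it). This also shows up in the failure cases presented in the paper.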