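Learning local feature descriptors with triplets and shallow convolutional neural networks: reading notes
The title sums up the paper: local feature descriptors are learned with triplets and a shallow convolutional neural network. I got a lot out of reading it, and the word that stuck with me most is "shallow", which comes up repeatedly in the paper.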
1 Contribution
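The main contributions of the paper are:
- It proposes a triplet-based network that is shallower and cheaper to compute, yet still performs well.
- It trains with an in-triplet mining scheme, which lowers the cost of mining hard negatives and improves performance.
- It compares how two different loss functions behave on different tasks.
The sections below go through these contributions in turn.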
2 Learning with pairs
In this section the authors briefly review how Siamese networks are trained on pairs of patches.
\(\ell=1\)
3 Learning with triplets
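Suppose we sample a triplet \(\{a,p,n\}\), where \(a\) and \(p\) are different views of the same keypoint, and \(a\) and \(n\) come from different keypoints. The training goal is to bring the descriptors of \(a\) and \(p\) closer together and push the descriptors of \(a\) and \(n\) further apart. We can therefore define \(\delta_{+}=\|f(\boldsymbol{a})-f(\boldsymbol{p})\|_{2}\) and, correspondingly, \(\delta_{-}=\|f(\boldsymbol{a})-f(\boldsymbol{n})\|_{2}\).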
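To make the two distances concrete, here is a minimal sketch in plain Python. It assumes the descriptors \(f(a)\), \(f(p)\), \(f(n)\) have already been computed and are given as lists of floats; the function names and the toy values are my own and do not come from the paper.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_distances(f_a, f_p, f_n):
    """Return (delta_plus, delta_minus) for one triplet of descriptors.

    delta_plus  = ||f(a) - f(p)||_2  (anchor vs. positive)
    delta_minus = ||f(a) - f(n)||_2  (anchor vs. negative)
    """
    return l2_distance(f_a, f_p), l2_distance(f_a, f_n)

# Toy 2-D descriptors (made-up values, real descriptors are much longer).
f_a = [0.0, 0.0]
f_p = [3.0, 4.0]
f_n = [6.0, 8.0]
delta_plus, delta_minus = triplet_distances(f_a, f_p, f_n)
```

In this toy case the negative is already further from the anchor than the positive, which is the situation training tries to reach for every triplet.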
3.1 Two loss functions
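- Margin ranking loss
\[\lambda\left(\delta_{+}, \delta_{-}\right)=\max \left(0, \mu+\delta_{+}-\delta_{-}\right) \]The loss is positive whenever \(\delta_{-}<\delta_{+}+\mu\), and only those triplets contribute to training. Once \(\delta_{-} \geq \delta_{+}+\mu\), the loss is zero.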
- Ratio loss
The ratio loss drives \(\frac{\delta_{-}}{\delta_{+}} \rightarrow \infty\). The training objective is to push \(\left(\frac{e^{\delta_{+}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) towards 0 and \(\left(\frac{e^{\delta_{-}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) towards 1.
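The two losses can be sketched in a few lines of Python. The margin ranking loss follows the formula given above directly. For the ratio loss, these notes only state the two targets (0 and 1), so the code assumes the loss is the sum of squared deviations of the softmax pair from those targets. That combination is my assumption and should be checked against the paper. The default margin `mu=1.0` is an arbitrary illustrative value.

```python
import math

def margin_ranking_loss(delta_plus, delta_minus, mu=1.0):
    """max(0, mu + delta_plus - delta_minus); mu is the margin (arbitrary default)."""
    return max(0.0, mu + delta_plus - delta_minus)

def softmax_pair(delta_plus, delta_minus):
    """Softmax over the two distances, shifted by the max for numerical stability."""
    m = max(delta_plus, delta_minus)
    e_plus = math.exp(delta_plus - m)
    e_minus = math.exp(delta_minus - m)
    total = e_plus + e_minus
    return e_plus / total, e_minus / total

def ratio_loss(delta_plus, delta_minus):
    """ASSUMPTION: squared error of the softmax pair from the targets (0, 1)."""
    s_plus, s_minus = softmax_pair(delta_plus, delta_minus)
    return s_plus ** 2 + (1.0 - s_minus) ** 2

# Margin loss: zero once the negative is at least mu further away than the positive.
easy = margin_ranking_loss(0.5, 2.0)   # 1.0 + 0.5 - 2.0 < 0, so the loss is 0.0
hard = margin_ranking_loss(1.0, 1.5)   # 1.0 + 1.0 - 1.5 = 0.5

# Ratio loss: never exactly zero for finite distances, but it shrinks
# as delta_minus grows relative to delta_plus.
balanced = ratio_loss(1.0, 1.0)
separated = ratio_loss(0.1, 5.0)
```

One practical difference shows up here: the margin loss stops producing gradients for triplets that already satisfy the margin, while the ratio loss keeps pushing the ratio \(\delta_{-}/\delta_{+}\) up.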
3.2 In-triplet hard negative mining with anchor swap
This is the first really satisfying idea in the paper.
The same idea applies to the ratio loss as well.
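These notes do not spell out the mechanism, so here is a small sketch of how I understand the anchor swap: besides the anchor-negative distance, also compute the positive-negative distance, and use the smaller of the two as the negative distance in the loss. The function names, the toy descriptors, and the margin value are mine; treat this as an illustration under that reading, not as the authors' code.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def anchor_swap_distances(f_a, f_p, f_n):
    """Return (delta_plus, delta_star) for one triplet.

    delta_star is the harder (smaller) of the two negative distances:
    anchor-negative and positive-negative. If the positive is closer to
    the negative, the positive effectively takes over the anchor role.
    """
    delta_plus = l2_distance(f_a, f_p)
    delta_minus = l2_distance(f_a, f_n)
    delta_minus_swapped = l2_distance(f_p, f_n)
    return delta_plus, min(delta_minus, delta_minus_swapped)

def margin_loss_with_swap(f_a, f_p, f_n, mu=1.0):
    """Margin ranking loss using delta_star instead of delta_minus."""
    delta_plus, delta_star = anchor_swap_distances(f_a, f_p, f_n)
    return max(0.0, mu + delta_plus - delta_star)

# Toy example: the negative sits right next to the positive.
f_a, f_p, f_n = [0.0, 0.0], [3.0, 0.0], [4.0, 0.0]
loss_plain = max(0.0, 1.0 + l2_distance(f_a, f_p) - l2_distance(f_a, f_n))
loss_swap = margin_loss_with_swap(f_a, f_p, f_n)
```

In the toy example the plain margin loss is zero, so the triplet would be ignored, whereas the swapped version still produces a loss. The hard negative is found inside the triplet itself, at the cost of one extra distance computation and no extra forward passes.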
3.3 Implementation details
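This section covers some training details; the model architecture is very simple.
The following sentence from the paper explains why the model is kept as simple as possible: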
Our motivation for such shallow network is to develop a descriptor for practical applications including those requiring real time processing. This is a challenging goal given that all previously introduced descriptors are computationally very intensive, thus impractical for most applications.
4 Experimental evaluation
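In this section the authors evaluate the model in two ways: ROC curves and mean average precision. At first I did not know where these two metrics come from or what they measure. After going through the articles listed in the references, I have a rough understanding. The paper describes the two evaluations as follows: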
The evaluation is done with two different evaluation metrics frequently found in the literature, patch pair classification success in terms of ROC curves [22], and mean average precision in terms of correct matching of feature points between pairs of images [16]. Note that these two metrics are of very different nature, the former measures how successful a classification of positive and negative patch pairs is, and the latter is evaluating the performance of a descriptor in nearest neighbour matching scenario where the task is to find correspondences in two large sets of descriptors.
4.1 Patch pair classification
On the relevant datasets, TFeat (the name of the paper's model) achieves a better (lower) FPR95 score.
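For reference, FPR95 is the false positive rate at the point where 95% of the true matching pairs are accepted. The sketch below shows one straightforward way to compute it from descriptor distances; the function name, the tie handling (`<=`), and the toy numbers are my own choices, and the paper's evaluation code may differ in such details.

```python
import math

def fpr_at_recall(distances, labels, recall=0.95):
    """False positive rate at the distance threshold that reaches `recall`.

    distances: descriptor distance for each patch pair (smaller = more similar)
    labels:    1 for a matching pair, 0 for a non-matching pair
    """
    pos = sorted(d for d, y in zip(distances, labels) if y == 1)
    neg = [d for d, y in zip(distances, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    # Smallest threshold that accepts at least `recall` of the positive pairs.
    k = math.ceil(recall * len(pos))
    threshold = pos[k - 1]
    # Negative pairs at or below the threshold are wrongly accepted as matches.
    false_positives = sum(1 for d in neg if d <= threshold)
    return false_positives / len(neg)

# Toy data: four matching pairs and four non-matching pairs.
distances = [0.1, 0.2, 0.3, 0.4, 0.25, 0.5, 0.6, 0.7]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
fpr95 = fpr_at_recall(distances, labels)
```

A lower value is better: it means few non-matching pairs slip through even when the threshold is loose enough to keep almost all true matches.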
4.2 Nearest neighbour patch matching
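In this section the authors describe how patches are sampled from the datasets to compute precision-recall curves. For now I only have a rough idea of what this metric is; I have not yet looked into exactly how it is computed.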
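As a rough aid, here is a sketch of one common definition of average precision for ranked matches: sort candidate matches by descriptor distance, and average the precision measured at each correct match. Mean average precision is then the mean over image pairs. The function names and the toy data are mine, and the exact protocol in the paper (which follows its reference [16]) may normalise differently.

```python
def average_precision(distances, is_correct):
    """Average precision of candidate matches ranked by ascending distance.

    distances:  descriptor distance of each candidate match
    is_correct: True if the candidate is a ground-truth correspondence
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_correct[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall level
    # ASSUMPTION: normalise by the number of correct matches in the list.
    return precision_sum / hits if hits else 0.0

def mean_average_precision(per_image_pair):
    """Mean of the average precision over (distances, is_correct) tuples."""
    aps = [average_precision(d, c) for d, c in per_image_pair]
    return sum(aps) / len(aps)

# Toy data for two image pairs.
pair_1 = ([0.1, 0.2, 0.3, 0.4], [True, False, True, False])
pair_2 = ([0.2, 0.1, 0.9], [True, True, False])
m_ap = mean_average_precision([pair_1, pair_2])
```

Unlike FPR95, this score depends on the ranking of matches among many candidates, which is why a descriptor can do well on one metric and not on the other.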
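- Ratio loss vs. margin loss
  - The mAP changes only slowly from epoch to epoch.
  - With the ratio loss, performance on nearest neighbour patch matching gets **worse and worse** as training goes on.
  - Question: if so, apart from being slightly better than the margin loss at the start, where does the ratio loss actually beat it?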
- Image transformations
This shows that synthetic deformations are less challenging for descriptors than some real-world changes as the ones found in Oxford dataset.
5 Efficiency
TFeat is smaller, faster, and performs better.
6 Summary
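- The paper proposes a smaller model, together with a training scheme that improves the results.
- It states that ratio-loss based methods are better suited to patch pair classification, while margin-loss based methods perform better in nearest neighbour matching. I suspect the first half of that statement is wrong: in the patch pair classification results (4.1 Patch pair classification), the ratio loss does not do better than the margin loss. In fact, I could not find anywhere in the paper that shows the ratio loss outperforming the margin loss.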
- a good performance on patch classification does not necessarily generalise to a good performance in nearest neighbour based frameworks.
References
[1] TPR, FPR, ROC, AUC: https://zhuanlan.zhihu.com/p/100059009
[2] FPR95: https://stats.stackexchange.com/questions/481991/false-positive-rate-at-k-recall
[3] MAP: https://www.zhihu.com/question/53405779
[4] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In ICCV, 2015.