Reading notes: Learning local feature descriptors with triplets and shallow convolutional neural networks

Learning local feature descriptors with triplets and shallow convolutional neural networks

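In short, the paper learns local feature descriptors using triplets and shallow convolutional neural networks. I learned a lot from reading it, and what impressed me most can also be summed up in a single word (it comes up later in these notes).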

1 Contribution

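The main contributions of the paper are:

  • It proposes a lower-complexity triplet-based network that is shallower, computationally cheaper, and still performs well.
  • It uses an in-triplet mining scheme during training, which lowers the cost of mining hard negatives and improves performance.
  • It also reports how two different loss functions perform on different tasks.

The sections below go through these contributions.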

2 Learning with pairs

In this section the authors review how Siamese networks are trained with pairs.

\[l\left(\boldsymbol{x}_{1}, \boldsymbol{x}_{2} ; \ell\right)= \begin{cases}\left\|f\left(\boldsymbol{x}_{1}\right)-f\left(\boldsymbol{x}_{2}\right)\right\|_{2} & \text { if } \ell=1 \\ \max \left(0, \mu-\left\|f\left(\boldsymbol{x}_{1}\right)-f\left(\boldsymbol{x}_{2}\right)\right\|_{2}\right) & \text { if } \ell=-1\end{cases} \]

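\(\ell=1\) means that \(x_1, x_2\) form a positive pair; otherwise they form a negative pair. Once the model has been trained to a certain degree, most negative pairs are already farther apart than the margin \(\mu\), so their loss is 0 and they no longer contribute to training. To deal with this, [4] proposed mining hard negatives (see my previous post for details), but that approach is expensive.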
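To make the pairwise loss above concrete, here is a minimal Python sketch. It assumes descriptors are plain lists of floats; the helper names `l2_distance` and `pair_loss`, the margin value `mu=1.0`, and the toy vectors are illustrative assumptions and do not come from the paper.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_loss(f1, f2, label, mu=1.0):
    """Pairwise loss from the formula above.

    f1, f2 : descriptors f(x1), f(x2)
    label  : 1 for a positive pair, -1 for a negative pair
    mu     : margin (1.0 is an arbitrary illustrative value)
    """
    d = l2_distance(f1, f2)
    if label == 1:
        return d                  # positive pair: penalise any distance
    return max(0.0, mu - d)       # negative pair: penalise only inside the margin


# Toy 2-D descriptors, made up for illustration.
print(pair_loss([0.0, 0.0], [0.3, 0.4], 1))    # positive pair: loss equals the distance
print(pair_loss([0.0, 0.0], [0.3, 0.4], -1))   # negative pair inside the margin: loss > 0
print(pair_loss([0.0, 0.0], [3.0, 4.0], -1))   # negative pair beyond the margin: loss is 0
```

The last call shows why easy negatives stop helping: a negative pair that is already farther apart than the margin produces zero loss and therefore no training signal.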

3 Learning with triplets

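Suppose we sample a triplet \(\{a,p,n\}\), where \(a\) and \(p\) are different views of the same keypoint, while \(a\) and \(n\) come from different keypoints. Training aims to bring the descriptors of \(a\) and \(p\) closer together and to push the descriptors of \(a\) and \(n\) farther apart. We therefore define \(\delta_{+}=\|f(\boldsymbol{a})-f(\boldsymbol{p})\|_{2}\) and \(\delta_{-}=\|f(\boldsymbol{a})-f(\boldsymbol{n})\|_{2}\).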

3.1 Two loss functions

  • Margin ranking loss

    \[\lambda\left(\delta_{+}, \delta_{-}\right)=\max \left(0, \mu+\delta_{+}-\delta_{-}\right) \]

    We can see that the loss is positive, and the model is trained, only when \(\delta_{-}<\delta_{+}+\mu\); once \(\delta_{-}\geq\delta_{+}+\mu\) the loss is 0.

  • Ratio loss

\[\hat{\lambda}\left(\delta_{+}, \delta_{-}\right)=\left(\frac{e^{\delta_{+}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}+\left(1-\frac{e^{\delta_{-}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2} \]

The model is trained so that \(\frac{\delta_{-}}{\delta_{+}} \rightarrow \infty\). The training objective is to push \(\left(\frac{e^{\delta_{+}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) toward 0 and \(\left(\frac{e^{\delta_{-}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) toward 1.
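The two triplet losses can be compared numerically with a small Python sketch. It takes the distances \(\delta_{+}\) and \(\delta_{-}\) directly as floats; the function names, the margin `mu=1.0`, and the sample distances are assumptions chosen for illustration only.

```python
import math


def margin_ranking_loss(d_pos, d_neg, mu=1.0):
    """max(0, mu + d_pos - d_neg); zero once d_neg >= d_pos + mu."""
    return max(0.0, mu + d_pos - d_neg)


def ratio_loss(d_pos, d_neg):
    """Ratio loss: softmax over the two distances, then squared error
    against the target (0, 1). Written naively; very large distances
    would overflow math.exp, which is ignored in this sketch."""
    denom = math.exp(d_pos) + math.exp(d_neg)
    s_pos = math.exp(d_pos) / denom
    s_neg = math.exp(d_neg) / denom
    return s_pos ** 2 + (1.0 - s_neg) ** 2


# Illustrative distances (not from the paper).
for d_pos, d_neg in [(1.0, 1.0), (0.5, 1.0), (0.5, 2.0), (0.1, 5.0)]:
    print(d_pos, d_neg, margin_ranking_loss(d_pos, d_neg), ratio_loss(d_pos, d_neg))
```

With these numbers the margin ranking loss becomes exactly 0 as soon as the negative is more than `mu` farther away than the positive, whereas the ratio loss keeps shrinking toward 0 as \(\delta_{-}\) grows relative to \(\delta_{+}\) without ever reaching it.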

3.2 In-triplet hard negative mining with anchor swap

This is the first point in the paper that made me want to applaud!

A similar idea applies to the ratio loss as well.
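As I understand the paper, the anchor swap works like this: besides \(\delta_{-}\), also compute the distance between the positive and the negative, \(\delta'_{-}=\|f(\boldsymbol{p})-f(\boldsymbol{n})\|_{2}\), and use the smaller of the two, \(\delta_{*}=\min(\delta_{-},\delta'_{-})\), in place of \(\delta_{-}\) in the loss. The harder negative is thus found inside the triplet itself, at the cost of one extra distance. The sketch below illustrates this for the margin ranking loss; the function names, `mu=1.0`, and the toy descriptors are my own assumptions.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def plain_margin_loss(fa, fp, fn, mu=1.0):
    """Margin ranking loss using only the anchor-negative distance."""
    return max(0.0, mu + l2_distance(fa, fp) - l2_distance(fa, fn))


def anchor_swap_margin_loss(fa, fp, fn, mu=1.0):
    """Margin ranking loss with in-triplet hard negative mining.

    The negative distance is the smaller of d(a, n) and d(p, n), i.e. the
    positive is allowed to act as the anchor if that makes the triplet harder.
    """
    d_pos = l2_distance(fa, fp)
    d_neg = l2_distance(fa, fn)
    d_neg_swapped = l2_distance(fp, fn)
    d_star = min(d_neg, d_neg_swapped)
    return max(0.0, mu + d_pos - d_star)


# Toy 2-D descriptors: the negative is far from the anchor but close to the positive.
fa, fp, fn = [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]
print(plain_margin_loss(fa, fp, fn))        # 0.0: this triplet would be ignored
print(anchor_swap_margin_loss(fa, fp, fn))  # 1.0: the swap makes it contribute
```

In this toy case the triplet yields no gradient without the swap but a positive loss with it, which is the effect the section title refers to.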

3.3 Implementation details

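This section mainly covers training details; the model architecture is very simple.

The following quote from the paper explains why the model is kept as simple as possible: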

Our motivation for such shallow network is to develop a descriptor for practical applications including those requiring real time processing. This is a challenging goal given that all previously introduced descriptors are computationally very intensive, thus impractical for most applications.

4 Experimental evaluation

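In this section the authors evaluate the model from two angles: ROC curves and mean average precision. At first I did not know where these two metrics come from or what they are used for; after reading the articles in the reference list at the end I have a rough understanding. The paper describes the two evaluation methods as follows: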

The evaluation is done with two different evaluation metrics frequently found in the literature, patch pair classification success in terms of ROC curves [22], and mean average precision in terms of correct matching of feature points between pairs of images [16]. Note that these two metrics are of very different nature,the former measures how succesfull a classification of positive and negative patch pairs is, and the latter is evaluating the performance of a descriptor in nearest neighbour matching scenario where the task is to find correspondences in two large sets of descriptors.

4.1 Patch pair classification

In terms of FPR95 on the relevant datasets, TFeat (the name of the paper's model) performs better.
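FPR95 is the false positive rate at the threshold where 95% of the positive pairs are accepted, so lower is better. A minimal sketch of how it could be computed from descriptor distances follows; the function name `fpr_at_recall`, the label convention (1 for matching pairs, anything else for non-matching), and the sample data are assumptions for illustration, not the paper's evaluation code.

```python
import math


def fpr_at_recall(distances, labels, recall=0.95):
    """False positive rate at the distance threshold that reaches `recall`.

    distances : descriptor distance for each patch pair
    labels    : 1 for a matching pair, any other value for a non-matching pair
    A pair is predicted as matching when its distance is <= threshold.
    """
    pos = sorted(d for d, y in zip(distances, labels) if y == 1)
    neg = [d for d, y in zip(distances, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    # Smallest number of positives that must be accepted to reach the recall.
    k = math.ceil(recall * len(pos))
    threshold = pos[k - 1]
    false_positives = sum(1 for d in neg if d <= threshold)
    return false_positives / len(neg)


# Made-up distances: 10 matching pairs followed by 4 non-matching pairs.
distances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 0.5, 1.5, 2.0, 2.5]
labels = [1] * 10 + [0] * 4
print(fpr_at_recall(distances, labels))
```

Here all ten positives must be accepted to reach 95% recall, so the threshold is the largest positive distance, and one of the four negatives falls below it.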

4.2 Nearest neighbour patch matching

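In this section the authors describe how they sample from the dataset to compute precision-recall curves. For now I only have a rough idea of what this metric is and have not yet looked into how it is computed in practice.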
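For reference, the generic way to turn a ranked list of candidate matches into an average precision value is sketched below. This is the standard textbook definition, not necessarily the exact protocol used in the paper (which follows [16]); the function names and the sample data are assumptions.

```python
def average_precision(distances, is_correct):
    """Average precision of candidate matches ranked by ascending distance.

    distances  : descriptor distance of each candidate match
    is_correct : True if the candidate is a ground-truth correspondence
    """
    ranked = sorted(zip(distances, is_correct), key=lambda t: t[0])
    total_correct = sum(1 for _, c in ranked if c)
    if total_correct == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            hits += 1
            precision_sum += hits / rank   # precision at this recall step
    return precision_sum / total_correct


def mean_average_precision(per_image_pair):
    """Mean of the average precision over several image pairs.

    per_image_pair : list of (distances, is_correct) tuples, one per image pair
    """
    if not per_image_pair:
        raise ValueError("need at least one image pair")
    aps = [average_precision(d, c) for d, c in per_image_pair]
    return sum(aps) / len(aps)


# Made-up example: the 1st and 3rd ranked matches are correct.
print(average_precision([0.1, 0.2, 0.3, 0.4], [True, False, True, False]))
```

In the example the precision is 1/1 at the first correct match and 2/3 at the second, so the average precision is their mean.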

  • Ratio loss vs. margin loss

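	- Roughly speaking, the mAP changes fairly slowly from epoch to epoch.

	- With the ratio loss, performance on nearest neighbour patch matching gets **steadily worse** as training goes on.

	- Question: if that is the case, in what respect is the ratio loss better than the margin loss, apart from being slightly ahead at the very start?

  • Image transformations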

This shows that synthetic deformations are less challenging for descriptors than some real-world changes as the ones found in Oxford dataset.

5 Efficiency

TFeat is smaller, faster, and performs better.

6 Summary

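  • The paper proposes a smaller model together with a training method that yields better results.
  • It states that ratio-loss based methods are better suited to patch pair classification, while margin-loss based methods perform better in nearest neighbour matching. I suspect the first half of that statement is mistaken: in the patch pair classification results (4.1 Patch pair classification), the ratio loss does not beat the margin loss. In fact, I could not find anywhere in the paper that shows the ratio loss outperforming the margin loss.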
  • a good performance on patch classification does not necessarily generalise to a good performance in nearest neighbour based frameworks.

References

[1] TPR, FPR, ROC, AUC: https://zhuanlan.zhihu.com/p/100059009
[2] FPR95: https://stats.stackexchange.com/questions/481991/false-positive-rate-at-k-recall
[3] MAP: https://www.zhihu.com/question/53405779
[4] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In ICCV, 2015.