Learning local feature descriptors with triplets and shallow convolutional neural networks

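In plain terms, the title says: learn local feature descriptors using triplets and a shallow convolutional neural network. I got a lot out of reading this paper, and what stuck with me most also comes down to a single word (it comes up later in this post).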

1 Contribution


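  • Proposes a lower-complexity triplet-based network: it is shallower, cheaper to compute, and still performs well.
  • Uses an in-triplet mining training scheme, which lowers the cost of mining hard negatives and improves performance.
  • Compares how two different loss functions perform on different tasks.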


2 Learning with pairs


\[l\left(\boldsymbol{x}_{1}, \boldsymbol{x}_{2} ; \ell\right)= \begin{cases}\left\|f\left(\boldsymbol{x}_{1}\right)-f\left(\boldsymbol{x}_{2}\right)\right\|_{2} & \text { if } \ell=1 \\ \max \left(0, \mu-\left\|f\left(\boldsymbol{x}_{1}\right)-f\left(\boldsymbol{x}_{2}\right)\right\|_{2}\right) & \text { if } \ell=-1\end{cases} \]


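Here \(\ell=1\) means that \(x_1,x_2\) form a positive pair; otherwise they form a negative pair. Once the model has been trained to a certain point, negative pairs whose distance already exceeds the margin \(\mu\) produce a loss of 0 and no longer contribute to training. Earlier work [4] therefore proposed mining hard negatives to deal with this (see my previous post for details), but that approach is expensive.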
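To convince myself of the "negative pairs stop contributing" point, here is a minimal sketch of the pair loss above in plain Python. The function names, the margin value of 1.0 and the toy 2-D descriptors are my own assumptions for illustration; they are not taken from the paper.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pair_loss(f1, f2, label, margin=1.0):
    """Pair loss from the formula above; label is 1 (positive) or -1 (negative)."""
    d = l2_distance(f1, f2)
    if label == 1:
        return d                      # positive pair: pull descriptors together
    return max(0.0, margin - d)       # negative pair: push apart up to the margin

# Toy descriptors (made up): `near` is 0.5 away from `anchor`, `far` is 5.0 away.
anchor = [0.0, 0.0]
near = [0.3, 0.4]
far = [3.0, 4.0]

print(pair_loss(anchor, near, 1))     # positive pair: loss equals the distance
print(pair_loss(anchor, near, -1))    # hard negative: still inside the margin
print(pair_loss(anchor, far, -1))     # easy negative: loss is zero, no gradient
```

The last call is the case described above: a negative that is already further away than the margin contributes nothing, which is why hard negatives have to be searched for.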

3 Learning with triplets


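For a triplet made of an anchor \(\boldsymbol{a}\), a positive \(\boldsymbol{p}\) and a negative \(\boldsymbol{n}\), define \(\delta_{+}=\|f(\boldsymbol{a})-f(\boldsymbol{p})\|_{2}\) and \(\delta_{-}=\|f(\boldsymbol{a})-f(\boldsymbol{n})\|_{2}\)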

3.1 Two loss functions

  • Margin ranking loss

    \[\lambda\left(\delta_{+}, \delta_{-}\right)=\max \left(0, \mu+\delta_{+}-\delta_{-}\right) \]


  • Ratio loss

\[\hat{\lambda}\left(\delta_{+}, \delta_{-}\right)=\left(\frac{e^{\delta_{+}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}+\left(1-\frac{e^{\delta_{-}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2} \]

The model is trained so that \(\frac{\delta_{-}}{\delta_{+}} \rightarrow \infty\). The training objective is to push \(\left(\frac{e^{\delta_{+}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) as close to 0 as possible, and \(\left(\frac{e^{\delta_{-}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) as close to 1 as possible.
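The two losses can be compared side by side with a small sketch. It takes the two distances \(\delta_{+}\) and \(\delta_{-}\) as plain numbers; the function names and the margin value are my own choices, and subtracting the maximum before exponentiating is only a numerical-stability trick that leaves the ratio unchanged.

```python
import math

def margin_ranking_loss(d_pos, d_neg, margin=1.0):
    """max(0, mu + delta_plus - delta_minus)."""
    return max(0.0, margin + d_pos - d_neg)

def ratio_loss(d_pos, d_neg):
    """Squared softmax terms over the two distances, as in the formula above."""
    m = max(d_pos, d_neg)             # shift for numerical stability only
    e_pos = math.exp(d_pos - m)
    e_neg = math.exp(d_neg - m)
    s_pos = e_pos / (e_pos + e_neg)   # should go to 0
    s_neg = e_neg / (e_pos + e_neg)   # should go to 1
    return s_pos ** 2 + (1.0 - s_neg) ** 2

# Made-up distances: the negative is progressively further away than the positive.
for d_pos, d_neg in [(1.0, 1.0), (0.5, 2.0), (0.1, 10.0)]:
    print(d_pos, d_neg, margin_ranking_loss(d_pos, d_neg), ratio_loss(d_pos, d_neg))
```

One difference shows up directly: the margin ranking loss is exactly zero as soon as \(\delta_{-}\) exceeds \(\delta_{+}\) by the margin, whereas the ratio loss only approaches zero and keeps rewarding a larger gap between the two distances.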

3.2 In-triplet hard negative mining with anchor swap


A similar idea applies to the ratio loss as well.
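Here is a minimal sketch of the anchor swap as I read it from the paper: besides \(\delta_{-}=\|f(\boldsymbol{a})-f(\boldsymbol{n})\|_{2}\), also compute the distance between the positive and the negative, and use the smaller of the two as the negative distance. That amounts to swapping anchor and positive whenever the positive is the one closer to the negative. The function names, the margin value and the toy descriptors are my own assumptions.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distances_with_anchor_swap(f_a, f_p, f_n):
    """Return (delta_plus, delta_star), where delta_star is the harder negative distance."""
    d_pos = l2_distance(f_a, f_p)
    d_neg = l2_distance(f_a, f_n)          # anchor vs. negative
    d_neg_swapped = l2_distance(f_p, f_n)  # positive vs. negative (anchor swapped)
    return d_pos, min(d_neg, d_neg_swapped)

def margin_loss_with_swap(f_a, f_p, f_n, margin=1.0):
    """Margin ranking loss evaluated on the harder of the two negative distances."""
    d_pos, d_star = distances_with_anchor_swap(f_a, f_p, f_n)
    return max(0.0, margin + d_pos - d_star)

# Toy descriptors (made up): the negative sits closer to the positive than to the anchor.
f_a, f_p, f_n = [0.0, 0.0], [1.0, 0.0], [1.5, 0.0]
plain = max(0.0, 1.0 + l2_distance(f_a, f_p) - l2_distance(f_a, f_n))
print(plain)                                   # loss without the swap
print(margin_loss_with_swap(f_a, f_p, f_n))    # larger loss with the swap
```

The appeal is that the harder negative comes from inside the triplet itself, at the cost of one extra distance, instead of from a separate search over the dataset.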

3.3 Implementation details



Our motivation for such shallow network is to develop a descriptor for practical applications including those requiring real time processing. This is a challenging goal given that all previously introduced descriptors are computationally very intensive, thus impractical for most applications.

4 Experimental evaluation

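In this section the authors evaluate the model from two angles: ROC curves and mean average precision. At first I did not know where these two metrics come from or what they measure. After looking through the papers in the reference list I got a rough idea. The paper introduces the two evaluation methods as follows: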

The evaluation is done with two different evaluation metrics frequently found in the literature, patch pair classification success in terms of ROC curves [22], and mean average precision in terms of correct matching of feature points between pairs of images [16]. Note that these two metrics are of very different nature, the former measures how successful a classification of positive and negative patch pairs is, and the latter is evaluating the performance of a descriptor in nearest neighbour matching scenario where the task is to find correspondences in two large sets of descriptors.
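To get a feel for the first metric, here is a small sketch of how ROC points can be computed for patch pair classification. It assumes a pair is predicted as a match when its descriptor distance is at or below a threshold; the function name and the toy numbers are mine, and this is the generic ROC construction rather than the exact protocol of [22].

```python
def roc_points(distances, labels):
    """Return (false positive rate, true positive rate) points.

    distances: descriptor distance of each patch pair.
    labels: 1 for a matching pair, -1 for a non-matching pair.
    Assumes at least one pair of each kind is present.
    """
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(distances)):
        # Pairs at or below the threshold are predicted to be matches.
        tp = sum(1 for d, label in zip(distances, labels) if d <= threshold and label == 1)
        fp = sum(1 for d, label in zip(distances, labels) if d <= threshold and label != 1)
        points.append((fp / n_neg, tp / n_pos))
    return points

# Toy example (made up): matching pairs have the two smallest distances.
print(roc_points([0.2, 0.4, 0.5, 0.9], [1, 1, -1, -1]))
```

A descriptor that separates matching from non-matching pairs well reaches a high true positive rate while the false positive rate is still low.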

4.1 Patch pair classification


4.2 Nearest neighbour patch matching

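In this subsection the authors describe how they sample from the dataset in order to compute precision-recall curves. For now I only understand roughly what this metric is about; I have not yet looked closely at how it is actually computed.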
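As a first step towards understanding it, here is a sketch of one common definition of average precision: rank the candidate matches by descriptor distance, then average the precision measured at each correct match. The function names and toy data are mine, and the protocol of [16] may differ in details such as how candidate matches are generated, so take this as an assumption-laden illustration only.

```python
def average_precision(distances, is_correct):
    """Average precision of matches ranked by ascending descriptor distance."""
    n_correct = sum(1 for c in is_correct if c)
    if n_correct == 0:
        return 0.0
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_correct[i]:
            hits += 1
            precision_sum += hits / rank   # precision at this recall level
    return precision_sum / n_correct

def mean_average_precision(per_image_pair):
    """Mean of the average precision over several image pairs."""
    aps = [average_precision(d, c) for d, c in per_image_pair]
    return sum(aps) / len(aps)

# Toy example (made up): correct matches are ranked 1st and 3rd.
print(average_precision([0.1, 0.2, 0.3, 0.4], [True, False, True, False]))
```

Unlike the ROC curve on labelled pairs, this score depends on how the correct matches rank against every other candidate, which is closer to the nearest neighbour matching setting.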

  • Ratio loss vs. margin loss

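	- The mAP value changes fairly slowly from epoch to epoch.

	- The ratio loss gets **progressively worse** on nearest neighbour patch matching as training goes on.

	- Question: if that is the case, then apart from being slightly ahead of the margin loss at the very start, in what respect is the ratio loss actually better than the margin loss?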
  • Image transformations

This shows that synthetic deformations are less challenging for descriptors than some real-world changes as the ones found in Oxford dataset.

5 Efficiency


6 Summary

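  • Proposes a smaller model, together with a method that improves the training results.
  • States that ratio-loss based methods are better suited to patch pair classification, while margin-loss based methods perform better in nearest neighbour matching. I suspect the first half of that statement is a mistake: in the patch pair classification results (4.1 Patch pair classification) the ratio loss does not beat the margin loss. In fact, I could not find anything in this paper showing where the ratio loss outperforms the margin loss.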
  • a good performance on patch classification does not necessarily generalise to a good performance in nearest neighbour based frameworks.


