Learning local feature descriptors with triplets and shallow convolutional neural networks: paper reading notes

阿新 • Published: 2021-12-12

Learning local feature descriptors with triplets and shallow convolutional neural networks

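In short, the paper is about learning local feature descriptors with triplets and a shallow convolutional neural network. I got a lot out of reading it, and the word that left the deepest impression is "shallow", which comes up throughout the paper.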

1 Contribution

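The main contributions of this paper are:

It proposes a triplet-based descriptor network with lower complexity: it is shallower, cheap to compute, and still performs well.
It trains with an in-triplet mining scheme, which lowers the cost of mining hard negatives and improves performance.
It also compares how two different loss functions perform on different tasks.

The sections below go through these contributions in turn.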

2 Learning with pairs

In this section the authors briefly review how Siamese networks are trained on pairs of patches, using the following loss:

\[l\left(\boldsymbol{x}_{1}, \boldsymbol{x}_{2} ; \ell\right)= \begin{cases}\left\|f\left(\boldsymbol{x}_{1}\right)-f\left(\boldsymbol{x}_{2}\right)\right\|_{2} & \text { if } \ell=1 \\ \max \left(0, \mu-\left\|f\left(\boldsymbol{x}_{1}\right)-f\left(\boldsymbol{x}_{2}\right)\right\|_{2}\right) & \text { if } \ell=-1\end{cases} \]

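Here \(\ell=1\) means that \(\boldsymbol{x}_1, \boldsymbol{x}_2\) form a positive pair, and \(\ell=-1\) means they form a negative pair. Once the model has been trained to a certain point, most negative pairs produce a loss of 0 and no longer contribute to training. The earlier work [4] proposed mining hard negatives to deal with this (see my previous post for details), but that approach is computationally expensive.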
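To make the pair loss concrete, here is a minimal sketch in plain Python. It assumes the descriptors \(f(\boldsymbol{x}_1)\) and \(f(\boldsymbol{x}_2)\) have already been computed and are passed in as lists of floats; the function names and the margin value `mu=1.0` are my own choices for illustration, not something taken from the paper.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pair_loss(f1, f2, label, mu=1.0):
    """Pair loss for one pair of descriptors.

    label = 1 marks a positive pair, label = -1 a negative pair.
    mu is the margin (1.0 is an arbitrary illustrative value).
    """
    d = l2_distance(f1, f2)
    if label == 1:
        return d                  # positive pair: penalise any distance
    return max(0.0, mu - d)       # negative pair: penalise only if closer than mu


# A negative pair that is already farther apart than the margin gives zero loss,
# which is why easy negatives stop contributing to training.
print(pair_loss([0.0, 0.0], [3.0, 4.0], label=-1, mu=1.0))  # 0.0
print(pair_loss([0.0, 0.0], [3.0, 4.0], label=1))           # 5.0
```

The first call shows the situation described above: the negative pair is at distance 5, well beyond the margin, so it produces no gradient.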

3 Learning with triplets

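Suppose we sample a triplet \(\{a,p,n\}\), where \(a\) and \(p\) are different views of the same keypoint, and \(a\) and \(n\) come from different keypoints. The training goal is to bring the descriptors of \(a\) and \(p\) closer together and push the descriptors of \(a\) and \(n\) further apart. We therefore define \(\delta_{+}=\|f(\boldsymbol{a})-f(\boldsymbol{p})\|_{2}\) and \(\delta_{-}=\|f(\boldsymbol{a})-f(\boldsymbol{n})\|_{2}\).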

3.1 Two loss functions

Margin ranking loss
\[\lambda\left(\delta_{+}, \delta_{-}\right)=\max \left(0, \mu+\delta_{+}-\delta_{-}\right) \]
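Note that the loss is positive, and the model receives a training signal, only when \(\delta_{-}<\delta_{+}+\mu\); once \(\delta_{-}\geq\delta_{+}+\mu\) the loss is 0.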
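A small numeric sketch of the margin ranking loss shows for which distances the hinge is active. The function name and the margin `mu=1.0` are illustrative assumptions; the inputs are the two distances \(\delta_{+}\) and \(\delta_{-}\) defined above.

```python
def margin_ranking_loss(delta_pos, delta_neg, mu=1.0):
    """max(0, mu + delta_pos - delta_neg) for a single triplet."""
    return max(0.0, mu + delta_pos - delta_neg)


# Negative is farther than the positive by more than the margin: no loss.
print(margin_ranking_loss(0.5, 2.0))  # 0.0
# Negative is not yet far enough away: the triplet still produces a loss.
print(margin_ranking_loss(0.5, 1.0))  # 0.5
```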
Ratio loss

\[\hat{\lambda}\left(\delta_{+}, \delta_{-}\right)=\left(\frac{e^{\delta_{+}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}+\left(1-\frac{e^{\delta_{-}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2} \]

The ratio loss learns an embedding in which \(\frac{\delta_{-}}{\delta_{+}} \rightarrow \infty\). The training goal is to drive \(\left(\frac{e^{\delta_{+}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) to 0 and \(\left(\frac{e^{\delta_{-}}}{e^{\delta_{+}}+e^{\delta_{-}}}\right)^{2}\) to 1.
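The ratio loss can be evaluated numerically in the same way. This is only a sketch of the formula above: the function name is my own, and subtracting the larger distance before exponentiating is a standard numerical-stability trick that I added, not something described in the paper.

```python
import math


def ratio_loss(delta_pos, delta_neg):
    """Ratio loss for one triplet, computed from the two distances."""
    # Shift by the maximum so that exp() cannot overflow; the ratios are unchanged.
    m = max(delta_pos, delta_neg)
    e_pos = math.exp(delta_pos - m)
    e_neg = math.exp(delta_neg - m)
    s_pos = e_pos / (e_pos + e_neg)   # should be driven towards 0
    s_neg = e_neg / (e_pos + e_neg)   # should be driven towards 1
    return s_pos ** 2 + (1.0 - s_neg) ** 2


# Equal distances: both softmax terms are 0.5, so the loss is 0.25 + 0.25.
print(ratio_loss(1.0, 1.0))   # 0.5
# The loss shrinks as the negative moves away relative to the positive.
print(ratio_loss(0.1, 10.0) < ratio_loss(0.1, 1.0))  # True
```

Unlike the margin ranking loss, this loss never becomes exactly zero for finite distances, so every triplet keeps contributing a (possibly tiny) training signal.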

3.2 In-triplet hard negative mining with anchor swap

This is the first point in the paper that made me want to applaud!

The same idea applies to the ratio loss as well.
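For the margin ranking loss, my understanding of the anchor swap is as follows: besides \(\delta_{-}=\|f(\boldsymbol{a})-f(\boldsymbol{n})\|_{2}\), the distance \(\delta_{-}'=\|f(\boldsymbol{p})-f(\boldsymbol{n})\|_{2}\) is also computed, and the smaller of the two, \(\delta_{*}=\min(\delta_{-},\delta_{-}')\), replaces \(\delta_{-}\) in the loss. If \(\delta_{-}'\) is the smaller one, \(p\) effectively takes over the role of the anchor. The sketch below is my own illustration of that idea on precomputed descriptors; the function names, the `anchor_swap` flag and the margin value are assumptions.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(f_a, f_p, f_n, mu=1.0, anchor_swap=True):
    """Margin ranking loss with optional in-triplet hard negative mining."""
    delta_pos = l2_distance(f_a, f_p)
    delta_neg = l2_distance(f_a, f_n)
    if anchor_swap:
        # Use whichever of a and p is closer to n as the anchor,
        # i.e. take the harder of the two negative distances.
        delta_neg = min(delta_neg, l2_distance(f_p, f_n))
    return max(0.0, mu + delta_pos - delta_neg)


a, p, n = [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]
# Without the swap this triplet is already "solved" and gives no signal.
print(triplet_margin_loss(a, p, n, anchor_swap=False))  # 0.0
# With the swap, p is closer to n than a is, so the triplet becomes informative.
print(triplet_margin_loss(a, p, n, anchor_swap=True))   # 1.0
```

The appeal of this design is that the harder negative is found inside the triplet itself, using only one extra distance computation, instead of searching a large pool of candidates.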

3.3 Implementation details

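This section mainly covers training details. The model architecture is very simple.

The following sentence from the paper explains why the model is kept as simple as possible.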

Our motivation for such shallow network is to develop a descriptor for practical applications including those requiring real time processing. This is a challenging goal given that all previously introduced descriptors are computationally very intensive, thus impractical for most applications.

4 Experimental evaluation

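In this section the authors evaluate the model from two angles: ROC curves and mean average precision. At first I did not know where these two metrics come from or what they measure; after reading the articles in the reference list at the end I have a rough understanding. The paper describes the two evaluation methods as follows: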

The evaluation is done with two different evaluation metrics frequently found in the literature, patch pair classification success in terms of ROC curves [22], and mean average precision in terms of correct matching of feature points between pairs of images [16]. Note that these two metrics are of very different nature, the former measures how succesfull a classification of positive and negative patch pairs is, and the latter is evaluating the performance of a descriptor in nearest neighbour matching scenario where the task is to find correspondences in two large sets of descriptors.

4.1 Patch pair classification

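According to the FPR95 results on the relevant datasets, TFeat (the name of the paper's model) performs better than the descriptors it is compared against.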
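FPR95 is the false positive rate at the point where 95% of the true matching pairs are accepted (see [2]); lower is better. The sketch below shows one straightforward way to compute it from descriptor distances. It is my own illustration, not the paper's evaluation code: the function name, the label convention (1 for matching, 0 for non-matching) and the toy numbers are all assumptions.

```python
import math


def fpr_at_recall(distances, labels, recall=0.95):
    """False positive rate at the distance threshold that reaches the given recall.

    distances: descriptor distance for each patch pair (smaller = more similar).
    labels: 1 for a matching pair, 0 for a non-matching pair.
    """
    pos = sorted(d for d, y in zip(distances, labels) if y == 1)
    neg = [d for d, y in zip(distances, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    # Number of positives that must be accepted (small epsilon guards float rounding).
    k = math.ceil(recall * len(pos) - 1e-9)
    threshold = pos[k - 1]
    # Negatives at or below the threshold are wrongly accepted as matches.
    false_pos = sum(1 for d in neg if d <= threshold)
    return false_pos / len(neg)


pos_d = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
neg_d = [0.5, 1.5, 2.0, 2.5]
dists = pos_d + neg_d
labels = [1] * len(pos_d) + [0] * len(neg_d)
print(fpr_at_recall(dists, labels))  # 0.25
```

With 10 positives, reaching 95% recall requires accepting all of them, so the threshold is 1.0 and one of the four negatives (distance 0.5) slips through.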

4.2 Nearest neighbour patch matching

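In this section the authors describe how they sample from the datasets to compute precision-recall curves. For now I only have a rough idea of what this metric is; I have not yet looked closely at how it is actually computed.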
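As a rough aid to understanding, here is the generic definition of average precision (see [3]) applied to a ranked list of candidate matches. This is not the paper's exact protocol, which I have not studied in detail; the function name, the inputs and the toy data are assumptions. The mean average precision (mAP) is then the mean of this value over all evaluated image pairs.

```python
def average_precision(distances, is_correct):
    """Average precision of candidate matches ranked by ascending distance.

    distances: descriptor distance of each candidate match.
    is_correct: True where the candidate is a ground-truth correspondence.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    total_correct = sum(1 for c in is_correct if c)
    if total_correct == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_correct[i]:
            hits += 1
            # Precision measured at the rank of each correct match.
            precision_sum += hits / rank
    return precision_sum / total_correct


# Correct matches at ranks 1 and 3: AP = (1/1 + 2/3) / 2
ap = average_precision([0.1, 0.2, 0.3, 0.4], [True, False, True, False])
print(round(ap, 4))
```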

Ratio loss vs. margin loss

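	- Broadly, the mAP changes only slowly from epoch to epoch.

	- With the ratio loss, performance on nearest neighbour patch matching gets **worse and worse** as training goes on.

	- Question: in that case, apart from being slightly ahead of the margin loss at the start, in what respect is the ratio loss actually better than the margin loss?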

Image transformations

This shows that synthetic deformations are less challenging for descriptors than some real-world changes as the ones found in Oxford dataset.

5 Efficiency

TFeat is smaller, faster to run, and performs better.

6 Summary

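The paper proposes a smaller model, together with a training method that improves the results.
It states that ratio-loss based methods are better suited to patch pair classification, while margin-loss based methods perform better in nearest neighbour matching. I suspect the first half of that statement is a mistake: in the patch pair classification results (4.1 Patch pair classification) the ratio loss does not beat the margin loss, and in fact I could not find anywhere in this paper that shows the ratio loss outperforming the margin loss.
As the paper puts it, a good performance on patch classification does not necessarily generalise to a good performance in nearest neighbour based frameworks.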

References

[1] TPR, FPR, ROC, AUC: https://zhuanlan.zhihu.com/p/100059009
[2] FPR95: https://stats.stackexchange.com/questions/481991/false-positive-rate-at-k-recall
[3] mAP: https://www.zhihu.com/question/53405779
[4] E. Simo-Serra, E. Trulls, L. Ferraz, I. Kokkinos, P. Fua, and F. Moreno-Noguer. Discriminative learning of deep convolutional feature point descriptors. In ICCV, 2015.
