Image Captioning with Context-Aware Auxiliary Guidance: Paper Notes

阿新 • Published: 2022-04-13

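1. Abstract

Most encoder-decoder frameworks rely heavily on previously generated words when predicting the current word, so they cannot effectively use information from words predicted later in the sentence to learn complete semantics. This paper proposes the Context-Aware Auxiliary Guidance (CAAG) mechanism, which guides the model to capture global context. CAAG uses semantic attention to selectively attend to the global context when generating the current word.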

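2. Introduction

Training with maximum likelihood estimation causes exposure bias: the model is trained on ground-truth prefixes but must condition on its own predictions at inference time. Later work introduced reinforcement learning (RL) algorithms to address this problem by directly optimizing non-differentiable sequence-level metrics.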
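
The notes do not write out either objective, so the following is only a sketch of the standard forms, given for orientation. All symbols are assumptions introduced here: I is the image, y* the ground-truth caption, w^s a caption sampled from the model, r a sequence-level reward (for example a caption metric score), and b a baseline that reduces variance.

```latex
% Cross-entropy (MLE) objective with teacher forcing:
% each step conditions on the ground-truth prefix, which is the source of exposure bias
L_{XE}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(y^*_t \mid y^*_{1:t-1}, I\right)

% Sequence-level RL objective: negative expected reward of a sampled caption
L_{RL}(\theta) = -\mathbb{E}_{w^s \sim p_\theta}\left[ r(w^s) \right]

% REINFORCE gradient estimate with a baseline b
\nabla_\theta L_{RL}(\theta) \approx -\left( r(w^s) - b \right) \nabla_\theta \log p_\theta\left(w^s \mid I\right)
```

Because the reward r only enters as a scalar weight on the log-probability gradient, it does not need to be differentiable, which is why sequence-level metrics can be optimized this way.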

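Current image captioning methods rely only on previously generated words when predicting the current word, so the model does not learn complete semantic information. Words predicted later in the sentence may carry more critical information for the current prediction, so the model should also take future predicted words into account.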

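To better understand the image content, the paper proposes the CAAG mechanism. A captioning model (called the primary network) first generates a complete sentence, which serves as the global context. Based on this global context and the hidden state, CAAG regenerates the target word through semantic attention, which lets CAAG selectively use information from either previously predicted or future predicted words.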
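
The idea of attending over a complete first-pass caption can be illustrated with a small NumPy sketch. This is not the paper's implementation: the additive (tanh) scoring function, the parameter names `W_h`, `W_e`, `w`, `W_o`, the function names, and all dimensions are assumptions chosen for illustration. The point it shows is that the attention weights cover every position of the sentence, so words after the current position can contribute as much as words before it.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def semantic_attention(hidden, context_emb, W_h, W_e, w):
    """Additive attention over the words of a first-pass caption.

    hidden:      (d_h,)   decoder hidden state at the current step
    context_emb: (T, d_e) embeddings of all T words of the first-pass caption
    W_h (d_a, d_h), W_e (d_a, d_e), w (d_a,): hypothetical attention parameters
    """
    # Score every word of the sentence, both before and after the current position.
    scores = np.tanh(context_emb @ W_e.T + W_h @ hidden) @ w  # (T,)
    weights = softmax(scores)                                  # (T,), sums to 1
    attended = weights @ context_emb                           # (d_e,)
    return weights, attended


def repredict_word(hidden, context_emb, params):
    """Re-predict the current word from the hidden state plus attended context."""
    weights, attended = semantic_attention(
        hidden, context_emb, params["W_h"], params["W_e"], params["w"]
    )
    logits = params["W_o"] @ np.concatenate([hidden, attended])  # (vocab,)
    return softmax(logits), weights


# Toy dimensions and random parameters, for illustration only.
rng = np.random.default_rng(0)
T, d_h, d_e, d_a, vocab = 6, 8, 5, 4, 10
params = {
    "W_h": rng.normal(size=(d_a, d_h)),
    "W_e": rng.normal(size=(d_a, d_e)),
    "w": rng.normal(size=d_a),
    "W_o": rng.normal(size=(vocab, d_h + d_e)),
}
hidden = rng.normal(size=d_h)
context_emb = rng.normal(size=(T, d_e))
word_probs, attn_weights = repredict_word(hidden, context_emb, params)
print(attn_weights.round(3), int(word_probs.argmax()))
```

Here `attn_weights` has one entry per word of the first-pass caption, and `word_probs` is a distribution over the toy vocabulary for the regenerated word.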

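Main contributions of the paper:

1. It proposes the CAAG mechanism, which uses information from future predicted words to guide the model toward more complete semantic information.

2. The model is generic and can be combined with existing RL-based models.

3. The model outperforms many state-of-the-art (SOTA) models on the COCO dataset.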

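3. Architecture

Spatial visual features of salient image regions are first extracted with a Faster R-CNN pretrained on the Visual Genome dataset. The primary network then generates the global context from these visual features, and finally CAAG guides the primary network to capture the global context information.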
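
The first-pass step, in which the primary network turns region features into a full sentence, can be sketched as a decoding loop. Everything below is hypothetical: the notes do not say how the first-pass caption is decoded, so greedy decoding is an assumption, `toy_step` is a stand-in for a real decoder step, and the 36 x 2048 feature shape is a placeholder rather than a value from the notes.

```python
import numpy as np


def greedy_decode(step_fn, region_feats, bos_id, eos_id, max_len=20):
    """First pass: emit a full caption word by word, stopping at <eos>."""
    words, state, prev = [], None, bos_id
    for _ in range(max_len):
        logits, state = step_fn(region_feats, prev, state)
        prev = int(np.argmax(logits))  # greedy choice (an assumption)
        if prev == eos_id:
            break
        words.append(prev)
    return words


def toy_step(region_feats, prev_word, state):
    """Stand-in for a real decoder step: replays a fixed script of word ids."""
    script = [3, 1, 2, 0]  # id 0 plays the role of <eos>
    t = 0 if state is None else state
    logits = np.full(5, -1.0)
    logits[script[min(t, len(script) - 1)]] = 1.0
    return logits, t + 1


# Placeholder for the extracted region features (shape is an assumption).
region_feats = np.zeros((36, 2048))
global_context = greedy_decode(toy_step, region_feats, bos_id=4, eos_id=0)
print(global_context)
```

The resulting list of word ids is what a second stage such as CAAG would treat as the global context when re-predicting each word.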
