ALIGN: Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

阿新 • Published: 2022-03-16

2022-03-16 21:02:21

Paper: http://proceedings.mlr.press/v139/jia21b/jia21b.pdf

1. Background and Motivation:

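As deep learning moves into deeper waters, pre-training techniques based on large multimodal models have begun to attract the attention of many researchers. The paper argues that the large datasets produced by existing methods are still not large enough, so it follows the collection procedure of the CC3M dataset to obtain a massive set of noisy image-text pairs. Unlike CC3M, which applies strict filtering to obtain relatively clean data, the authors apply only simple filtering, which yields a dataset two orders of magnitude larger than CC3M. Their experiments show that a model trained on such heavily noisy data can still achieve good results on many tasks.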

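To train the model, the authors use an objective that aligns visual and language representations in a shared latent embedding space, with a simple dual-encoder architecture. A similar objective can be used to learn visual-semantic embeddings (VSE). The authors name the resulting model ALIGN: A Large-scale ImaGe and Noisy-text embedding. The image and text encoders are trained with a contrastive loss, which pulls matched pairs closer together and pushes non-matched pairs apart. This loss is also commonly used in self-supervised and supervised representation learning. The aligned image and text representations are a natural fit for cross-modal matching/retrieval tasks, and they achieve leading accuracy on the corresponding datasets.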
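To make the "pull matched pairs together, push non-matched pairs apart" idea concrete, here is a minimal sketch of a contrastive loss over a batch of image and text embeddings. It assumes a symmetric softmax (InfoNCE-style) form in which the two directions are summed, cosine similarity, and a fixed `temperature`; the notes above only say "contrastive loss", so these choices, the function names, and the toy vectors are illustrative assumptions rather than the authors' implementation. The encoders themselves are left out: the inputs stand in for their outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    top = max(row)
    log_sum = top + math.log(sum(math.exp(x - top) for x in row))
    return [x - log_sum for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric contrastive loss for a batch where pair i is (image i, text i).

    `temperature` is a hypothetical fixed value chosen for illustration.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)

    # sim[i][j]: scaled cosine similarity between image i and text j.
    sim = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text: each image should pick its own text among all texts.
    i2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Text-to-image: each text should pick its own image among all images.
    t2i = -sum(
        log_softmax([sim[i][j] for i in range(n)])[j] for j in range(n)
    ) / n
    return i2t + t2i


# Toy 2-D embeddings standing in for encoder outputs (made-up numbers).
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.1, 0.9]]

aligned = contrastive_loss(images, texts)           # correct pairing
mismatched = contrastive_loss(images, texts[::-1])  # captions swapped
print("aligned:", aligned)
print("mismatched:", mismatched)
```

The correctly paired batch produces a lower loss than the batch with swapped captions, which is the signal that drives the two encoders toward a shared embedding space.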

2. A large-scale noisy image-text dataset:
