【NeurIPS】ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
阿新 • Published: 2022-03-28
1. Motivation
The idea of this paper is very simple: combine CNNs and ViT, using CNN-style layers in the shallow stages and ViT transformer blocks in the deep stages. In addition, a convolution branch is added in parallel with the attention branch.
2. Method
The overall network architecture is shown in the figure below. It consists of three Reduction Cells (RC) followed by a number of Normal Cells (NC).
RC module
Compared with the Transformer block in ViT, the RC adds a pyramid reduction module: several dilated convolutions with different dilation rates run in parallel, and their outputs are concatenated into one tensor. The shortcut path also gains three extra convolutions. Finally, a seq2img operation converts the token sequence back into a feature map.
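The pyramid reduction step can be sketched as follows. This is a minimal illustration, not the official ViTAE implementation; the dilation rates (1, 2, 3, 4), the stride, and the class name are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class PyramidReduction(nn.Module):
    """Sketch of the RC's pyramid reduction: parallel dilated convolutions
    whose outputs are concatenated along the channel dimension.
    Dilation rates and stride here are illustrative assumptions."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 3, 4), stride=2):
        super().__init__()
        branch_ch = out_ch // len(dilations)
        # one 3x3 conv per dilation rate; padding=d keeps spatial sizes aligned
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=3, stride=stride,
                      padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x):
        # each branch sees a different receptive field; concat fuses the scales
        return torch.cat([b(x) for b in self.branches], dim=1)

x = torch.randn(1, 3, 224, 224)
y = PyramidReduction(3, 64)(x)
print(y.shape)  # torch.Size([1, 64, 112, 112])
```

Because each branch uses `padding=d` with `dilation=d`, all branch outputs share the same spatial size and can be concatenated directly.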
NC module
The only difference from ViT's transformer block is an extra convolution branch running in parallel with the attention computation.
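The parallel attention/convolution design can be sketched like this. Again, a minimal sketch under assumptions: the conv branch's depth, normalization, and activation are illustrative, and the class and argument names are made up for this example.

```python
import torch
import torch.nn as nn

class ParallelAttentionConv(nn.Module):
    """Sketch of the NC's attention stage: standard multi-head self-attention
    plus a parallel convolution branch that operates on the tokens reshaped
    into a feature map. The conv branch's composition is an assumption."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(dim),
            nn.SiLU(),
        )

    def forward(self, x, hw):
        h, w = hw
        a, _ = self.attn(x, x, x)                        # global context
        b, n, c = x.shape
        fmap = x.transpose(1, 2).reshape(b, c, h, w)     # seq -> img
        local = self.conv(fmap).reshape(b, c, n).transpose(1, 2)  # img -> seq
        return a + local                                 # fuse both branches

tokens = torch.randn(2, 196, 64)  # a 14x14 patch grid, dim 64
out = ParallelAttentionConv(64)(tokens, (14, 14))
print(out.shape)  # torch.Size([2, 196, 64])
```

The attention branch captures long-range dependencies while the conv branch injects locality, which is the intrinsic inductive bias the paper's title refers to.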
3. Interesting points
Judging from the OpenReview comments, the reviewers acknowledged the following strong points:
- The idea of injecting multi-scale features is interesting and promising.
- The paper is well written and easy to follow.
At the same time, the reviewers also pointed out some weaknesses:
- The paper uses an additional conv branch together with the self-attention branch to construct the new network architecture; it is obvious that the extra conv layers will help improve the performance of the network. The proposed network modification looks a bit incremental and not very interesting to me.
- There are no results on downstream object detection and segmentation tasks, even though this paper aims to introduce an inductive bias about visual structure.
- The proposed method is mainly verified on small input images. Thus, I am a little concerned about its memory consumption and running speed when applied to large images (as segmentation and detection typically use large image resolutions).