
On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention



1. Shallow CNN, used to keep the computational cost under control

2. Adaptive 2D positional encoding

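The paper argues that the Transformer's positional encoding module may not be effective in vision tasks, yet positional information is important, especially since the paper targets recognition of text of arbitrary shape. The authors therefore make the positional encoding learnable and adaptive to the input.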


We visualize random input images from three groups with different predicted aspect ratios, as a by-product of A2DPE. Figure 7 shows the examples according to the ratios α/β. Low aspect ratio group, as expected, contains mostly horizontal samples, and high aspect ratio group contains mostly vertical samples. By dynamically adjusting the grid spacing, A2DPE reduces the representation burden for the other modules, leading to performance boost.
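The idea of adjusting the grid spacing along height and width can be illustrated with a minimal sketch. It assumes the 2D encoding is a weighted sum of a height-wise and a width-wise sinusoidal encoding, with scalar weights `alpha` and `beta` passed in by hand; in the model these scales would be predicted from the input features. The function names `sinusoid` and `a2dpe` are hypothetical and are not taken from the authors' code.

```python
import math


def sinusoid(pos, dim):
    """Standard 1D sinusoidal encoding of one position (dim must be even)."""
    vec = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        vec.extend([math.sin(angle), math.cos(angle)])
    return vec


def a2dpe(height, width, dim, alpha, beta):
    """Return a height x width grid of dim-sized vectors:
    p[h][w] = alpha * sinusoid(h) + beta * sinusoid(w).

    Assumption: alpha and beta are plain scalars supplied by the caller;
    a real model would predict them from the image features.
    """
    rows = [sinusoid(h, dim) for h in range(height)]
    cols = [sinusoid(w, dim) for w in range(width)]
    return [
        [[alpha * a + beta * b for a, b in zip(rows[h], cols[w])]
         for w in range(width)]
        for h in range(height)
    ]


# With beta = 0 only the height index matters; with alpha = 0 only the width.
grid = a2dpe(height=2, width=3, dim=4, alpha=1.0, beta=0.0)
```

A large `alpha` relative to `beta` makes the encoding vary mostly along the height axis, and a small one makes it vary mostly along the width axis, which is the sense in which the ratio α/β separates vertical from horizontal samples.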

3. Locality-aware feedforward layer

For good STR performance, a model should not only utilize long-range dependencies but also local vicinity around single characters.
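The difference between a point-wise layer and one that sees the local vicinity can be shown with a small sketch. It assumes the locality-aware layer swaps the usual point-wise (1x1) transform for a 3x3 convolution over the 2D feature map; this note does not spell out the layer, so treat that as an assumption. The helpers `pointwise` and `conv3x3` are hypothetical, single-channel, and written in pure Python for clarity rather than speed.

```python
def pointwise(feature, weight):
    """Point-wise (1x1) transform: each position is scaled independently."""
    return [[weight * v for v in row] for row in feature]


def conv3x3(feature, kernel):
    """Zero-padded 3x3 cross-correlation on a single-channel 2D map,
    so each output position mixes in its eight neighbours."""
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * feature[yy][xx]
            out[y][x] = acc
    return out


# A single active position in the middle of a 3x3 map.
impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

local = conv3x3(impulse, ones)      # the impulse spreads to its neighbours
isolated = pointwise(impulse, 2.0)  # the impulse stays at its own position
```

With the point-wise transform, information from one character position never reaches its neighbours inside the feedforward layer; with the 3x3 kernel it does, which is the local vicinity the quoted sentence refers to.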


