[Deep Learning] Scene Text Detection and Recognition
Contents
Holistic, Multi-Channel Prediction
Corner Localization and Region Segmentation (A Megvii work in CVPR 2018)
EAST (A Megvii work in CVPR 2017)
TextSnake (A Megvii work in ECCV 2018)
Mask TextSpotter (A Megvii work in ECCV 2018)
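Background
Why does text matter?
Text is a human creation, and it has two distinctive properties:
- It carries rich and precise high-level semantic information
- It conveys human thoughts and emotions
Text in natural scenes can also serve as a visual cue that complements other cues such as edges and textures.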
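Problem Definition
Text detection is the process of algorithmically locating text in an image and detecting its characters.
What are the challenges?
Scene text differs from traditional OCR in two ways:
- Natural scenes are cluttered, whereas traditional OCR inputs are regular and well structured
- Scene text varies widely in type, format, color, and so on
The specific challenges fall into three categories:
- Diversity of text: different sizes, languages, formats, and so on
- Background interference: structures such as signs and traffic lights are locally similar to text
- Imaging conditions: noise, blur, occlusion, shadows, and so on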
Recent Advances and Representative Algorithms
Some algorithms draw inspiration from object detection and semantic segmentation:
Holistic, Multi-Channel Prediction
Yao et al. Scene Text Detection via Holistic, Multi-Channel Prediction. 2016. arXiv preprint arXiv:1606.09002
- holistic vs. local
- conceptually and functionally different from previous sliding-window or connected-component based approaches
- holistic, pixel-wise predictions: text region map, character map and linking orientation map
- detections are formed using these three maps
- can simultaneously handle horizontal, multi-oriented and curved text in real-world natural images
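As a rough illustration of how a pixel-wise map can be turned into detections, the sketch below thresholds a text region map and groups the surviving pixels into connected regions. It covers only this one step; the character map and linking orientation map are left out, and the function name `text_regions`, the threshold of 0.5 and the list-of-rows input format are assumptions made for the example, not details from the paper.

```python
from collections import deque


def text_regions(region_map, threshold=0.5):
    """Group pixels with a text score of at least `threshold` into
    4-connected regions.

    `region_map` is a list of rows of floats in [0, 1] (a hypothetical
    text region map). Returns one bounding box
    (x_min, y_min, x_max, y_max) per region, in scan order.
    """
    height, width = len(region_map), len(region_map[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or region_map[y][x] < threshold:
                continue
            # Breadth-first flood fill from this unvisited text pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            x_min = x_max = x
            y_min = y_max = y
            while queue:
                cy, cx = queue.popleft()
                x_min, x_max = min(x_min, cx), max(x_max, cx)
                y_min, y_max = min(y_min, cy), max(y_max, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and region_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


# Toy 3x4 score map with two separate text blobs.
score_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.8],
]
print(text_regions(score_map))
```

Axis-aligned bounding boxes are used here only to keep the example short; handling multi-oriented or curved text would need the other two maps.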
TextBoxes
Liao et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI, 2017.
- a text detection method inspired by SSD
- both high accuracy and efficiency
Rotation Proposals
Ma et al. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. arXiv, 2017.
- a multi-oriented text detection method based on Faster R-CNN
- propose several modifications to better detect scene text
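To make the idea of a rotated proposal concrete, the following sketch converts a rotated rectangle into its four corner points. The parameterization (center, width, height, angle in degrees) and the function name `rotated_box_corners` are assumptions chosen for illustration; the notes above do not specify how the method encodes its proposals.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w-by-h rectangle centered at (cx, cy)
    and rotated by `angle_deg` about its center.

    The rotation is counter-clockwise when the y axis points up, which
    appears clockwise in image coordinates where y points down.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the corner offset, then translate to the box center.
        corners.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return corners


# A wide box tilted by 30 degrees, as a slanted text line might be.
for corner in rotated_box_corners(50, 20, 40, 10, 30):
    print(corner)
```

An axis-aligned box is the special case `angle_deg=0`, which is why rotated proposals can be seen as a generalization of the usual Faster R-CNN anchors.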
Corner Localization and Region Segmentation
(A Megvii work in CVPR 2018)
Lyu et al. Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation. CVPR, 2018.
- a compound text detection method: corner localization and region segmentation
- corner localization: corner detection with SSD
- region segmentation: position-sensitive segmentation with R-FCN
Simpler Pipelines
EAST (A Megvii work in CVPR 2017)
Zhou et al. EAST: An Efficient and Accurate Scene Text Detector. CVPR, 2017.
- main idea: predict the location, scale and orientation of text with a single model and multiple loss functions (multi-task training)
- advantages:
  (a) accuracy: allows end-to-end training and optimization
  (b) efficiency: removes redundant stages and processing steps
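Multi-task training with several loss functions can be sketched for a single pixel as follows. This is only an illustration under stated assumptions: the score term is plain binary cross-entropy, the geometry term combines an IoU-based box loss with a cosine angle loss, and the weights `geo_weight` and `angle_weight` are illustrative defaults. None of these choices, nor the function names, are taken from the notes above.

```python
import math


def score_loss(pred, target, eps=1e-7):
    """Binary cross-entropy for one pixel of a text/non-text score map."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred)
             + (1.0 - target) * math.log(1.0 - pred))


def geometry_loss(pred, target, angle_weight=10.0):
    """Box and angle loss for one pixel.

    `pred` and `target` are (top, right, bottom, left, angle_rad):
    distances from the pixel to the four box edges plus the box angle.
    """
    pt, pr, pb, pl, pa = pred
    tt, tr, tb, tl, ta = target
    area_pred = (pt + pb) * (pr + pl)
    area_true = (tt + tb) * (tr + tl)
    inter = (min(pt, tt) + min(pb, tb)) * (min(pr, tr) + min(pl, tl))
    union = area_pred + area_true - inter
    # The +1.0 terms are smoothing to avoid log(0) on empty boxes.
    iou_loss = -math.log((inter + 1.0) / (union + 1.0))
    angle_loss = 1.0 - math.cos(pa - ta)
    return iou_loss + angle_weight * angle_loss


def multi_task_loss(score_pred, score_true, geo_pred, geo_true,
                    geo_weight=1.0):
    """Score loss everywhere, geometry loss only on text pixels."""
    loss = score_loss(score_pred, score_true)
    if score_true > 0:
        loss += geo_weight * geometry_loss(geo_pred, geo_true)
    return loss


truth = (2.0, 3.0, 2.0, 3.0, 0.0)
guess = (1.0, 1.0, 1.0, 1.0, 0.5)
print(multi_task_loss(0.9, 1.0, guess, truth))
```

Because all terms are summed into one scalar, a single backward pass can update the whole network, which is what makes end-to-end optimization possible in this kind of design.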
Detecting Text of Arbitrary Shapes
TextSnake (A Megvii work in ECCV 2018)
Long et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. ECCV, 2018.
- a novel and flexible representation
- able to effectively and precisely describe the geometric properties, such as location, scale, and bending of curved text, while the other representations (axis-aligned rectangle, rotated rectangle or quadrangle) struggle
- a text instance is described as a sequence of ordered, overlapping disks centered at symmetric axes, each of which is associated with potentially variable radius and orientation
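The disk-based representation can be pictured with a small data structure. The sketch below stores a text instance as an ordered list of disks and rasterizes the area they cover; the `Disk` class, its field names and the toy coordinates are hypothetical and chosen only for illustration. The orientation is stored but not needed to compute the covered area.

```python
import math
from dataclasses import dataclass


@dataclass
class Disk:
    cx: float      # center x, on the text center line
    cy: float      # center y, on the text center line
    radius: float  # roughly half the local text height
    angle: float   # local orientation of the center line, in radians


def covers(disks, x, y):
    """True if point (x, y) falls inside any disk of the text instance."""
    return any(math.hypot(x - d.cx, y - d.cy) <= d.radius for d in disks)


def rasterize(disks, width, height):
    """Binary mask (list of rows) of the area covered by the disks."""
    return [[1 if covers(disks, x, y) else 0 for x in range(width)]
            for y in range(height)]


# A short, slightly bent text line: centers move right and curve downward.
snake = [Disk(2, 2, 1.5, 0.0), Disk(4, 2, 1.5, 0.2), Disk(6, 3, 1.5, 0.5)]
mask = rasterize(snake, 9, 6)
for row in mask:
    print("".join("#" if v else "." for v in row))
```

Since each disk has its own radius and position, the union of disks can follow a curved line of text in a way that a single rectangle or quadrangle cannot.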
Mask TextSpotter (A Megvii work in ECCV 2018)
Lyu et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV, 2018.
- an end-to-end system for both text detection and recognition
- inspired by Mask R-CNN
- RPN for text proposal generation
- Fast R-CNN for proposal classification and regression
- mask branch for character segmentation and recognition
Text Recognition
CRNN
Shi et al. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. TPAMI, 2017.
ASTER
Shi et al. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. TPAMI, 2018.
FAN
Recommended Resources
- Survey
  - Scene Text Detection and Recognition: The Deep Learning Era
  - arXiv: https://arxiv.org/abs/1811.04256 (draft version)
  - GitHub: https://github.com/Jyouhou/SceneTextPapers (compiled papers, datasets, and code)
- Laboratories and Papers
  - https://github.com/chongyangtao/Awesome-Scene-Text-Recognition
- Datasets and Code
  - https://github.com/seungwooYoo/Curated-scene-text-recognition-analysis
- Projects and Products
  - https://github.com/wanghaisheng/awesome-ocr