SCATTER: Selective Context Attentional Scene Text Recognizer

Ron Litman; Oron Anschel; Shahar Tsiper; Roee Litman; Shai Mazor; R. Manmatha

CVPR 2020

Abstract

Scene Text Recognition (STR), the task of recognizing text against complex image backgrounds, is an active area of research. Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes. In this paper, we introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER). SCATTER uses a stacked block architecture with intermediate supervision during training, which makes it possible to train a deep BiLSTM encoder successfully and thus improves the encoding of contextual dependencies. Decoding is done using a two-step 1D attention mechanism. The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer. The second attention step, as in previous work, treats the features as a sequence and attends to the intra-sequence relationships. Experiments show that the proposed approach surpasses SOTA performance on irregular text recognition benchmarks by 3.7% on average.
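To make the first attention step more concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the re-weighting is an element-wise sigmoid gate computed by a single linear layer over the concatenated visual and contextual features; the abstract does not specify the gate, and the names `selective_reweight`, `weight`, and `bias` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def selective_reweight(visual, contextual, weight, bias):
    """Re-weight concatenated [visual, contextual] features per time step.

    visual:     T x Cv list of lists (CNN backbone features per column)
    contextual: T x Ch list of lists (BiLSTM outputs per column)
    weight:     C x C matrix with C = Cv + Ch (assumed learned parameters)
    bias:       length-C vector (assumed learned parameters)
    """
    gated = []
    for v_t, h_t in zip(visual, contextual):
        d_t = v_t + h_t  # concatenate along the channel axis
        # Linear layer followed by a sigmoid gives one gate value per channel.
        scores = [
            sum(w * x for w, x in zip(row, d_t)) + b
            for row, b in zip(weight, bias)
        ]
        gate = [sigmoid(s) for s in scores]
        # Element-wise product: each channel is scaled by its gate in (0, 1).
        gated.append([g * x for g, x in zip(gate, d_t)])
    return gated


# Toy usage with random numbers standing in for real features and weights.
random.seed(0)
T, Cv, Ch = 4, 3, 2
C = Cv + Ch
visual = [[random.uniform(-1, 1) for _ in range(Cv)] for _ in range(T)]
contextual = [[random.uniform(-1, 1) for _ in range(Ch)] for _ in range(T)]
weight = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(C)]
bias = [0.0] * C
out = selective_reweight(visual, contextual, weight, bias)
```

The output keeps the T x (Cv + Ch) shape of the concatenated input, so a sequence decoder such as the second attention step can consume it unchanged.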

Topics

Deep Learning > Architectures > Neural Networks
Computer Vision > Analysis > Scene Understanding
Artificial Intelligence > Core AI > Computer Vision
Computer Vision > Processing > Image Processing
Deep Learning > Techniques > Attention
Natural Language Processing > Applications > Text Recognition

Keywords

attention mechanism, selective attention, convolutional neural network, bidirectional LSTM, scene text recognition, contextual dependency, 2d attention, irregular text recognition
