Out of Length Text Recognition with Sub-String Matching

Yongkun Du; Zhineng Chen; Caiyan Jia; Xieping Gao; Yu-Gang Jiang

2025 AAAI AAAI 2025

Out of Length Text Recognition with Sub-String Matching

Abstract

Abstract Scene Text Recognition (STR) methods have demonstrated robust performance in word-level text recognition. However, in real applications the text image is sometimes long due to detected with multiple horizontal words. It triggers the requirement to build long text recognition models from readily available short (i.e., word-level) text datasets, which has been less studied previously. In this paper, we term this task Out of Length (OOL) text recognition. We establish the first Long Text Benchmark (LTB) to facilitate the assessment of different methods in long text recognition. Meanwhile, we propose a novel method called OOL Text Recognition with sub-String Matching (SMTR). SMTR comprises two cross-attention-based modules: one encodes a sub-string containing multiple characters into next and previous queries, and the other employs the queries to attend to the image features, matching the sub-string and simultaneously recognizing its next and previous character. SMTR can recognize text of arbitrary length by iterating the process above. To avoid being trapped in recognizing highly similar sub-strings, we introduce a regularization training to compel SMTR to effectively discover subtle differences between similar sub-strings for precise matching. In addition, we propose an inference augmentation strategy to alleviate confusion caused by identical sub-strings in the same text and improve the overall recognition efficiency. Extensive experimental results reveal that SMTR, even when trained exclusively on short text, outperforms existing methods in public short text benchmarks and exhibits a clear advantage on LTB.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — sub-string matching

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Machine Learning, Natural Language Processing, Robotics, Speech & Audio

Authors

Yongkun Du , Zhineng Chen , Caiyan Jia , Xieping Gao , Yu-Gang Jiang

Topics

Machine Learning > Application Areas > Data Augmentation Computer Vision > Analysis > Scene Understanding Natural Language Processing > Applications > Text Classification Artificial Intelligence > Core AI > Computer Vision Deep Learning > Techniques > Attention Computer Vision > Applications > Document Analysis

Keywords

cross-attention module scene text recognition sub-string matching out of length text inference augmentation long text benchmark long text recognition regularization training

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025