WACV 2024

Harnessing the Power of Multi-Lingual Datasets for Pre-Training: Towards Enhancing Text Spotting Performance

Abstract

The ability to adapt to a wide range of domains is crucial for scene text spotting models deployed in real-world conditions. However, existing state-of-the-art approaches usually combine scene text detection and recognition simply by pretraining on natural scene image datasets, and so do not directly exploit the feature interaction between multiple domains. In this work, we investigate the problem of domain-adapted scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR for scene text spotting on both regular (ICDAR2015) and arbitrary-shaped (CTW1500, TotalText) scene text, along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations on text spotting benchmarks across multiple domains (e.g., language, synth-to-real, and documents), in terms of both accuracy and model efficiency.
