Towards Accurate Video Text Spotting with Text-wise Semantic Reasoning

Xinyan Zu; Haiyang Yu; Bin Li; Xiangyang Xue

IJCAI 2023

Abstract

Video text spotting (VTS) aims to extract text from videos, performing text detection, tracking, and recognition simultaneously. Several existing works tackle VTS; however, they may ignore the underlying semantic relationships among texts within a frame. We observe that the texts within a frame usually share similar semantics, which suggests that if a text recognizer predicts one text incorrectly, the prediction still has a chance of being corrected via semantic reasoning. In this paper, we propose an accurate video text spotter, VLSpotter, that reads texts visually, linguistically, and semantically. For ‘visually’, we propose a plug-and-play text-focused super-resolution module to alleviate motion blur and enhance video quality. For ‘linguistically’, a language model is employed to capture intra-text context and mitigate misspelled text predictions. For ‘semantically’, we propose a text-wise semantic reasoning module that models inter-text semantic relationships and reasons over them to refine the results. Experimental results on multiple VTS benchmarks demonstrate that the proposed VLSpotter outperforms existing state-of-the-art methods in end-to-end video text spotting.
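To make the idea of inter-text reasoning concrete, here is a minimal sketch in which each text instance in a frame refines its feature vector by attending to all texts in the same frame. The abstract does not specify how the text-wise semantic reasoning module is built, so the use of dot-product self-attention, the function name `inter_text_attention`, and the toy two-dimensional features are all assumptions made purely for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def inter_text_attention(text_features):
    """Hypothetical inter-text reasoning step (an assumption, not VLSpotter's module).

    Each text's feature is updated with an attention-weighted mix of the
    features of every text in the same frame, plus a residual connection.
    """
    dim = len(text_features[0])
    scale = math.sqrt(dim)
    refined = []
    for query in text_features:
        # Scaled dot-product similarity between this text and every text in the frame.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in text_features
        ]
        weights = softmax(scores)
        # Context vector: weighted average of all text features in the frame.
        context = [
            sum(w * feat[d] for w, feat in zip(weights, text_features))
            for d in range(dim)
        ]
        # Residual connection keeps the text's own evidence in the result.
        refined.append([q + c for q, c in zip(query, context)])
    return refined


# Toy frame with three text instances; the first two are semantically close.
frame_texts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
refined = inter_text_attention(frame_texts)
```

In this toy frame, the first text draws most of its context from itself and its close neighbour, which is the intuition behind letting semantically related texts support one another's predictions.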

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Computer Vision > Analysis > Scene Understanding
Computer Vision > Processing > Video Understanding

Keywords

multimodal learning, semantic reasoning, video text spotting, text detection, text tracking
