MIST: Multilingual Incidental Dataset for Scene Text Detection

Saumya Mundra; Ajoy Mondal; C.V. Jawahar

WACV 2026

Abstract

Scene text detection has progressed rapidly, largely driven by curated datasets and benchmarks. However, many of these have reached evaluation saturation and are heavily biased toward focused scenes, limiting their effectiveness in real-world environments where detection is hindered by environmental factors. To address this, we introduce MIST, a Multilingual Incidental Scene Text dataset featuring diverse text instances across 11 languages. MIST provides language, legibility, and fine-grained polygon-shaped annotations across 12K scene images and 600K word-level text instances. Images are recorded along roads with a GoPro mounted on a moving car, which captures real-world complexities and ensures the scenes are incidental rather than deliberately framed. MIST establishes a challenging new benchmark to enable robust evaluation of scene text detection methods in real-world scenarios. The dataset and code will be available at https://saumya-svm.github.io/mist.
