2026 WACV WACV 2026

UniTabBank: A Large Scale Multi-Lingual, Multi-Layout, Multi-Type, Multi-Format Dataset for Table Detection

Abstract

Tables play a key role in conveying structured data across documents. Accurate table detection is crucial for downstream tasks like structure recognition and information extraction. However, current datasets lack diversity in format, language, and layout, limiting real-world generalization. This underscores the need for well-annotated datasets that are multi-lingual, layout-diverse, document-agnostic and format-rich. To address these limitations, we introduce UniTabBank, a large scale, diverse table detection dataset designed to reflect realistic use cases. UniTabBank is characterized by five key attributes: (i) Multi-Lingual --- supporting 28 languages (including Arabic, English, Hindi, etc.); (ii) Multi-Layout --- encompassing both single-column and multi-column documents; (iii) Multi-Type --- covering a wide range of document genres such as annual reports, books, newspapers, and magazines; (iv) Multi-Format --- comprising scanned documents, photographed pages, and PDFs; and finally (v) Scale and Annotation Quality --- consists of 55,443 document page images with 81,179 accurately annotated table instances, offering scale and annotation precision. Additionally, we introduce UniTabDet, a YOLO-based model for table detection, which outperforms state-of-the-arts on eight out of nine table detection benchmarks. Evaluated in a zero-shot setting on four layout analysis benchmarks, UniTabDet also shows strong generalization across diverse documents without additional fine-tuning. The UniTabBank dataset and models are avaialble here.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio