WACV 2026

Zero-Shot Table Extraction in Business Documents: A Unified Benchmark with Error Taxonomy and Ecological Analysis

Abstract

Tables in business documents power analytics and compliance, yet task-specific training datasets are costly to build, so practitioners increasingly turn to zero-shot vision-language models (VLMs). We examine how realistic zero-shot extraction is in practice for table detection (TD) and table structure recognition (TSR), using a unified protocol on DocILE-QUEST and a private corpus, STM154. We evaluate TD with GIoU, Purity, and Completeness, and TSR with TEDS and TEDS-S, comparing commercial VLMs (GPT-4o, GPT-5-mini), compact detectors, and supervised YOLO/DETR baselines. Zero-shot VLMs are strong for TSR and competitive for TD, whereas fine-tuned or from-scratch detectors lead when box quality and robustness to clutter matter. We also introduce an automated error taxonomy that isolates actionable failures: missed tables, merged or split tables, header-body confusions, and cell-topology errors. Finally, we quantify emissions and find a 10^4-fold gap (four orders of magnitude) between the lightest and heaviest systems.
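As a concrete reference for one of the TD metrics named above, the sketch below computes the standard Generalized IoU (GIoU) between two axis-aligned boxes: ordinary IoU minus the fraction of the smallest enclosing box that the union does not cover. Unlike plain IoU, it still distinguishes near misses from far misses when boxes do not overlap. This is a minimal illustration, not the authors' evaluation code: the `(x1, y1, x2, y2)` box format and the helper names `area` and `giou` are assumptions, and how predictions are matched to ground-truth tables and how scores are aggregated across a page is not specified in the abstract.

```python
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (positive area assumed).
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box; clamps inverted coordinates to zero."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def giou(a: Box, b: Box) -> float:
    """Generalized IoU: IoU minus the uncovered fraction of the enclosing box.

    Values lie in (-1, 1]; 1 means identical boxes, and values approach -1
    as two disjoint boxes move far apart.
    """
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    union = area(a) + area(b) - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)

    # Penalize the part of the enclosing box not covered by either input.
    return iou - (enclose - union) / enclose


if __name__ == "__main__":
    gt = (0.0, 0.0, 2.0, 2.0)
    print(giou(gt, (0.0, 0.0, 2.0, 2.0)))  # identical boxes
    print(giou(gt, (1.0, 1.0, 3.0, 3.0)))  # partial overlap
    print(giou(gt, (5.0, 0.0, 7.0, 2.0)))  # disjoint boxes: negative score
```

For example, two 2x2 boxes offset by one unit diagonally have IoU 1/7 and an enclosing box of area 9 with 2 units uncovered, giving GIoU = 1/7 - 2/9 = -5/63. The negative value reflects a poor localization that plain IoU would only score as a small positive number.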
