Enhanced Table Structure Recognition with Multi-Modal Approach

Huichen Yang; Andrew D. Hellicar; Maciej Rybinski; Sarvnaz Karimi

IJCNLP 2025

Abstract

Tables are fundamental for presenting information in research articles, technical documents, manuals, and reports. One key challenge is accessing the information in tables that are embedded in Portable Document Format (PDF) files or scanned images. Doing so requires accurately recognising table structures across diverse layouts, including complex tables. The Table Structure Recognition (TSR) task aims to recognise the internal structure of table images and convert it into a machine-readable format. We propose a flexible multi-modal framework for image-based TSR. Our approach employs two-stream transformer encoders alongside task-specific decoders for table structure extraction and cell bounding box detection. Experiments on benchmark datasets demonstrate that our model achieves highly competitive results compared to strong baselines, gaining 5.4% over single-modality approaches on the FinTabNet dataset.
