Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation Modeling

Weishi Wang; Hengchang Hu; Daniel Dahlmeier

2026 EACL EACL 2026

Rethinking Reading Order: Toward Generalizable Document Understanding with LLM-based Relation Modeling

Abstract

AbstractDocument understanding requires modeling both structural and semantic relationships between the layout elements within the document, with human-perceived reading order (RO) playing a crucial yet often neglected role compared to heuristic OCR sequences used by most existing models. Previous approaches depend on costly, inconsistent human annotations, limiting scalability and generalization. To bridge the gap, we propose a cost-effective paradigm that leverages large language models (LLMs) to infer global RO and inter-element layout relations without human supervision. By explicitly incorporating RO as structural guidance, our method captures hierarchical, document-level dependencies beyond local adjacency. Experiments on Semantic Entity Recognition, Entity Linking, and Document Question Answering show consistent improvements over baseline methods. Notably, LLM-inferred RO, even when differing from ground-truth adjacency, provides richer global structural priors and yields superior downstream performance. These results and findings demonstrate the scalability and significance of RO-aware modeling, advancing both LLMs and lightweight layout-aware models for robust document understanding. Code, data, and more details will be made publicly available after corporate review, in accordance with SAP’s corporate open-source policy.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Weishi Wang , Hengchang Hu , Daniel Dahlmeier

Topics

Artificial Intelligence > Core AI > Foundation Models Artificial Intelligence > Core AI > Multimodal Learning Artificial Intelligence > Learning Paradigms > Transfer Learning

Keywords

document understanding layout analysis large language model reading order semantic entity recognition

Download PDF

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026