DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading

Hao Wang; Qingxuan Wang; Yue Li; Changqing Wang; Chenhui Chu; Rui Wang

EMNLP 2023

Abstract

The use of visually-rich documents in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires overcoming technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce DocTrack, a visually-rich document dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progress, they still have a long way to go before they can read visually richer documents as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of document intelligence.

Topics

Computer Science > Applications > Document Analysis
Interdisciplinary > Cognitive Science > Perception
Interdisciplinary > Social > Education
Computer Vision > Domain-Specific > Document Analysis

Keywords

document understanding, eye tracking, visual document, reading order, visually-rich document, human reading order, document AI, eye-movement information, document intelligence
