Towards a new research agenda for multimodal enterprise document understanding: What are we missing?

Armineh Nourbakhsh; Sameena Shah; Carolyn Rose

ACL 2024

Abstract

The field of multimodal document understanding has produced a suite of models that have achieved stellar performance across several tasks, even coming close to human performance on certain benchmarks. Nevertheless, the application of these models to real-world enterprise datasets remains constrained by a number of limitations. In this position paper, we discuss these limitations in the context of three key aspects of research: dataset curation, model development, and evaluation on downstream tasks. By analyzing 14 datasets and 7 state-of-the-art (SotA) models, we identify major gaps in their utility in the context of a real-world scenario. We demonstrate how each limitation impedes the widespread use of SotA models in enterprise settings, and present a set of research challenges that are motivated by these limitations. Lastly, we propose a research agenda that is aimed at driving the field towards higher impact in enterprise applications.

Topics

Machine Learning > Application Areas > Domain Adaptation
Computer Vision > Processing > Video Understanding

Keywords

dataset curation; model development; enterprise application; multimodal document understanding; real-world scenario
