StructuralLM: Structural Pre-training for Form Understanding

Chenliang Li; Bin Bi; Ming Yan; Wei Wang; Songfang Huang; Fei Huang; Luo Si

ACL 2021

Abstract

Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, they almost exclusively focus on text-only representations while neglecting cell-level layout information, which is important for understanding form images. In this paper, we propose a new pre-training approach, StructuralLM, that jointly leverages cell and layout information from scanned documents. Specifically, we pre-train StructuralLM with two new designs to make the most of the interactions between cell and layout information: 1) treating each cell as a semantic unit; 2) classifying cell positions. The pre-trained StructuralLM achieves new state-of-the-art results on several types of downstream tasks, including form understanding (F1 from 78.95 to 85.14), document visual question answering (ANLS from 72.59 to 83.94), and document image classification (accuracy from 94.43 to 96.08).
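The two pre-training designs named in the abstract can be illustrated on the input side. The Python sketch below is an illustration under stated assumptions, not the authors' implementation. The `Cell` type and all function names are hypothetical. The 0-1000 coordinate normalization follows the LayoutLM convention. Whitespace splitting stands in for a real subword tokenizer. The 4 x 4 grid (16 areas) and the use of a cell's center to pick its area are illustrative choices. The sketch shows (1) every token in a cell sharing that cell's bounding box, so the model sees the cell as one layout unit, and (2) deriving a cell-position classification label by mapping a cell to one of N equal-sized page areas. It omits the Transformer encoder and the classification head.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass
class Cell:
    """Hypothetical container for one OCR'd cell: its text and pixel bounding box."""
    text: str
    box: Box


def normalize_box(box: Box, page_w: int, page_h: int, scale: int = 1000) -> Box:
    """Scale a pixel box onto an integer 0..scale grid (assumed LayoutLM-style convention)."""
    x0, y0, x1, y1 = box
    return (
        x0 * scale // page_w,
        y0 * scale // page_h,
        x1 * scale // page_w,
        y1 * scale // page_h,
    )


def cell_level_layout(cells: List[Cell], page_w: int, page_h: int) -> Tuple[List[str], List[Box]]:
    """Design 1 (sketch): every token inherits its cell's box, so the cell acts as one unit."""
    tokens: List[str] = []
    boxes: List[Box] = []
    for cell in cells:
        nbox = normalize_box(cell.box, page_w, page_h)
        # Whitespace split is a stand-in for a real subword tokenizer.
        for tok in cell.text.split():
            tokens.append(tok)
            boxes.append(nbox)
    return tokens, boxes


def cell_position_label(box: Box, grid: int = 4, scale: int = 1000) -> int:
    """Design 2 (sketch): index of the grid*grid equal area that contains the cell's center."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Clamp so a center exactly on the page edge stays inside the last row/column.
    col = min(int(cx * grid / scale), grid - 1)
    row = min(int(cy * grid / scale), grid - 1)
    return row * grid + col
```

For example, on an 800 x 1000 page, the cell "March 3, 2021" at pixel box (160, 40, 400, 70) yields three tokens that all carry the normalized box (200, 40, 500, 70), and its center falls in area 1 of the 16-area grid. Sharing one box per cell, rather than one box per word, is what lets the layout signal describe the cell as a whole.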

Topics

Machine Learning > Application Areas > Domain Adaptation
Deep Learning > Techniques > Pretraining
Natural Language Processing > Applications > Document Analysis
Computer Vision > Applications > Document Analysis

Keywords

visual question answering, pre-trained language model, document image, layout information, structural pre-training, form understanding
