Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis

Siwen Luo; Yihao Ding; Siqu Long; Josiah Poon; Soyeon Caren Han

COLING 2022

Abstract

Recognizing the layout of unstructured digital documents is crucial when parsing the documents into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on visual cues to understand documents while ignoring other information, such as contextual information or the relationships between document layout components, which are vital to improving layout analysis performance. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We construct different graphs to capture the four main feature aspects of document layout components: syntactic, semantic, density, and appearance features. Then, we apply graph convolutional networks to enhance the features of each aspect and apply node-level pooling for integration. Finally, we concatenate the features of all aspects and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves state-of-the-art results on three widely used DLA datasets: PubLayNet, FUNSD, and DocBank. The code will be released at https://github.com/adlnlp/doc_gcn
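The pipeline described in the abstract (one graph per feature aspect, graph convolution, node-level pooling, concatenation, and a 2-layer MLP classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the symmetric-normalized GCN layer, the reading of "node-level pooling" as an element-wise max over two GCN layer outputs, the bias-free MLP, the random weights and features, the feature dimensions, and the shared chain graph standing in for the aspect-specific graphs.

```python
import math
import random

def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(v, 0.0) for v in row] for row in m]

def normalize_adjacency(adj):
    """Add self-loops, then scale symmetrically: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def encode_aspect(adj, x, w1, w2):
    """Two GCN layers for one aspect, then an ASSUMED node-level pooling step."""
    a_norm = normalize_adjacency(adj)
    h1 = relu(matmul(matmul(a_norm, x), w1))
    h2 = relu(matmul(matmul(a_norm, h1), w2))
    # Assumption: pooling = element-wise max over the two layer outputs per node.
    return [[max(u, v) for u, v in zip(r1, r2)] for r1, r2 in zip(h1, h2)]

def softmax(row):
    top = max(row)
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def classify(aspects, gcn_weights, mlp_w1, mlp_w2):
    """Concatenate pooled features of all aspects and apply a 2-layer MLP."""
    pooled = [encode_aspect(adj, x, *gcn_weights[name])
              for name, (adj, x) in aspects.items()]
    # Per node, join the feature vectors of every aspect into one long vector.
    fused = [[v for part in parts for v in part] for parts in zip(*pooled)]
    hidden = relu(matmul(fused, mlp_w1))   # biases omitted for brevity
    return [softmax(row) for row in matmul(hidden, mlp_w2)]

# Toy setup: 5 layout components, 3 hypothetical layout classes.
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

n_nodes, hidden_dim, n_classes = 5, 4, 3
feature_dims = {"syntactic": 8, "semantic": 16, "density": 2, "appearance": 12}

# Placeholder graph: a chain linking neighbouring components. The paper builds
# a different graph per aspect; one shared graph is used here only for brevity.
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n_nodes)]
         for i in range(n_nodes)]

aspects = {name: (chain, rand_matrix(n_nodes, dim)) for name, dim in feature_dims.items()}
gcn_weights = {name: (rand_matrix(dim, hidden_dim), rand_matrix(hidden_dim, hidden_dim))
               for name, dim in feature_dims.items()}
mlp_w1 = rand_matrix(hidden_dim * len(feature_dims), 8)
mlp_w2 = rand_matrix(8, n_classes)

probs = classify(aspects, gcn_weights, mlp_w1, mlp_w2)
```

Here `probs` holds one probability distribution over the hypothetical classes for each layout component. In a trained model the weight matrices would be learned from labeled layout components rather than sampled at random.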

Authors

Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, Soyeon Caren Han

Topics

Deep Learning > Architectures > Graph Neural Networks
Computer Vision > Analysis > Semantic Segmentation
Computer Vision > Processing > Image Segmentation
Computer Science > Applications > Document Analysis
Computer Vision > Domain-Specific > Document Analysis

Keywords

document understanding, document parsing, document layout analysis, document classification, node classification, visual feature, heterogeneous graph, graph convolutional network, semantic feature, layout classification
