Read Extensively, Focus Smartly: A Cross-document Semantic Enhancement Method for Visual Documents NER

Jun Zhao; Xin Zhao; WenYu Zhan; Tao Gui; Qi Zhang; Liang Qiao; Zhanzhan Cheng; Shiliang Pu

COLING 2022

Abstract

The introduction of multimodal information and pretraining techniques has significantly improved entity recognition from visually-rich documents. However, most existing methods pay unnecessary attention to irrelevant regions of the current document while ignoring potentially valuable information in related documents. To address this problem, this work proposes a cross-document semantic enhancement method consisting of two modules: 1) To prevent distraction from irrelevant regions in the current document, we design a learnable attention mask mechanism that adaptively filters redundant information in the current document. 2) To further enrich the entity-related context, we propose a cross-document information awareness technique, which enables the model to collect more evidence across documents to assist prediction. Experimental results on two document understanding benchmarks covering eight languages demonstrate that our method outperforms state-of-the-art (SOTA) methods.
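The abstract does not give the formulation of the learnable attention mask, so the following is only a minimal sketch of one common way such a mask can work, not the authors' method. It assumes a soft mask: each token gets a gate logit (a stand-in for a learned parameter or a small network's output), and the log of the sigmoid gate is added to the attention score before the softmax, which scales that token's unnormalised weight by its gate value. The names `gated_attention`, `scores`, and `gate_logits` are hypothetical and chosen for illustration.

```python
import math

def softmax(xs):
    # Subtract the maximum for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def gated_attention(scores, gate_logits):
    """Attention weights for one query over n tokens with a soft mask.

    scores:      raw attention scores, one per token.
    gate_logits: hypothetical learned logits; a low value means
                 "this token is probably irrelevant".
    """
    # Adding log(gate) to a score multiplies the unnormalised
    # attention weight exp(score) by gate = sigmoid(gate_logit).
    masked = [s + log_sigmoid(g) for s, g in zip(scores, gate_logits)]
    return softmax(masked)

# Tokens 0 and 1 have equal raw scores, but token 1 is gated off.
scores = [2.0, 2.0, 0.5]
gate_logits = [4.0, -4.0, 4.0]
weights = gated_attention(scores, gate_logits)
plain = softmax(scores)
```

In this toy example the ungated weights for tokens 0 and 1 are equal, while the gated weights shift attention away from token 1. In a trained model the gate logits would be learned jointly with the rest of the network; how the paper parameterises and trains its mask is not stated in the abstract.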

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Deep Learning > Architectures > Transformers
Computer Science > Applications > Document Analysis
Natural Language Processing > Applications > Named Entity Recognition
Computer Vision > Domain-Specific > Document Analysis

Keywords

named entity recognition, multimodal learning, document understanding, sparse attention, cross-document attention, visual document
