Modular Self-Supervision for Document-Level Relation Extraction

Sheng Zhang; Cliff Wong; Naoto Usuyama; Sarthak Jain; Tristan Naumann; Hoifung Poon

EMNLP 2021

Abstract

Extracting relations across large text spans has been relatively underexplored in NLP, but it is particularly important for high-value domains such as biomedicine, where obtaining high recall of the latest findings is crucial for practical applications. Compared to conventional information extraction confined to short text spans, document-level relation extraction faces additional challenges in both inference and learning. Given longer text spans, state-of-the-art neural architectures are less effective and task-specific self-supervision such as distant supervision becomes very noisy. In this paper, we propose decomposing document-level relation extraction into relation detection and argument resolution, taking inspiration from Davidsonian semantics. This enables us to incorporate explicit discourse modeling and leverage modular self-supervision for each sub-problem, which is less noise-prone and can be further refined end-to-end via variational EM. We conduct a thorough evaluation in biomedical machine reading for precision oncology, where cross-paragraph relation mentions are prevalent. Our method outperforms prior state of the art, such as multi-scale learning and graph neural networks, by over 20 absolute F1 points. The gain is particularly pronounced among the most challenging relation instances whose arguments never co-occur in a paragraph.
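The decomposition described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the trigger phrases, the paragraph and mention layout, the coreference map, and all function names below are hypothetical stand-ins chosen for illustration, and the learned, self-supervised components are replaced by simple rules. It only shows how detecting a relation locally and then resolving its arguments across paragraphs can recover a relation whose arguments never co-occur in one paragraph.

```python
# Toy sketch (assumptions throughout): rule-based stand-ins for the two modules.

# Hypothetical trigger phrases that signal a drug-response relation.
TRIGGERS = ("sensitive to", "respond to")


def detect_relations(paragraphs):
    """Relation detection: find local (drug, mutation) mention pairs
    inside any single paragraph that contains a trigger phrase."""
    found = []
    for idx, para in enumerate(paragraphs):
        if not any(trigger in para["text"] for trigger in TRIGGERS):
            continue
        drugs = [m for m, role in para["mentions"] if role == "drug"]
        mutations = [m for m, role in para["mentions"] if role == "mutation"]
        found.extend((idx, d, v) for d in drugs for v in mutations)
    return found


def resolve_argument(mention, coref):
    """Argument resolution: follow coreference links from a local mention
    (possibly anaphoric) to the entity it refers to elsewhere in the document."""
    seen = set()
    while mention in coref and mention not in seen:
        seen.add(mention)  # guard against cycles in the toy map
        mention = coref[mention]
    return mention


def extract(paragraphs, coref):
    """Compose the two modules into document-level relation tuples."""
    return {
        (resolve_argument(d, coref), resolve_argument(v, coref))
        for _, d, v in detect_relations(paragraphs)
    }


# Illustrative two-paragraph document; the mutation is named only in paragraph 0.
paragraphs = [
    {
        "text": "We identified the EGFR L858R mutation in 12 patients.",
        "mentions": [("EGFR L858R", "mutation")],
    },
    {
        "text": "Tumors with this mutation were sensitive to gefitinib.",
        "mentions": [("this mutation", "mutation"), ("gefitinib", "drug")],
    },
]
# Hypothetical coreference output linking the anaphor to its antecedent.
coref = {"this mutation": "EGFR L858R"}

print(extract(paragraphs, coref))
```

In this sketch the relation is detected entirely within the second paragraph, and the cross-paragraph link comes only from the resolution step, which is the separation of concerns the abstract argues makes each sub-problem easier to supervise.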

Authors

Sheng Zhang, Cliff Wong, Naoto Usuyama, Sarthak Jain, Tristan Naumann, Hoifung Poon

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Applications > Information Extraction
Healthcare & Medicine > Clinical > Clinical NLP
Healthcare & Medicine > Research > Bioinformatics
Machine Learning > Bayesian & Probabilistic > Variational Inference
Natural Language Processing > Applications > Relation Extraction

Keywords

variational inference, self-supervised learning, relation extraction, document-level extraction, precision oncology, document-level relation extraction, relation detection, biomedical nlp, graph neural network, argument resolution, biomedical machine reading
