2024 NAACL NAACL 2024

DuRE: Dual Contrastive Self Training for Semi-Supervised Relation Extraction

Abstract

AbstractDocument-level Relation Extraction (RE) aims to extract relation triples from documents. Existing document-RE models typically rely on supervised learning which requires substantial labeled data. To alleviate the amount of human supervision, Self-training (ST) has prospered again in language understanding by augmenting the fine-tuning of big pre-trained models whenever labeled data is insufficient. However, existing ST methods in RE fail to tackle the challenge of long-tail relations. In this work, we propose DuRE, a novel ST framework to tackle these problems. DuRE jointly models RE classification and text generation as a dual process. In this way, our model could construct and utilize both pseudo text generated from given labels and pseudo labels predicted from available unlabeled text, which are gradually refined during the ST phase. We proposed a contrastive loss to leverage the signal of the RE classifier to improve generation quality. In addition, we propose a self-adaptive way to sample pseudo text from different relation classes. Experiments on two document-level RE tasks show that DuRE significantly boosts recall and F1 score with comparable precision, especially for long-tail relations against several strong baselines.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio