Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Phong Le; Ivan Titov

2019 ACL ACL 2019

Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Abstract

AbstractModern entity linking systems rely on large collections of documents specifically annotated for the task (e.g., AIDA CoNLL). In contrast, we propose an approach which exploits only naturally occurring information: unlabeled documents and Wikipedia. Our approach consists of two stages. First, we construct a high recall list of candidate entities for each mention in an unlabeled document. Second, we use the candidate lists as weak supervision to constrain our document-level entity linking model. The model treats entities as latent variables and, when estimated on a collection of unlabelled texts, learns to choose entities relying both on local context of each mention and on coherence with other entities in the document. The resulting approach rivals fully-supervised state-of-the-art systems on standard test sets. It also approaches their performance in the very challenging setting: when tested on a test set sampled from the data used to estimate the supervised systems. By comparing to Wikipedia-only training of our model, we demonstrate that modeling unlabeled documents is beneficial.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

📈 Trend Setter — Entity Linking

🧭 Keyword Pioneer — document-level model

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Phong Le , Ivan Titov

Topics

Machine Learning > Learning Types > Unsupervised Learning Machine Learning > Learning Types > Weakly Supervised Learning Natural Language Processing > Applications > Entity Linking

Keywords

natural language processing weakly supervised learning entity linking weak supervision latent variable model document-level model entity coherence document-level entity linking

Download PDF

Related papers

What do phone embeddings learn about Phonology? 2019

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages 2019

Understanding Undesirable Word Embedding Associations 2019

Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text 2019

Domain Adaptation of Neural Machine Translation by Lexicon Induction 2019