Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

Maya Varma; Laurel Orr; Sen Wu; Megan Leszczynski; Xiao Ling; Christopher Re

EMNLP 2021

Abstract

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.
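The abstract does not spell out how the integration works. The toy sketch below illustrates the two uses it names (augmenting structural resources and generating pretraining data) under one loudly stated assumption: that medical knowledge base entities can be linked to general knowledge base entities through a crosswalk of shared identifiers. All entity IDs, type names, and the functions `augment_types` and `relabel_mentions` are hypothetical and are not the authors' implementation. `augment_types` copies fine-grained types from a linked general entity onto its coarsely typed medical counterpart; `relabel_mentions` turns mentions labeled with general entities into mentions labeled with medical entities, dropping those with no medical counterpart.

```python
"""Toy sketch of cross-domain structural integration.

Hypothetical throughout: entity IDs, type names, and the crosswalk are made up
for illustration and are not the paper's data or implementation.
"""
from typing import Dict, List, Set, Tuple


def augment_types(
    medical_types: Dict[str, Set[str]],
    general_types: Dict[str, Set[str]],
    crosswalk: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Copy medical_types, adding the types of each linked general-KB entity."""
    # Copy every set so the caller's input is not mutated.
    augmented = {eid: set(types) for eid, types in medical_types.items()}
    for med_id, gen_id in crosswalk.items():
        # Only transfer types when both ends of the (assumed) link exist.
        if med_id in augmented and gen_id in general_types:
            augmented[med_id] |= general_types[gen_id]
    return augmented


def relabel_mentions(
    mentions: List[Tuple[str, str]],
    crosswalk: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Map (mention, general_id) pairs to (mention, medical_id) pairs.

    Mentions whose general-KB entity has no medical counterpart are dropped.
    """
    # Invert the crosswalk: general ID -> medical ID.
    general_to_medical = {gen_id: med_id for med_id, gen_id in crosswalk.items()}
    return [
        (text, general_to_medical[gen_id])
        for text, gen_id in mentions
        if gen_id in general_to_medical
    ]


if __name__ == "__main__":
    medical_types = {"MED:1": {"Disease or Syndrome"}, "MED:2": {"Chemical"}}
    general_types = {"GEN:A": {"neurodegenerative disease"}, "GEN:B": {"city"}}
    crosswalk = {"MED:1": "GEN:A"}  # hypothetical identifier link

    augmented = augment_types(medical_types, general_types, crosswalk)
    print(sorted(augmented["MED:1"]))  # ['Disease or Syndrome', 'neurodegenerative disease']

    mentions = [("ALS", "GEN:A"), ("Paris", "GEN:B")]
    print(relabel_mentions(mentions, crosswalk))  # [('ALS', 'MED:1')]
```

In this sketch the medical entity keeps its original coarse type and gains the finer one, while the "Paris" mention is dropped because its general entity has no medical counterpart.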

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Natural Language Processing > Understanding > Named Entity Recognition
Natural Language Processing > Applications > Named Entity Recognition
Healthcare & Medicine > Clinical > Medical NLP

Keywords

knowledge transfer, entity linking, knowledge base, named entity disambiguation, biomedical text, cross-domain data integration, rare entity
