emrQA: A Large Corpus for Question Answering on Electronic Medical Records

Anusri Pampari; Preethi Raghavan; Jennifer Liang; Jian Peng

EMNLP 2018


Abstract

We propose a novel methodology for generating domain-specific, large-scale question answering (QA) datasets by re-purposing existing annotations made for other NLP tasks. We demonstrate one instance of this methodology by generating a large-scale QA dataset for electronic medical records, leveraging existing expert annotations on clinical notes for various NLP tasks from the community-shared i2b2 datasets. The resulting corpus (emrQA) contains 1 million question-logical form pairs and over 400,000 question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question-to-logical-form and question-to-answer mapping.
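The core idea of re-purposing annotations can be illustrated with template filling: each annotated entity instantiates a set of question paraphrases that share one logical form, and the annotated note line serves as answer evidence. The following is a minimal sketch under stated assumptions, not the emrQA generation pipeline; the `Annotation` record, the `|medication|` placeholder syntax, the template wording, and the `MedicationEvent` logical form are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical annotation: an entity mention tagged in a clinical note."""
    note_id: str
    entity_type: str  # e.g. "medication"
    text: str         # surface form of the entity in the note
    evidence: str     # note line the annotation was attached to


# Hypothetical templates: several question paraphrases share one logical form.
# A placeholder such as |medication| is filled from a matching annotation.
TEMPLATES = [
    {
        "entity_type": "medication",
        "questions": [
            "Has the patient ever been prescribed |medication|?",
            "Is the patient taking |medication|?",
        ],
        "logical_form": "MedicationEvent(|medication|)",
    },
]


def generate(annotations, templates):
    """Return (question, logical_form, evidence) triples, one per paraphrase."""
    triples = []
    for ann in annotations:
        placeholder = f"|{ann.entity_type}|"
        for tpl in templates:
            # Only templates written for this entity type apply.
            if tpl["entity_type"] != ann.entity_type:
                continue
            logical_form = tpl["logical_form"].replace(placeholder, ann.text)
            for q in tpl["questions"]:
                question = q.replace(placeholder, ann.text)
                # The annotated note line is reused as the answer evidence.
                triples.append((question, logical_form, ann.evidence))
    return triples


if __name__ == "__main__":
    notes = [
        Annotation("note-1", "medication", "metformin",
                   "Started metformin 500 mg daily."),
    ]
    for triple in generate(notes, TEMPLATES):
        print(triple)
```

Because several paraphrases map to the same logical form, a single annotation yields multiple question-logical form pairs in this sketch, which suggests how template-based generation can produce far more questions than there are underlying annotations.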

Topics

Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Question Answering
Healthcare & Medicine > Clinical > Clinical NLP

Keywords

domain adaptation, dataset creation, question answering, clinical note, dataset generation, logical form, clinical NLP, NLP annotation, electronic medical record

Related papers

Speeding Up Neural Machine Translation Decoding by Cube Pruning 2018

Limitations in learning an interpreted language with recurrent models 2018

Results of the sixth edition of the BioASQ Challenge 2018

Neural Segmental Hypergraphs for Overlapping Mention Recognition 2018

Hybrid Neural Attention for Agreement/Disagreement Inference in Online Debates 2018