Equipping Educational Applications with Domain Knowledge

Tarek Sakakini; Hongyu Gong; Jong Yoon Lee; Robert Schloss; Jinjun Xiong; Suma Bhat

ACL 2019

Abstract

One of the challenges of building natural language processing (NLP) applications for education is finding a large domain-specific corpus for the subject of interest (e.g., history or science). To address this challenge, we propose a tool, Dexter, that extracts a subject-specific corpus from a heterogeneous corpus, such as Wikipedia, by relying on a small seed corpus and distributed document representations. We empirically show the impact of the generated corpus on language modeling, word embedding estimation, and, consequently, distractor generation: it yields better performance than a general-domain corpus, a heuristically constructed domain-specific corpus, and a corpus generated by a popular system, BootCaT.
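The idea of selecting in-domain documents from a seed corpus and document vectors can be illustrated with a minimal sketch. This is not Dexter's actual algorithm, which the abstract does not spell out. It assumes that each document already has a dense vector, that the seed corpus is summarized by the centroid of its vectors, and that a candidate is kept when its cosine similarity to that centroid passes a threshold. The function names, the threshold value, and the toy vectors are all hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_in_domain(seed_vectors, candidates, threshold=0.8):
    """Keep candidate documents whose vector is close to the seed centroid.

    seed_vectors: list of vectors for the small seed corpus.
    candidates:   dict mapping document id -> vector (heterogeneous corpus).
    threshold:    hypothetical similarity cutoff, chosen for illustration.
    Returns (doc_id, similarity) pairs, most similar first.
    """
    center = centroid(seed_vectors)
    scored = [(doc_id, cosine(vec, center)) for doc_id, vec in candidates.items()]
    kept = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors standing in for real document embeddings.
seed = [[1.0, 0.0], [0.9, 0.1]]
pool = {"physics_article": [1.0, 0.05], "cooking_article": [0.0, 1.0]}
selected = select_in_domain(seed, pool)
```

In this toy setup only the document pointing in roughly the same direction as the seed vectors survives the cutoff. A real pipeline would obtain the vectors from a trained document-embedding model and tune the selection rule on held-out data.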

Topics

Machine Learning > Core Methods > Representation Learning
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Resources & Methods > Text Representation
Interdisciplinary > Education

Keywords

language modeling, document representation, domain knowledge, word embedding, distractor generation, corpus extraction
