Learning with Limited Data for Multilingual Reading Comprehension

Kyungjae Lee; Sunghyun Park; Hojae Han; Jinyoung Yeo; Seung-won Hwang; Juho Lee

IJCNLP 2019

Abstract

This paper studies the problem of supporting question answering in a new language with limited training resources. In the extreme scenario where no such resources exist, one can (1) transfer labels from another language using a translator, and (2) generate labels from unlabeled data using an automatic labeling function. However, both approaches inevitably introduce noise into the training data through translation or generation errors, which calls for judicious use of data with varying confidence. To address this challenge, we propose a weakly supervised framework that quantifies the noise in automatically generated labels in order to deemphasize or fix noisy data during training. On the reading comprehension task, we demonstrate the effectiveness of our model on low-resource languages with varying similarity to English, namely Korean and French.
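The abstract does not specify how the noise in translated or generated labels is quantified, so the following is only a minimal sketch of the general idea of deemphasizing or fixing noisy answer-span labels, assuming a span-extraction reader that outputs start and end probability distributions. The confidence score (the model's own probability of the labeled span), the `fix_threshold` of 0.5, and all names (`Example`, `best_span`, `refine_label`, `weighted_span_loss`) are hypothetical illustrations, not the authors' framework.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    """One automatically labeled QA example (hypothetical structure)."""
    start: int            # labeled answer-start token index (possibly noisy)
    end: int              # labeled answer-end token index (possibly noisy)
    p_start: List[float]  # model's start-position distribution (e.g. softmax output)
    p_end: List[float]    # model's end-position distribution


def best_span(ex: Example, max_len: int = 30) -> Tuple[int, int, float]:
    """Highest-scoring span (s <= e, at most max_len tokens) under p_start[s] * p_end[e]."""
    best = (0, 0, -1.0)
    for s, ps in enumerate(ex.p_start):
        for e in range(s, min(s + max_len, len(ex.p_end))):
            score = ps * ex.p_end[e]
            if score > best[2]:
                best = (s, e, score)
    return best


def refine_label(ex: Example, fix_threshold: float = 0.5) -> Tuple[int, int, float]:
    """Return (start, end, weight) for one example.

    Assumed noise score: the model's probability of the given label span.
    If the model is confident in a different span, the label is replaced
    ("fixed"); otherwise the label is kept but weighted by its confidence
    ("deemphasized").
    """
    confidence = ex.p_start[ex.start] * ex.p_end[ex.end]
    s, e, score = best_span(ex)
    if score >= fix_threshold and (s, e) != (ex.start, ex.end):
        return s, e, score                # fix: relabel with the confident prediction
    return ex.start, ex.end, confidence   # deemphasize: weight by label confidence


def weighted_span_loss(batch: List[Example], eps: float = 1e-12) -> float:
    """Confidence-weighted negative log-likelihood of the (possibly fixed) spans."""
    total, norm = 0.0, 0.0
    for ex in batch:
        s, e, w = refine_label(ex)
        nll = -math.log(max(ex.p_start[s], eps)) - math.log(max(ex.p_end[e], eps))
        total += w * nll
        norm += w
    return total / norm if norm > 0 else 0.0


if __name__ == "__main__":
    clean = Example(1, 2, [0.1, 0.8, 0.1], [0.1, 0.1, 0.8])      # kept, weight 0.64
    noisy = Example(2, 2, [0.9, 0.05, 0.05], [0.9, 0.05, 0.05])  # relabeled to (0, 0)
    unsure = Example(2, 2, [1 / 3] * 3, [1 / 3] * 3)             # kept, weight 1/9
    print(refine_label(noisy)[:2])  # → (0, 0)
    print(weighted_span_loss([clean, noisy, unsure]))
```

In this sketch the weights are plain floats; in a real training loop they would be treated as constants (no gradient flows through them). Using the model's own predictions as the noise estimate is a self-training heuristic chosen for brevity and can reinforce the model's mistakes, which is one reason a dedicated noise model may be preferable.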

Topics

Machine Learning > Learning Types > Weakly Supervised Learning
Natural Language Processing > Applications > Machine Reading Comprehension
Natural Language Processing > Resources & Methods > Multilingual NLP

Keywords

weakly supervised learning, multilingual NLP, question answering, reading comprehension, low-resource language
