Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering

Arij Riabi; Thomas Scialom; Rachel Keraron; Benoît Sagot; Djamé Seddah; Jacopo Staiano

EMNLP 2021

Abstract

Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets are in English, and the performance of state-of-the-art multilingual models is significantly lower when evaluated on non-English data. Due to high data collection costs, it is not realistic to obtain annotated data for each language one wishes to support. We propose a method to improve Cross-lingual Question Answering performance without requiring additional annotated data, leveraging Question Generation models to produce synthetic samples in a cross-lingual fashion. We show that the proposed method significantly outperforms baselines trained on English data only. We report a new state of the art on four datasets: MLQA, XQuAD, SQuAD-it and PIAF (fr).
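The abstract describes the approach only at a high level: a Question Generation (QG) model produces synthetic question-answer samples in the target languages, and these are added to the English training data. The following is a minimal Python sketch of that augmentation loop under stated assumptions. The `QuestionGenerator` signature, the candidate-answer lists, and the `synthesize`/`augment` helpers are hypothetical, and `toy_qg` is a stub standing in for a real multilingual QG model. It does not reproduce the authors' models, answer selection, or filtering.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class QAExample:
    """A SQuAD-style extractive QA sample (field names are illustrative)."""
    context: str
    question: str
    answer_text: str
    answer_start: int  # character offset of the answer inside the context
    lang: str


# Hypothetical QG interface: (context, answer span, target language) -> question.
QuestionGenerator = Callable[[str, str, str], str]


def synthesize(
    unlabeled: Iterable[Tuple[str, str, List[str]]], qg: QuestionGenerator
) -> List[QAExample]:
    """Turn unlabeled (lang, context, candidate answers) triples into synthetic QA samples."""
    samples: List[QAExample] = []
    for lang, context, candidates in unlabeled:
        for answer in candidates:
            start = context.find(answer)
            if start < 0:  # skip candidates that are not literal spans of the context
                continue
            question = qg(context, answer, lang)
            samples.append(QAExample(context, question, answer, start, lang))
    return samples


def augment(english: List[QAExample], synthetic: List[QAExample]) -> List[QAExample]:
    """Mix the original English training data with the synthetic samples."""
    return english + synthetic


# Toy usage: a stub generator replaces the (assumed) multilingual QG model.
def toy_qg(context: str, answer: str, lang: str) -> str:
    return f"[{lang}] question about {answer}?"


english = [QAExample("Paris is in France.", "Where is Paris?", "France", 12, "en")]
unlabeled = [("fr", "Rome est en Italie.", ["Italie", "Berlin"])]
train_set = augment(english, synthesize(unlabeled, toy_qg))
```

In this sketch the candidate "Berlin" is dropped because it does not occur in the French context, so the mixed training set holds one English and one synthetic French sample. A real pipeline would also need a way to select answer spans and, typically, to filter low-quality generated questions.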

Topics

Machine Learning > Learning Types > Zero-Shot Learning
Machine Learning > Application Areas > Data Augmentation
Natural Language Processing > Applications > Question Answering
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Zero-Shot Learning
Deep Learning > Learning Types > Data Augmentation

Keywords

zero-shot learning, data augmentation, cross-lingual transfer, question generation, zero-shot transfer, synthetic data augmentation, multilingual model, cross-lingual question answering
