LAReQA: Language-Agnostic Answer Retrieval from a Multilingual Pool

Uma Roy; Noah Constant; Rami Al-Rfou; Aditya Barua; Aaron Phillips; Yinfei Yang

EMNLP 2020

Abstract

We present LAReQA, a challenging new benchmark for language-agnostic answer retrieval from a multilingual candidate pool. Unlike previous cross-lingual tasks, LAReQA tests for “strong” cross-lingual alignment, requiring semantically related cross-language pairs to be closer in representation space than unrelated same-language pairs. This level of alignment is important for the practical task of cross-lingual information retrieval. Building on multilingual BERT (mBERT), we study different strategies for achieving strong alignment. We find that augmenting training data via machine translation is effective, and improves significantly over using mBERT out-of-the-box. Interestingly, model performance on zero-shot variants of our task that only target “weak” alignment is not predictive of performance on LAReQA. This finding underscores our claim that language-agnostic retrieval is a substantively new kind of cross-lingual evaluation, and suggests that measuring both weak and strong alignment will be important for improving cross-lingual systems going forward. We release our dataset and evaluation code at https://github.com/google-research-datasets/lareqa.
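
The “strong” alignment condition described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two-dimensional embeddings are hand-made toy values rather than mBERT outputs, cosine similarity stands in for whatever scoring function a real system uses, and the names `cosine_similarity`, `rank_pool`, `answer_de`, and `distractor_en` are hypothetical. It is not the released evaluation code.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-d vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank_pool(query_vec, pool):
    """Return candidate ids sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), cid) for cid, vec in pool.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]


# Toy embeddings (assumed values, not produced by any real model).
query_en = np.array([1.0, 0.0])  # an English question
pool = {
    "answer_de": np.array([0.9, 0.1]),      # related answer in another language
    "distractor_en": np.array([0.5, 0.5]),  # unrelated text in the query's language
}

ranking = rank_pool(query_en, pool)

# Strong alignment holds for this query if the related cross-language
# candidate outranks the unrelated same-language candidate.
strongly_aligned = ranking.index("answer_de") < ranking.index("distractor_en")
print(ranking, strongly_aligned)
```

With embeddings that cluster by language rather than by meaning, the same-language distractor would rank first and the check would fail, which is the failure mode that retrieval from a mixed-language pool is designed to expose.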

Topics

Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Multi-Modal Learning
Deep Learning > Learning Types > Zero-Shot Learning
Artificial Intelligence > Core AI > Information Retrieval

Keywords

zero-shot learning, machine translation, information retrieval, cross-lingual information retrieval, cross-lingual alignment, semantic embedding, multilingual BERT, multilingual representation, cross-lingual retrieval, answer retrieval
