EMNLP 2023
mSCAN: A Dataset for Multilingual Compositional Generalisation Evaluation
Abstract
Language models achieve remarkable results on a variety of tasks, yet still struggle on compositional generalisation benchmarks. The majority of these benchmarks evaluate performance in English only, leaving open the question of whether these results generalise to other languages. As an initial step towards answering this question, we introduce mSCAN, a multilingual adaptation of the SCAN dataset. It was produced by rule-based translation, developed in cooperation with native speakers. We then showcase this novel dataset in in-context learning experiments with the multilingual large language model BLOOM and with gpt3.5-turbo.
Authors
Topics
Artificial Intelligence > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Zero-Shot Learning
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Learning Types > Evaluation