EACL 2026

NUS-IDS at AMIYA/VarDial 2026: Improving Arabic Dialectness in LLMs with Reinforcement Learning

Abstract

In this paper, we describe the models developed by our team, NUS-IDS, for the Closed data track of the Arabic Modeling In Your Accent (AMIYA) shared task at VarDial 2026. The core idea behind our solution is data augmentation enabled by a dialect classifier trained on AMIYA data. We combine various translation, summarization, and question answering prompts with AMIYA training data to form dialectal prompts for use with state-of-the-art LLMs. Next, our classifier's dialect predictions on the outputs of these LLMs are used to compile preference data for Reinforcement Learning (RL). We report model performance on dialectal Arabic from Egypt, Morocco, Palestine, Saudi Arabia, and Syria using FLORES+, a multilingual machine translation dataset. Our experiments show that although our RL models achieve significant gains on dialectness scores, they underperform base LLMs on translation metrics such as chrF++.
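The preference-data step described above can be illustrated with a minimal sketch. It assumes that several candidate LLM outputs are available per dialectal prompt and that the dialect classifier exposes a score for the target dialect. The abstract does not specify the pair-selection rule or the RL algorithm, so the function name `build_preference_pair`, the `score_fn` interface, and the `min_margin` threshold are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional


def build_preference_pair(
    prompt: str,
    candidates: List[str],
    target_dialect: str,
    score_fn: Callable[[str, str], float],
    min_margin: float = 0.2,
) -> Optional[Dict[str, str]]:
    """Build one {prompt, chosen, rejected} record from classifier scores.

    score_fn(text, dialect) is a stand-in for the dialect classifier
    (hypothetical interface): it returns how strongly `text` is predicted
    to belong to `dialect`. The candidate with the highest score becomes
    "chosen" and the one with the lowest becomes "rejected". Pairs whose
    score gap is below `min_margin` (an assumed threshold) are skipped.
    """
    if len(candidates) < 2:
        return None

    # Score each candidate once, then rank from least to most dialectal.
    scored = sorted(
        ((score_fn(text, target_dialect), text) for text in candidates),
        key=lambda pair: pair[0],
    )
    (low_score, rejected), (high_score, chosen) = scored[0], scored[-1]

    if high_score - low_score < min_margin:
        return None  # Candidates are too similar to give a useful signal.

    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Records in this `prompt`/`chosen`/`rejected` shape are a common input format for preference-based fine-tuning. Whether the shared-task system ranks candidates this way or applies a margin filter is not stated in the abstract.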
