EMNLP 2023

ANLP-RG at NADI 2023 shared task: Machine Translation of Arabic Dialects: A Comparative Study of Transformer Models

Abstract

In this paper, we present our findings for the NADI-2023 Shared Task (Subtask 2). The task is to build a model that translates the Palestinian, Jordanian, Emirati, and Egyptian dialects into Modern Standard Arabic (MSA) using the MADAR parallel corpus, even though the corpus lacks a parallel subset for the Emirati dialect. To address this challenge, we conducted a comparative analysis of several transformer models fine-tuned on the MADAR corpus. We also assessed how well existing translation tools perform on this task. The best model achieved a BLEU score of 11.14 on the dev set and 10.02 on the test set.
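The BLEU figures in the abstract are corpus-level scores on the usual 0 to 100 scale: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty for hypotheses shorter than the references. The sketch below computes corpus BLEU under stated assumptions (one reference per sentence, whitespace tokenization, no smoothing). It is not the official NADI scorer, which may tokenize and smooth differently, and the function names `ngrams` and `corpus_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (illustrative only): one reference per hypothesis,
    whitespace tokenization, and no smoothing of zero n-gram matches.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clip: each hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```

On this scale, an identical hypothesis scores 100, so scores such as 11.14 and 10.02 indicate limited n-gram overlap with the MSA references.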
