AACL 2025

Data Augmentation for Low-resource Neural Machine Translation: A Systematic Analysis

Abstract

As an effective way to address the data scarcity problem, data augmentation has received significant interest in low-resource neural machine translation, which in turn has the potential to reduce the digital divide and benefit out-of-domain translation. However, existing work mainly focuses on how to generate the synthetic data, while the quality of the synthetic data and the way it is used also matter. In this paper, we give a systematic analysis of data augmentation for low-resource neural machine translation that encompasses all three aspects. We show that with careful control of the synthetic data quality and of the way the synthetic data is used, performance can be greatly boosted even when the method used to generate the synthetic data stays the same.


Authors