2024 COLING COLING 2024

Residual Dropout: A Simple Approach to Improve Transformer’s Data Efficiency

Abstract

AbstractTransformer models often demand a vast amount of training data to achieve the desired level of performance. However, this data requirement poses a major challenge for low-resource languages seeking access to high-quality systems, particularly in tasks like Machine Translation. To address this issue, we propose adding Dropout to Transformer’s Residual Connections. Our experimental results demonstrate that this modification effectively mitigates overfitting during training, resulting in substantial performance gains of over 4 BLEU points on a dataset consisting of merely 10 thousand examples.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Natural Language Processing
🧭 Keyword Pioneer — residual dropout
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio