2022
COLING
Rakuten’s Participation in WAT 2022: Parallel Dataset Filtering by Leveraging Vocabulary Heterogeneity
Abstract
This paper describes our neural machine translation system’s participation in the WAT 2022 shared translation task (team ID: sakura). We participated in the Parallel Data Filtering Task. Our approach, based on Feature Decay Algorithms (FDA), improved BLEU by +1.4 points for English-to-Japanese and +2.4 points for Japanese-to-English over a model trained on the full dataset, showing the effectiveness of FDA for in-domain data selection.