EMNLP 2024

Samsung R&D Institute Philippines @ WMT 2024 Low-resource Languages of Spain Shared Task

Abstract

This paper details the submission of the Samsung R&D Institute Philippines (SRPH) Language Intelligence Team (LIT) to the WMT 2024 Low-resource Languages of Spain shared task. We trained translation models for Spanish to Aragonese, Spanish to Aranese/Occitan, and Spanish to Asturian using a standard sequence-to-sequence Transformer architecture, augmenting it with a noisy-channel reranking strategy to select better outputs during decoding. For Spanish to Asturian translation, our method reaches BLEU scores comparable to those of a strong commercial baseline translation system using only constrained data, backtranslations, noisy-channel reranking, and a shared vocabulary spanning all four languages.
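
As a rough illustration of the noisy-channel reranking idea mentioned above, the following minimal sketch rescores candidate translations by combining a direct model score log P(y|x), a channel (reverse-direction) model score log P(x|y), and a target-side language model score log P(y). The class and function names, the weights, and the toy log-probabilities are assumptions made for illustration only; the abstract does not specify the exact scoring formula, weights, or length normalization used in the submission.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One hypothesis from the n-best list (hypothetical container)."""
    text: str
    direct_logprob: float   # log P(y | x) from the forward translation model
    channel_logprob: float  # log P(x | y) from the reverse translation model
    lm_logprob: float       # log P(y) from a target-side language model


def noisy_channel_score(c: Candidate,
                        channel_weight: float = 1.0,
                        lm_weight: float = 1.0) -> float:
    """Weighted sum of the three log-probabilities.

    Length normalization is omitted here for brevity; the weights are
    illustrative and would normally be tuned on a validation set.
    """
    return (c.direct_logprob
            + channel_weight * c.channel_logprob
            + lm_weight * c.lm_logprob)


def rerank(candidates: List[Candidate],
           channel_weight: float = 1.0,
           lm_weight: float = 1.0) -> List[Candidate]:
    """Return candidates sorted from best to worst combined score."""
    return sorted(
        candidates,
        key=lambda c: noisy_channel_score(c, channel_weight, lm_weight),
        reverse=True,
    )


# Toy scores, invented purely to exercise the function.
nbest = [
    Candidate("hypothesis A", direct_logprob=-1.0, channel_logprob=-5.0, lm_logprob=-4.0),
    Candidate("hypothesis B", direct_logprob=-1.5, channel_logprob=-2.0, lm_logprob=-2.5),
]
best = rerank(nbest)[0]
```

In this toy case the forward model alone would prefer hypothesis A, but the channel and language model scores shift the ranking toward hypothesis B, which is the kind of correction the reranking step is meant to provide.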
