EMNLP 2024
A High-quality Seed Dataset for Italian Machine Translation
Abstract
This paper describes the submission of a high-quality translation of the OLDI Seed dataset into Italian for the WMT 2024 Open Language Data Initiative shared task. The base of this submission is a previous version of an Italian OLDI Seed dataset released by Haberland et al. (2024) via machine translation and partial post-editing. This data was subsequently reviewed in its entirety by two native speakers of Italian, who carried out extensive post-editing with particular attention to the idiomatic translation of named entities.