EMNLP 2025
Expanding the WMT24++ Benchmark with Rumantsch Grischun, Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader
Abstract
The Romansh language, spoken in Switzerland, has limited resources for machine translation evaluation. In this paper, we present a benchmark for six varieties of Romansh: Rumantsch Grischun, a supra-regional variety, and five regional varieties: Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader. Our reference translations were created by human translators based on the WMT24++ benchmark, which ensures parallelism with more than 55 other languages. An automatic evaluation of existing MT systems and LLMs shows that translation out of Romansh into German is handled relatively well for all the varieties, but translation into Romansh is still challenging.
Authors
Jannis Vamvas, Ignacio Pérez Prat, Not Soliva, Sandra Baltermia-Guetg, Andrina Beeli, Simona Beeli, Madlaina Capeder, Laura Decurtins, Gian Peder Gregori, Flavia Hobi, Gabriela Holderegger, Arina Lazzarini, Viviana Lazzarini, Walter Rosselli, Bettina Vital, Anna Rutkiewicz, Rico Sennrich