ACL 2025

ConLoan: A Contrastive Multilingual Dataset for Evaluating Loanwords

Abstract

Lexical borrowing, the adoption of words from one language into another, is a ubiquitous linguistic phenomenon influenced by geopolitical, societal, and technological factors. This paper introduces ConLoan, a novel contrastive dataset comprising sentences with and without loanwords across 10 languages. Through systematic evaluation using this dataset, we investigate how state-of-the-art machine translation and language models process loanwords compared to their native alternatives. Our experiments reveal that these systems show systematic preferences for loanwords over native terms and exhibit varying performance across languages. These findings provide valuable insights for developing more linguistically robust NLP systems.
