ACL 2025
ConLoan: A Contrastive Multilingual Dataset for Evaluating Loanwords
Abstract
Lexical borrowing, the adoption of words from one language into another, is a ubiquitous linguistic phenomenon influenced by geopolitical, societal, and technological factors. This paper introduces ConLoan, a novel contrastive dataset comprising sentences with and without loanwords across 10 languages. Through systematic evaluation using this dataset, we investigate how state-of-the-art machine translation and language models process loanwords compared to their native alternatives. Our experiments reveal that these systems show systematic preferences for loanwords over native terms and exhibit varying performance across languages. These findings provide valuable insights for developing more linguistically robust NLP systems.
Keyword Pioneer: lexical borrowing
Cross-Pollinator: Artificial Intelligence, Natural Language Processing, Speech & Audio
Interdisciplinary Bridge: Deep Learning, Interdisciplinary, Machine Learning, Natural Language Processing
Topics
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Multilingual NLP
Interdisciplinary > Linguistics
Interdisciplinary > Linguistics > Computational Linguistics
Natural Language Processing > Generation > Machine Translation
Machine Learning > Learning Types > Evaluation
Deep Learning > Learning Types > Representation Learning