Using Language Models to Disambiguate Lexical Choices in Translation

Josh Barua; Sanjay Subramanian; Kayo Yin; Alane Suhr

EMNLP 2024

Abstract

In translation, a concept represented by a single word in a source language can have multiple variations in a target language. The task of lexical selection requires using context to identify which variation is most appropriate for a source text. We work with native speakers of nine languages to create DTAiLS, a dataset of 1,377 sentence pairs that exhibit cross-lingual concept variation when translating from English. We evaluate recent LLMs and neural machine translation systems on DTAiLS; the best-performing model, GPT-4, achieves 67% to 85% accuracy across languages. Finally, we use language models to generate English rules describing target-language concept variations. Providing weaker models with high-quality lexical rules improves accuracy substantially, in some cases reaching or outperforming GPT-4.
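As a rough illustration of the setup described in the abstract, the sketch below frames lexical selection as a multiple-choice prompt, optionally augmented with English rules for each target-language variation, and scores predictions by accuracy. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `LexicalExample`, `build_prompt`, and `accuracy` are hypothetical, the prompt wording is invented, and the Spanish "pez"/"pescado" example is illustrative and not taken from DTAiLS. No model is called; the predictions would come from whatever system is being evaluated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LexicalExample:
    """One lexical selection item (hypothetical structure, not the DTAiLS schema)."""
    source: str            # English source sentence
    concept: str           # English word whose translation varies
    candidates: List[str]  # target-language variations of the concept
    gold: str              # variation judged correct for this context


def build_prompt(ex: LexicalExample, rules: Optional[Dict[str, str]] = None) -> str:
    """Build a multiple-choice prompt; append English rules when provided."""
    lines = [
        f"Sentence: {ex.source}",
        f"Which translation of '{ex.concept}' fits best?",
        "Options: " + ", ".join(ex.candidates),
    ]
    if rules:
        lines.append("Rules:")
        # Only include rules for the variations offered as options.
        lines.extend(
            f"- {word}: {rules[word]}" for word in ex.candidates if word in rules
        )
    return "\n".join(lines)


def accuracy(predictions: List[str], examples: List[LexicalExample]) -> float:
    """Fraction of examples where the predicted variation matches the gold one."""
    if not examples:
        return 0.0
    correct = sum(pred == ex.gold for pred, ex in zip(predictions, examples))
    return correct / len(examples)


if __name__ == "__main__":
    # Illustrative example only: Spanish distinguishes a living fish ("pez")
    # from fish caught as food ("pescado").
    example = LexicalExample(
        source="The fish swam upstream.",
        concept="fish",
        candidates=["pez", "pescado"],
        gold="pez",
    )
    rules = {"pez": "a living fish", "pescado": "fish caught as food"}
    print(build_prompt(example, rules))
    print(accuracy(["pez"], [example]))
```

Passing `rules=None` yields the plain prompt, so the same function can be used to compare a model's accuracy with and without rules.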

Topics

Machine Learning > Core Methods > Classification
Natural Language Processing > Applications > Information Extraction
Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Generation > Machine Translation
Artificial Intelligence > Core AI > Language
Natural Language Processing > Applications > Natural Language Understanding
Machine Learning > Learning Types > Machine Translation

Keywords

machine translation; neural machine translation; language model; lexical selection; context disambiguation; cross-lingual semantics; cross-lingual concept
