Investigating the Linguistic Performance of Large Language Models in Machine Translation

Shushen Manakhimova; Vivien Macketanz; Eleftherios Avramidis; Ekaterina Lapshinova-Koltunski; Sergei Bagdasarov; Sebastian Möller

2024 EMNLP EMNLP 2024

Investigating the Linguistic Performance of Large Language Models in Machine Translation

Abstract

AbstractThis paper summarizes the results of our test suite evaluation on 39 machine translation systems submitted at the Shared Task of the Ninth Conference of Machine Translation (WMT24). It offers a fine-grained linguistic evaluation of machine translation outputs for English–German and English–Russian, resulting from significant manual linguistic effort. Based on our results, LLMs are inferior to NMT in English–German, both in overall scores and when translating specific linguistic phenomena, such as punctuation, complex future verb tenses, and stripping. LLMs show quite a competitive performance in English-Russian, although top-performing systems might struggle with some cases of named entities and terminology, function words, mediopassive voice, and semantic roles. Additionally, some LLMs generate very verbose or empty outputs, posing challenges to the evaluation process.

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Shushen Manakhimova , Vivien Macketanz , Eleftherios Avramidis , Ekaterina Lapshinova-Koltunski , Sergei Bagdasarov , Sebastian Möller

Topics

Natural Language Processing > Applications > Machine Translation

Keywords

natural language processing neural machine translation linguistic evaluation large language model

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024