Understanding the Impact of UGC Specificities on Translation Quality

José Carlos Rosales Núñez; Djamé Seddah; Guillaume Wisniewski

2021 EMNLP EMNLP 2021

Understanding the Impact of UGC Specificities on Translation Quality

Abstract

AbstractThis work takes a critical look at the evaluation of user-generated content automatic translation, the well-known specificities of which raise many challenges for MT. Our analyses show that measuring the average-case performance using a standard metric on a UGC test set falls far short of giving a reliable image of the UGC translation quality. That is why we introduce a new data set for the evaluation of UGC translation in which UGC specificities have been manually annotated using a fine-grained typology. Using this data set, we conduct several experiments to measure the impact of different kinds of UGC specificities on translation quality, more precisely than previously possible.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

José Carlos Rosales Núñez , Djamé Seddah , Guillaume Wisniewski

Topics

Machine Learning > Application Areas > Domain Adaptation Natural Language Processing > Applications > Machine Translation Natural Language Processing > Generation > Machine Translation

Keywords

domain adaptation machine translation text analysis evaluation metric translation quality text evaluation user-generated content

Download PDF

Related papers

Continual Learning in Multilingual NMT via Language-Specific Embeddings 2021

MultiDoc2Dial: Modeling Dialogues Grounded in Multiple Documents 2021

Efficient Multi-Task Auxiliary Learning: Selecting Auxiliary Data by Feature Similarity 2021

Neural Machine Translation with Heterogeneous Topic Knowledge Embeddings 2021

Semantics-Preserved Data Augmentation for Aspect-Based Sentiment Analysis 2021