Error Identification for Machine Translation with Metric Embedding and Attention

Raphael Rubino; Atsushi Fujita; Benjamin Marie

EMNLP 2021


Abstract

Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, the lack of explanations accompanying the decisions made by end-to-end neural models makes the results difficult to interpret. Furthermore, word-level annotated datasets are rare due to the prohibitive effort required to produce them, although they could provide interpretable signals in addition to sentence-level QE outputs. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task and our submissions reach the first position in all language pairs. The extraction of metric-to-input attention weights shows that different metrics focus on different parts of the source and target text, providing strong rationales in the decision-making process of the QE model.
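The idea of metric-to-input attention can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each MT metric is represented by a single query vector living in the same space as the token representations, and it uses plain scaled dot-product attention. The names `metric_to_input_attention`, `metric_a`, and `metric_b` are hypothetical, and random vectors stand in for learned embeddings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def metric_to_input_attention(metric_embeddings, token_vectors):
    """Return, for each metric, one attention weight per input token.

    Assumption: metric embeddings and token vectors share one dimension,
    so a scaled dot product can score every (metric, token) pair.
    """
    dim = len(token_vectors[0])
    weights = {}
    for name, query in metric_embeddings.items():
        scores = [
            sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim)
            for tok in token_vectors
        ]
        weights[name] = softmax(scores)
    return weights


# Toy data: random vectors stand in for learned representations.
random.seed(0)
dim = 4
tokens = ["the", "cat", "sat"]
token_vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in tokens]
metric_embeddings = {
    name: [random.gauss(0, 1) for _ in range(dim)]
    for name in ("metric_a", "metric_b")  # hypothetical metric names
}

attn = metric_to_input_attention(metric_embeddings, token_vectors)
for name, w in attn.items():
    top = max(range(len(tokens)), key=lambda i: w[i])
    print(name, [round(x, 3) for x in w], "most attended:", tokens[top])
```

Each metric produces its own distribution over the tokens, so comparing the rows of `attn` shows which tokens each metric attends to most. In a trained model, weights of this kind are what would be inspected as word-level rationales.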



Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Core Methods > Classification
Natural Language Processing > Understanding > Semantic Analysis
Natural Language Processing > Applications > Machine Translation
Machine Learning > Learning Types > Multi-Modal Learning
Machine Learning > Learning Types > Metric Learning
Deep Learning > Techniques > Attention
Natural Language Processing > Applications > Quality Estimation

Keywords

attention mechanism, machine translation, quality estimation, synthetic data, word-level prediction, metric embedding, sentence-level score, word-level data

