A Pronoun Test Suite Evaluation of the English–German MT Systems at WMT 2018

Liane Guillou; Christian Hardmeier; Ekaterina Lapshinova-Koltunski; Sharid Loáiciga

EMNLP 2018

Abstract

We evaluate the output of 16 English-to-German MT systems with respect to the translation of pronouns in the context of the WMT 2018 competition. We work with a test suite specifically designed to assess system quality in various fine-grained categories known to be problematic. The main evaluation scores come from a semi-automatic process, combining automatic reference matching with extensive manual annotation of uncertain cases. We find that current NMT systems are good at translating pronouns with intra-sentential reference, but the inter-sentential cases remain difficult. NMT systems are also good at the translation of event pronouns, unlike systems from the phrase-based SMT paradigm. No single system performs best at translating all types of anaphoric pronouns, suggesting unexplained random effects influencing the translation of pronouns with NMT.

Keywords

neural machine translation, evaluation benchmark, pronoun translation, test suite, anaphoric reference, anaphoric pronoun
