Automatic Reference-Based Evaluation of Pronoun Translation Misses the Point

Liane Guillou; Christian Hardmeier

EMNLP 2018

Abstract

We compare the performance of the APT and AutoPRF metrics for pronoun translation against a manually annotated dataset comprising human judgements as to the correctness of translations of the PROTEST test suite. Although there is some correlation with the human judgements, a range of issues limit the performance of the automated metrics. Instead, we recommend the use of semi-automatic metrics and test suites in place of fully automatic metrics.

Authors

Liane Guillou, Christian Hardmeier

Topics

Natural Language Processing > Applications > Machine Translation

Keywords

machine translation; automated metrics; evaluation metric; evaluation metrics; pronoun translation; automated metric; reference-based evaluation
