Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task

An Yang; Kai Liu; Jing Liu; Yajuan Lyu; Sujian Li

ACL 2018

Abstract

Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems, such as ROUGE and BLEU, generally focus on the lexical overlap between candidate and reference answers. However, these metrics can be biased for specific question types, especially questions that ask for yes-no opinions or entity lists. In this paper, we adapt the metrics so that n-gram overlap correlates better with human judgment of answers to these two question types. Statistical analysis demonstrates the effectiveness of our approach. Our adaptations may provide positive guidance for the development of real-world MRC systems.
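The bias on yes-no questions can be illustrated with a minimal sketch. It is not the authors' adapted metric: it computes a plain LCS-based ROUGE-L F-score with equal weight on precision and recall (an assumption), and the function names and example sentences are hypothetical, chosen only to show how an answer with the opposite opinion can outscore an agreeing one on lexical overlap alone.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Unadapted ROUGE-L style F-score on whitespace tokens.

    Assumption: precision and recall are weighted equally (F1).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical answers to a yes-no question (illustrative only).
reference = "yes it is safe to eat during pregnancy"
agreeing = "yes it is safe"
contradicting = "no it is not safe to eat during pregnancy"

score_agree = rouge_l_f1(agreeing, reference)
score_contra = rouge_l_f1(contradicting, reference)

print(f"agreeing:      {score_agree:.3f}")
print(f"contradicting: {score_contra:.3f}")
```

In this toy case the contradicting answer shares more tokens with the reference than the short agreeing answer does, so overlap alone ranks it higher even though its opinion is the opposite.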

Topics

Natural Language Processing > Applications > Machine Reading Comprehension

Keywords

machine reading comprehension, evaluation metrics
