Handling Divergent Reference Texts when Evaluating Table-to-Text Generation

Bhuwan Dhingra; Manaal Faruqui; Ankur Parikh; Ming-Wei Chang; Dipanjan Das; William Cohen

ACL 2019

Abstract

Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data. We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, correlate poorly with human judgments when those references diverge. We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts to the semi-structured data before computing their precision and recall. Through a large-scale human evaluation study of table-to-text models for WikiBio, we show that PARENT correlates with human judgments better than existing text generation metrics. We also adapt and evaluate the information extraction based evaluation proposed by Wiseman et al. (2017), and show that PARENT has comparable correlation to it, while being easier to use. Using the data from the WebNLG challenge, we show that PARENT is also applicable when the reference texts are elicited from humans.
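The idea of aligning n-grams to the table before computing precision and recall can be illustrated with a small sketch. This is a simplified illustration written for this summary, not the authors' exact formulation of PARENT: the function names (`table_support`, `table_aware_precision`, `table_aware_recall`), the word-overlap test against a flat set of table tokens, and the single n-gram order are all assumptions made here. Precision gives credit to a generated n-gram if it matches the reference or, failing that, in proportion to how many of its tokens occur in the table. Recall only counts reference n-grams to the extent that the table supports them, so divergent reference content is not demanded of the generator.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def table_support(gram, table_tokens):
    """Fraction of the n-gram's tokens that occur anywhere in the table.

    Assumption: the table is flattened into a set of lowercase tokens.
    """
    return sum(tok in table_tokens for tok in gram) / len(gram)


def table_aware_precision(generated, reference, table_tokens, n=1):
    """Credit generated n-grams found in the reference or backed by the table."""
    gen = Counter(ngrams(generated, n))
    ref = Counter(ngrams(reference, n))
    if not gen:
        return 0.0
    score = 0.0
    for gram, count in gen.items():
        matched = min(count, ref[gram])   # clipped match against the reference
        unmatched = count - matched       # fall back to table support
        score += matched + unmatched * table_support(gram, table_tokens)
    return score / sum(gen.values())


def table_aware_recall(generated, reference, table_tokens, n=1):
    """Recall over reference n-grams, weighted by how well the table supports them."""
    gen = Counter(ngrams(generated, n))
    ref = Counter(ngrams(reference, n))
    total = 0.0
    hit = 0.0
    for gram, count in ref.items():
        weight = table_support(gram, table_tokens)
        total += count * weight
        hit += min(count, gen[gram]) * weight
    return hit / total if total else 0.0
```

In this sketch, a generated sentence that mentions a table value missing from the reference is not penalized on precision, while a reference word such as an occupation that never appears in the table carries no weight on recall. A reference-only metric would treat both cases as errors.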

Topics

Natural Language Processing > Generation > Text Generation
Data Science & Analytics > Applications > Information Retrieval
Natural Language Processing > Applications > Summarization
Machine Learning > Optimization & Theory > Evaluation

Keywords

information extraction, evaluation metric, human evaluation, text generation evaluation, table-to-text generation, text generation metric, semi-structured data, reference divergence, reference text, n-gram alignment, reference text divergence
