InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation

Pierre Jean A. Colombo; Chloé Clavel; Pablo Piantanida

2022 AAAI AAAI 2022

InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation

Abstract

Abstract Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus, do not robustly handle synonyms. In this paper, we introduce InfoLM a family of untrained metrics that can be viewed as a string-based metric that addresses the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures allowing the possibility to adapt InfoLM to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvement and two figure correlation gains in many configurations compared to existing metrics on both summarization and data2text generation tasks.

🌉 Interdisciplinary Bridge — Machine Learning and Natural Language Processing

🧭 Keyword Pioneer — data2text generation

🐣 Hot Topic Early Bird — summarization evaluation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Pierre Jean A. Colombo , Chloé Clavel , Pablo Piantanida

Topics

Natural Language Processing > Generation > Summarization Natural Language Processing > Generation > Text Generation Natural Language Processing > Resources & Methods > Large Language Models Machine Learning > Optimization & Theory > Evaluation

Keywords

masked language model summarization evaluation information measure data-to-text generation automatic metric automatic evaluation metric data2text generation

Download PDF

Related papers

Dynamic Spatial Propagation Network for Depth Completion 2022

FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition 2022

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding 2022

AnchorFace: Boosting TAR@FAR for Practical Face Recognition 2022

Parallel and High-Fidelity Text-to-Lip Generation 2022