Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics

Théo Gigant; Camille Guinaudeau; Marc Decombas; Frederic Dufaux

EMNLP 2024

Abstract

Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, show a high correlation with human annotations, and ideally be independent of reference quality; however, most standard evaluation metrics for summarization are reference-based, and existing reference-free metrics correlate poorly with relevance, especially on summaries of longer documents. In this paper, we introduce a reference-free metric that correlates well with human-evaluated relevance while being very cheap to compute. We show that this metric can also be used alongside reference-based metrics to improve their robustness in low-quality reference settings.

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Application Areas > Efficient Computing
Natural Language Processing > Applications > Summarization
Deep Learning > Optimization & Theory > Evaluation

Keywords

text summarization, summarization evaluation, abstractive summarization, automatic evaluation, reference-free metric, relevance correlation
