HaRiM+: Evaluating Summary Quality with Hallucination Risk

Seonil (Simon) Son; Junsoo Park; Jeong-in Hwang; Junghwa Lee; Hyungjong Noh; Yeonsoo Lee

IJCNLP 2022

Abstract

One challenge in developing summarization models is the difficulty of measuring the factual inconsistency of generated text. In this study, we reinterpret the decoder overconfidence-regularizing objective suggested by Miao et al. (2021) as a hallucination risk measurement to better estimate the quality of generated summaries. We propose a reference-free metric, HaRiM+, which requires only an off-the-shelf summarization model to compute the hallucination risk from token likelihoods. Deploying it requires neither additional model training nor ad-hoc modules, which usually need to be aligned with human judgments. For summary-quality estimation, HaRiM+ achieves state-of-the-art correlation with human judgment on three summary-quality annotation sets: FRANK, QAGS, and SummEval. We hope that our work, which shows the merit of using summarization models themselves for evaluation, facilitates progress in both the automated evaluation and the generation of summaries.
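A minimal sketch of how a likelihood-based hallucination-risk score could be computed, assuming two per-token probability lists for the same summary: `p_s2s`, the summarizer's probability of each token given the source, and `p_lm`, the probability of that token when the source is withheld (an LM-like pass). The per-token form (1 - p_s2s)^2 (1 - (p_s2s - p_lm)), the function names, and the weight `lam` are illustrative assumptions, not necessarily the exact definition used by HaRiM+; extracting the probabilities from a real summarization model is also left out.

```python
import math
from typing import Sequence


def hallucination_risk(p_s2s: Sequence[float], p_lm: Sequence[float]) -> float:
    """Hypothetical token-level hallucination-risk score (lower is better).

    p_s2s: probability the summarizer assigns to each summary token given the source.
    p_lm:  probability of the same token when the source is withheld (LM-like pass).
    """
    if not p_s2s or len(p_s2s) != len(p_lm):
        raise ValueError("expected two non-empty, equal-length probability lists")
    total = 0.0
    for ps, pl in zip(p_s2s, p_lm):
        margin = ps - pl  # how much the source raises this token's probability
        # Low confidence (small ps) and a small or negative margin both raise the risk.
        total += (1.0 - ps) ** 2 * (1.0 - margin)
    return total / len(p_s2s)


def quality_score(p_s2s: Sequence[float], p_lm: Sequence[float], lam: float = 1.0) -> float:
    """Hypothetical combined score: mean log-likelihood penalized by the risk term.

    lam is an illustrative weight, not a value taken from the paper.
    """
    risk = hallucination_risk(p_s2s, p_lm)  # also validates the inputs
    if any(p <= 0.0 for p in p_s2s):
        raise ValueError("token probabilities must be positive")
    avg_loglik = sum(math.log(p) for p in p_s2s) / len(p_s2s)
    return avg_loglik - lam * risk


if __name__ == "__main__":
    grounded = hallucination_risk([0.9, 0.8], [0.9, 0.2])
    print(f"risk: {grounded:.3f}")
```

In this toy input the second token is strongly supported by the source (its probability rises from 0.2 to 0.8 once the source is provided), so it adds little risk, and the script prints a risk of about 0.013. A token that the summarizer predicts just as readily without the source would contribute more, which is the intuition behind treating a small margin as a hallucination signal.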

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Core Methods > Classification
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Models > Generative Models
Natural Language Processing > Generation > Summarization

Keywords

summarization evaluation, hallucination detection, reference-free metric, token likelihood, factual inconsistency, summary quality, hallucination risk

Related papers

Chasing the Tail with Domain Generalization: A Case Study on Frequency-Enriched Datasets 2022

Double Trouble: How to not Explain a Text Classifier’s Decisions Using Counterfactuals Synthesized by Masked Language Models? 2022

Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning 2022

Graph-augmented Learning to Rank for Querying Large-scale Knowledge Graph 2022

Missing Modality meets Meta Sampling (M3S): An Efficient Universal Approach for Multimodal Sentiment Analysis with Missing Modality 2022