NAACL 2022
DialSummEval: Revisiting Summarization Evaluation for Dialogues
Abstract
Dialogue summarization is receiving increasing attention from researchers due to its extraordinary difficulty and unique application value. We observe that current dialogue summarization models have flaws that may not be well exposed by frequently used metrics such as ROUGE. In our paper, we re-evaluate 18 categories of metrics along four dimensions (coherence, consistency, fluency, and relevance) and, for the first time, conduct a unified human evaluation of various models. We identify some noteworthy trends that differ from those in conventional summarization tasks. We will release DialSummEval, a multi-faceted dataset of human judgments containing the outputs of 14 models on SAMSum.
Topics
Natural Language Processing > Generation > Dialogue Systems
Natural Language Processing > Generation > Summarization
Data Science & Analytics > Applications > Information Retrieval
Natural Language Processing > Applications > Dialogue Systems
Natural Language Processing > Applications > Summarization
Machine Learning > Optimization & Theory > Evaluation