Open-Domain Dialog Evaluation Using Follow-Ups Likelihood

Maxime De Bruyn; Ehsan Lotfi; Jeska Buhmann; Walter Daelemans

COLING 2022

Abstract

Automatic evaluation of open-domain dialogs remains an unsolved problem, and existing methods do not correlate strongly with human annotations. In this paper, we present a new automated evaluation method based on follow-ups. We measure the probability that a language model will continue the conversation with a fixed set of follow-ups (e.g., "not really relevant here", "what are you trying to say?"). When compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations.
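The scoring idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `follow_up_score` and `toy_log_prob` are hypothetical, and so is the aggregation rule: the sketch negates the mean log-likelihood of the follow-ups, on the assumption that follow-ups signalling confusion become more likely after a poor reply. `toy_log_prob` is a hypothetical word-overlap heuristic that stands in for a real language model so the example runs without external libraries.

```python
import math
import re
from typing import Callable, Sequence

# Hypothetical interface: returns log P(continuation | context) under some language model.
LogProbFn = Callable[[str, str], float]

# Two follow-ups quoted in the abstract; the complete fixed set is not reproduced here.
FOLLOW_UPS = ("not really relevant here", "what are you trying to say?")


def follow_up_score(context: str, follow_ups: Sequence[str], log_prob: LogProbFn) -> float:
    """Negated mean log-likelihood of the follow-ups given the dialog context.

    Assumption: the follow-ups signal a confusing or off-topic reply, so the
    more likely they are, the worse the dialog. Negating the mean makes a
    higher score mean a better dialog.
    """
    if not follow_ups:
        raise ValueError("at least one follow-up is required")
    mean_log_prob = sum(log_prob(context, f) for f in follow_ups) / len(follow_ups)
    return -mean_log_prob


def _words(text: str) -> set:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def toy_log_prob(context: str, continuation: str) -> float:
    """Hypothetical stand-in for a real language model (illustration only).

    Every whitespace-separated continuation token gets probability 0.2 if the
    last two turns share no words (an apparently off-topic reply), else 0.05.
    """
    turns = context.split("\n")
    previous = _words(turns[-2]) if len(turns) > 1 else set()
    p = 0.05 if previous & _words(turns[-1]) else 0.2
    return len(continuation.split()) * math.log(p)


# Dialog turns are separated by newlines in this sketch.
coherent = "do you like pizza?\nyes, pizza is my favourite food"
off_topic = "do you like pizza?\nthe train leaves at noon"
print(follow_up_score(coherent, FOLLOW_UPS, toy_log_prob))   # higher: relevant reply
print(follow_up_score(off_topic, FOLLOW_UPS, toy_log_prob))  # lower: follow-ups more likely
```

In a real setting, `log_prob` would sum the token log-probabilities that a pretrained language model assigns to each follow-up when conditioned on the dialog context. Averaging over a fixed set of follow-ups is meant to keep the score from hinging on any single phrasing.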

Keywords

language model, automated evaluation, dialogue evaluation, automatic evaluation, human correlation, open-domain dialogue, open-domain dialog, follow-up probability, follow-up likelihood