2022
ACL
ACL 2022
ChatMatch: Evaluating Chatbots by Autonomous Chat Tournaments
Abstract
AbstractExisting automatic evaluation systems of chatbots mostly rely on static chat scripts as ground truth, which is hard to obtain, and requires access to the models of the bots as a form of “white-box testing”. Interactive evaluation mitigates this problem but requires human involvement. In our work, we propose an interactive chatbot evaluation framework in which chatbots compete with each other like in a sports tournament, using flexible scoring metrics. This framework can efficiently rank chatbots independently from their model architectures and the domains for which they are trained.
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Machine Learning and Natural Language Processing
🧭
Keyword Pioneer
— autonomous evaluation
🐣
Hot Topic Early Bird
— conversational agent
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio