InteractEva: A Simulation-Based Evaluation Framework for Interactive AI Systems

Yannis Katsis; Maeda F. Hanafi; Martín Santillán Cooper; Yunyao Li

AAAI 2022

Abstract

Evaluating interactive AI (IAI) systems is challenging because their output depends heavily on the actions the user performs. As a result, developers often rely on limited and mostly qualitative data from user testing to improve their systems. In this paper, we present InteractEva, a systematic evaluation framework for IAI systems. InteractEva combines (a) a user simulation backend that tests the system against different use cases and user interactions at scale with (b) an interactive frontend that lets developers perform important quantitative evaluation tasks, including acquiring a performance overview, performing error analysis, and conducting what-if studies. The framework has supported the evaluation and improvement of an industrial IAI text extraction system, the results of which will be presented during our demonstration.
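
The abstract does not describe how InteractEva is implemented, so the following is only a minimal sketch of the general idea of simulation-based evaluation, not the authors' code. Everything in it is an assumption made for illustration: the toy dataset, the `ToyExtractor` stand-in system, the two simulated user strategies, and the choice of F1 as the metric. It replays simulated user sessions against the system, records the metric after each interaction, and averages the curves per strategy, which gives a small performance overview and a basic what-if comparison between user behaviors.

```python
import random
from statistics import mean

# Hypothetical gold data: (sentence, should_be_extracted).
DATASET = [
    ("invoice total due", True),
    ("payment due friday", True),
    ("total amount payable", True),
    ("lunch menu today", False),
    ("team meeting notes", False),
    ("weather report today", False),
]


class ToyExtractor:
    """Stand-in interactive system: learns keywords from labeled examples."""

    def __init__(self):
        self.positive_words = set()
        self.negative_words = set()

    def give_feedback(self, text, label):
        # One user interaction: the user labels a single example.
        target = self.positive_words if label else self.negative_words
        target.update(text.split())

    def predict(self, text):
        words = set(text.split())
        return len(words & self.positive_words) > len(words & self.negative_words)


def f1(system, dataset):
    """F1 score of the system's predictions against the gold labels."""
    tp = sum(1 for t, y in dataset if y and system.predict(t))
    fp = sum(1 for t, y in dataset if not y and system.predict(t))
    fn = sum(1 for t, y in dataset if y and not system.predict(t))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def random_user(system, unlabeled, rng):
    """Simulated user who labels an arbitrary remaining example."""
    return rng.choice(unlabeled)


def error_driven_user(system, unlabeled, rng):
    """Simulated user who prefers examples the system currently gets wrong."""
    wrong = [ex for ex in unlabeled if system.predict(ex[0]) != ex[1]]
    return rng.choice(wrong or unlabeled)


def simulate(user, steps, seed):
    """Run one simulated session; return the F1 score after each interaction.

    `steps` must not exceed the number of examples in DATASET.
    """
    rng = random.Random(seed)
    system = ToyExtractor()
    unlabeled = list(DATASET)
    trace = []
    for _ in range(steps):
        example = user(system, unlabeled, rng)
        unlabeled.remove(example)
        system.give_feedback(*example)
        trace.append(f1(system, DATASET))
    return trace


def performance_overview(users, steps=4, seeds=range(20)):
    """Average the per-step F1 curves over many simulated sessions per user."""
    overview = {}
    for name, user in users.items():
        traces = [simulate(user, steps, s) for s in seeds]
        overview[name] = [mean(column) for column in zip(*traces)]
    return overview


if __name__ == "__main__":
    users = {"random": random_user, "error_driven": error_driven_user}
    for name, curve in performance_overview(users).items():
        print(name, [round(value, 2) for value in curve])
```

Fixing the random seed per session keeps each simulated run reproducible, so a change in the averaged curves can be attributed to the system or the user strategy rather than to sampling noise.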

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Artificial Intelligence > Core AI > Interpretability
Machine Learning > Application Areas > Efficient Computing
Machine Learning > Learning Types > Evaluation

Keywords

evaluation framework, error analysis, interactive AI, user simulation, quantitative evaluation
