How to Evaluate Behavioral Models

Greg d'Eon; Sophie Greenwood; Kevin Leyton-Brown; James R. Wright

AAAI 2024

Abstract

Researchers building behavioral models, such as behavioral game theorists, use experimental data to evaluate predictive models of human behavior. However, there is little agreement about which loss function should be used in evaluations, with error rate, negative log-likelihood, cross-entropy, Brier score, and squared L2 error all being common choices. We attempt to offer a principled answer to the question of which loss functions should be used for this task, formalizing axioms that we argue loss functions should satisfy. We construct a family of loss functions, which we dub "diagonal bounded Bregman divergences", that satisfy all of these axioms. These rule out many loss functions used in practice, but notably include squared L2 error; we thus recommend its use for evaluating behavioral models.
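As a rough illustration of the loss functions named above, the following sketch compares three of them on a single predicted distribution and a single empirical distribution over actions. The definitions are the usual textbook forms rather than anything taken from the paper, and the function names (`squared_l2`, `cross_entropy`, `error_rate`) are hypothetical. In particular, error rate is assumed here to mean the fraction of observations that differ from the predicted modal action.

```python
import math

def squared_l2(pred, emp):
    """Squared L2 distance between predicted and empirical distributions."""
    return sum((p - q) ** 2 for p, q in zip(pred, emp))

def cross_entropy(pred, emp):
    """Cross-entropy of the prediction against the empirical distribution.

    Returns infinity if the prediction gives zero probability to an
    action that was actually observed.
    """
    total = 0.0
    for p, q in zip(pred, emp):
        if q == 0:
            continue  # unobserved actions contribute nothing
        if p == 0:
            return math.inf
        total -= q * math.log(p)
    return total

def error_rate(pred, emp):
    """Fraction of observations that differ from the predicted mode (assumed definition)."""
    mode = max(range(len(pred)), key=lambda i: pred[i])
    return 1.0 - emp[mode]

# Hypothetical data: three actions, observed play split evenly over the first two.
empirical = [0.5, 0.5, 0.0]
overconfident = [1.0, 0.0, 0.0]

print(squared_l2(overconfident, empirical))     # 0.5
print(cross_entropy(overconfident, empirical))  # inf
print(error_rate(overconfident, empirical))     # 0.5
```

On this example the same prediction receives a finite squared L2 error but an infinite cross-entropy, which shows one way in which these common choices can disagree.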

Topics

Machine Learning > Optimization & Theory > Loss Functions
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Statistical Learning
Interdisciplinary > Cognitive Science > Cognitive Modeling

Keywords

game theory, Bregman divergence, loss function, cross-entropy loss, Brier score, axiomatic analysis, behavioral model
