Identifying Copeland Winners in Dueling Bandits with Indifferences

Viktor Bengs; Björn Haddenhorst; Eyke Hüllermeier

2024 AISTATS AISTATS 2024

Identifying Copeland Winners in Dueling Bandits with Indifferences

Abstract

We consider the task of identifying the Copeland winner(s) in a dueling bandits problem with ternary feedback. This is an underexplored but practically relevant variant of the conventional dueling bandits problem, in which, in addition to strict preference between two arms, one may observe feedback in the form of an indifference. We provide a lower bound on the sample complexity for any learning algorithm finding the Copeland winner(s) with a fixed error probability. Moreover, we propose POCOWISTA, an algorithm with a sample complexity that almost matches this lower bound, and which shows excellent empirical performance, even for the conventional dueling bandits problem. For the case where the preference probabilities satisfy a specific type of stochastic transitivity, we provide a refined version with an improved worst case sample complexity.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Data Science & Analytics, Deep Learning, Interdisciplinary, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Viktor Bengs , Björn Haddenhorst , Eyke Hüllermeier

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Learning Types > Active Learning Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits Artificial Intelligence > Core AI > Game Theory

Keywords

online learning preference learning sample complexity dueling bandit copeland winner stochastic transitivity

Download PDF

Related papers

Causal Bandits with General Causal Models and Interventions 2024

Boundary-Aware Uncertainty for Feature Attribution Explainers 2024

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective 2024

A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning 2024

Pure Exploration in Bandits with Linear Constraints 2024