Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information

Wei Huang; Richard Combes; Cindy Trinh

COLT 2022

Abstract

We propose a novel algorithm for multi-player multi-armed bandits without collision sensing information. Our algorithm circumvents two problems shared by all state-of-the-art algorithms: it does not need a lower bound on the minimal expected reward of an arm as an input, and its performance does not scale inversely with the minimal expected reward. We prove a theoretical regret upper bound that justifies these claims. We complement our theoretical results with numerical experiments showing that the proposed algorithm outperforms the state of the art in practice.
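To make the setting concrete, here is a minimal sketch of the feedback model, assuming Bernoulli arm rewards and the common convention that colliding players all receive reward 0 with no collision flag. The sketch is illustrative only and is not the authors' algorithm. The helpers `play_round` and `zeros_needed` are hypothetical. `zeros_needed` shows one intuition for the dependence on the minimal expected reward: if a player treats a run of zeros as evidence of a collision, then the run needed before a collision-free arm with mean `mu` becomes unlikely to produce it grows as `mu` shrinks, and the rule must know `mu` (or a lower bound on it).

```python
import math
import random


def play_round(arm_means, choices, rng):
    """One round of a multi-player Bernoulli bandit WITHOUT collision sensing.

    Assumed model (illustrative): if two or more players pick the same arm,
    each of them receives reward 0; otherwise a player on arm k receives a
    Bernoulli(arm_means[k]) reward. Players see only their own reward, so a
    collision-induced 0 is indistinguishable from a 0 drawn from the arm.
    """
    counts = {}
    for k in choices:
        counts[k] = counts.get(k, 0) + 1
    rewards = []
    for k in choices:
        if counts[k] > 1:
            rewards.append(0)  # collision: reward forced to 0, no flag returned
        else:
            rewards.append(1 if rng.random() < arm_means[k] else 0)
    return rewards


def zeros_needed(mu, delta):
    """Smallest n with (1 - mu)**n <= delta: after n consecutive zeros, a
    collision-free Bernoulli(mu) arm is unlikely (probability <= delta) to be
    the explanation. Requires 0 < mu < 1 and 0 < delta < 1."""
    if not (0.0 < mu < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("mu and delta must lie in (0, 1)")
    return math.ceil(math.log(delta) / math.log(1.0 - mu))


if __name__ == "__main__":
    rng = random.Random(0)
    # Players 0 and 1 collide on arm 0; player 2 is alone on arm 2.
    print(play_round([0.9, 0.5, 0.1], [0, 0, 2], rng))
    for mu in (0.5, 0.1, 0.01):
        print(mu, zeros_needed(mu, delta=0.01))
```

Because -log(1 - mu) is close to mu for small mu, `zeros_needed` grows roughly like log(1/delta)/mu. This illustrates the kind of inverse scaling in the minimal expected reward that the abstract says the proposed algorithm avoids.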

Keywords

multi-armed bandit, regret bound, expected reward, collision sensing
