Coordination without communication: optimal regret in two players multi-armed bandits

Sébastien Bubeck; Thomas Budzinski

COLT 2020

Abstract

We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. Under the assumption that shared randomness is available, we propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.
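The setting in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the strategy proposed in the paper: rewards are assumed to be Bernoulli, a collision is assumed to yield zero reward for that round, and regret is measured against the two best arms. The function name `simulate` and its parameters are hypothetical. Shared randomness is modeled by giving each player its own generator with the same seed, so both draw the same random pair of distinct arms without exchanging any message.

```python
import random


def simulate(means, horizon, shared_seed, env_seed=0):
    """Two players on one Bernoulli bandit (illustrative, not the paper's strategy)."""
    # Each player owns a generator, but both are seeded identically.
    # This models shared randomness: no message is ever exchanged.
    rng_a = random.Random(shared_seed)
    rng_b = random.Random(shared_seed)
    env = random.Random(env_seed)  # the environment's private reward noise
    arms = list(range(len(means)))
    total_reward, collisions = 0.0, 0
    for _ in range(horizon):
        # Both players draw the same ordered pair of distinct arms.
        pair_a = rng_a.sample(arms, 2)
        pair_b = rng_b.sample(arms, 2)
        # Player A takes the first entry, player B the second.
        arm_a, arm_b = pair_a[0], pair_b[1]
        if arm_a == arm_b:
            # Assumed collision rule: no reward for either player this round.
            collisions += 1
            continue
        for arm in (arm_a, arm_b):
            total_reward += 1.0 if env.random() < means[arm] else 0.0
    # Benchmark: the two players occupy the two best arms in every round.
    best_pair = sum(sorted(means, reverse=True)[:2])
    regret = horizon * best_pair - total_reward
    return regret, collisions


regret, collisions = simulate([0.9, 0.5, 0.1], horizon=1000, shared_seed=42)
print(f"collisions={collisions}, regret={regret:.1f}")
```

Because the pairs are drawn uniformly at random, this sketch avoids collisions but does no learning, so its regret grows linearly in $T$. The paper's contribution is a strategy that keeps collisions away while reaching regret $O(\sqrt{T \log(T)})$.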

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

collision avoidance, multi-armed bandit, regret bound, shared randomness
