Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

Sébastien Bubeck; Yuanzhi Li; Yuval Peres; Mark Sellke

2020 COLT COLT 2020

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without

Abstract

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication and no shared randomness at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, assuming only two players, and under the feedback model where collisions are announced to the colliding players. We also prove the first sublinear regret guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — collision information

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Sébastien Bubeck , Yuanzhi Li , Yuval Peres , Mark Sellke

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Optimization & Theory > Online Algorithms

Keywords

multi-armed bandit regret bound multi-agent system collision information

Download PDF

Related papers

Open Problem: Average-Case Hardness of Hypergraphic Planted Clique Detection 2020

Highly smooth minimization of non-smooth problems 2020

Closure Properties for Private Classification and Online Prediction 2020

Efficient, Noise-Tolerant, and Private Learning via Boosting 2020

Domain Compression and its Application to Randomness-Optimal Distributed Goodness-of-Fit 2020