Multi-Player Bandits Revisited

Lilian Besson; Emilie Kaufmann

ALT 2018

Abstract

Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Also driven by such applications, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assumes that sensing information is available to the algorithm. Under this assumption, we improve the state-of-the-art lower bound for the regret of any decentralized algorithm and introduce two algorithms, RandTopM and MCTopM, that are shown to empirically outperform existing algorithms. Moreover, we provide strong theoretical guarantees for these algorithms, including a notion of asymptotic optimality in terms of the number of selections of bad arms. We then introduce a promising heuristic, called Selfish, that can operate without sensing information, which is crucial for emerging applications to Internet of Things networks. We investigate the empirical performance of this algorithm and provide some first theoretical elements for the understanding of its behavior.

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems; Mathematics & Optimization > Optimization > Online Algorithms

Keywords

regret bound; decentralized algorithm; multi-player multi-armed bandit; cognitive radio
