Stochastic Multi-armed Bandits in Constant Space

David Liau; Zhao Song; Eric Price; Ger Yang

AISTATS 2018

Abstract

We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret $\sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T$, where $\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log (1/\Delta))$ factor of the optimum regret possible without space constraints.
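To make the shape of the bound concrete, the following Python sketch evaluates the regret expression from the abstract for a hypothetical set of arm means and checks that each arm's extra factor $\log(\Delta_i/\Delta)$ is at most $\log(1/\Delta)$, which is where the $O(\log(1/\Delta))$ comparison comes from. It is only an illustration of the formula, not the paper's algorithm. The function names and the example means are assumptions, the sum is restricted to suboptimal arms (the abstract's sum runs over all $K$ arms, but the term is undefined when $\Delta_i = 0$), and constants hidden by the bound are ignored.

```python
import math


def gaps_from_means(means):
    """Return the gaps Delta_i = mu_best - mu_i for the suboptimal arms only.

    Assumption: `means` are hypothetical expected rewards in [0, 1].
    """
    best = max(means)
    gaps = [best - m for m in means if best - m > 0]
    if not gaps:
        raise ValueError("need at least one suboptimal arm")
    return gaps


def extra_log_factors(means):
    """Per-arm factors log(Delta_i / Delta) and the cap log(1 / Delta).

    Delta is the smallest positive gap (best vs. second-best arm).
    """
    gaps = gaps_from_means(means)
    delta = min(gaps)
    factors = [math.log(g / delta) for g in gaps]
    cap = math.log(1.0 / delta)
    return factors, cap


def regret_expression(means, horizon):
    """Evaluate sum_i (1/Delta_i) * log(Delta_i/Delta) * log(T), constants omitted."""
    gaps = gaps_from_means(means)
    delta = min(gaps)
    return sum(math.log(g / delta) / g for g in gaps) * math.log(horizon)


if __name__ == "__main__":
    example_means = [0.9, 0.8, 0.6, 0.5]  # hypothetical instance
    factors, cap = extra_log_factors(example_means)
    print("per-arm extra factors:", factors)
    print("cap log(1/Delta):", cap)
    print("regret expression at T=10000:", regret_expression(example_means, 10_000))
```

Because every gap satisfies $\Delta_i \le 1$ when rewards lie in $[0, 1]$, each factor $\log(\Delta_i/\Delta)$ is bounded by $\log(1/\Delta)$. The term for the second-best arm evaluates to zero in this literal reading, since $\log 1 = 0$, which shows that the expression in the abstract omits constants and lower-order terms.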

Topics

Machine Learning > Optimization & Theory > Stochastic Processes
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

multi-armed bandit, regret bound, online algorithm, sublinear space
