Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

Arnab Maiti; Vishakha Patil; Arindam Khan

NeurIPS 2021

Abstract

We study the stochastic multi-armed bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in memory at any time is bounded. The decision-maker can only pull arms that are present in memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show an $\Omega(T^{2/3})$ lower bound on the expected cumulative regret of single-pass algorithms with an arm-memory size of $(n-1)$, where $n$ is the number of arms. For best-arm identification, we provide an $(\varepsilon, \delta)$-PAC algorithm with an arm-memory size of $O(\log^* n)$ and an optimal sample complexity of $O(\frac{n}{\varepsilon^2}\cdot \log(\frac{1}{\delta}))$.
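To make the bounded arm-memory setting concrete, the following is a minimal sketch of a single-pass streaming loop that keeps at most two arms in memory: a retained "champion" and the newly arrived arm. It is a naive baseline for illustration only, not the algorithm from the paper, and it does not carry the $O(\log^* n)$ memory or sample-complexity guarantees stated above. The function name `stream_best_arm`, the Bernoulli reward model, and the fixed `pulls_per_arm` budget are all assumptions introduced here.

```python
import random


def stream_best_arm(means, pulls_per_arm, rng):
    """Single pass over a stream of Bernoulli arms with arm-memory of two.

    means: hypothetical true means, used only to simulate pulls.
    Returns (index of the arm retained at the end, total number of pulls).
    """
    def pull(i):
        # Simulated Bernoulli reward with mean means[i].
        return 1 if rng.random() < means[i] else 0

    champion = None        # index of the arm currently retained in memory
    champion_est = 0.0     # stored empirical mean of the retained arm
    total_pulls = 0

    for arm in range(len(means)):  # arms arrive one at a time
        # The arriving arm takes the second memory slot while it is sampled.
        est = sum(pull(arm) for _ in range(pulls_per_arm)) / pulls_per_arm
        total_pulls += pulls_per_arm
        if champion is None or est > champion_est:
            # The previous champion is evicted and cannot be pulled again.
            champion, champion_est = arm, est

    return champion, total_pulls


if __name__ == "__main__":
    rng = random.Random(0)
    best, pulls = stream_best_arm([0.2, 0.5, 0.9, 0.4], 200, rng)
    print(best, pulls)
```

The sketch uses `n * pulls_per_arm` pulls in total and can only pull an arm while that arm is in memory, which is the constraint the abstract describes. An evicted arm is lost for the rest of the pass.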

Authors

Arnab Maiti, Vishakha Patil, Arindam Khan

Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Stochastic Methods
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

stochastic optimization, sample complexity, regret minimization, multi-armed bandit, best-arm identification, bounded memory
