Nearly Tight Bounds for Exploration in Streaming Multi-Armed Bandits with Known Optimality Gap

Nikolai Karpov; Chen Wang

2025 AAAI AAAI 2025

Nearly Tight Bounds for Exploration in Streaming Multi-Armed Bandits with Known Optimality Gap

Abstract

Abstract We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with the *a priori* knowledge of the optimality gap ?_[2]. Here, and throughout, the optimality gap ?_[i] is defined as the mean reward gap between the best and the i-th best arms. A recent line of results have shown that if there is no known ?_[2], a pass complexity of ̃?(log(1/?_[2])) is necessary and sufficient to obtain the *worst-case optimal* O(n/?²_[2]) sample complexity with a single-arm memory. However, our understanding of multi-pass algorithms with known ?_[2] is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., O( ∑ᵢ₌₂ⁿ1/?²_[i] log{n}) arm pulls, with a sublinear memory size. In this work, we show that the ``right answer'' for the question is ̃?(log{n}) passes. We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of o(n/polylog(n)) arms -- and O( ∑ᵢ₌₂ⁿ1/?²_[i] log n) arm pulls has to make ?(log n/loglog n) passes over the stream. We then show a nearly-matching algorithm that assuming the knowledge of ?_[2], finds the best arm with O( ∑ᵢ₌₂ⁿ1/?²_[i] log n) arm pulls and a *single arm* memory.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Nikolai Karpov , Chen Wang

Topics

Machine Learning > Optimization & Theory > Learning Theory Mathematics & Optimization > Optimization > Stochastic Methods Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Learning Types > Online Learning Machine Learning > Optimization & Theory > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits Machine Learning > Learning Types > Exploration

Keywords

online learning sample complexity multi-armed bandit online algorithm streaming algorithm optimality gap

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025