2025 AAAI AAAI 2025

Nearly Tight Bounds for Exploration in Streaming Multi-Armed Bandits with Known Optimality Gap

Abstract

Abstract We investigate the sample-memory-pass trade-offs for pure exploration in multi-pass streaming multi-armed bandits (MABs) with the *a priori* knowledge of the optimality gap ?_[2]. Here, and throughout, the optimality gap ?_[i] is defined as the mean reward gap between the best and the i-th best arms. A recent line of results have shown that if there is no known ?_[2], a pass complexity of ̃?(log(1/?_[2])) is necessary and sufficient to obtain the *worst-case optimal* O(n/?²_[2]) sample complexity with a single-arm memory. However, our understanding of multi-pass algorithms with known ?_[2] is still limited. Here, the key open problem is how many passes are required to achieve the complexity, i.e., O( ∑ᵢ₌₂ⁿ1/?²_[i] log{n}) arm pulls, with a sublinear memory size. In this work, we show that the ``right answer'' for the question is ̃?(log{n}) passes. We first present a lower bound, showing that any algorithm that finds the best arm with slightly sublinear memory -- a memory of o(n/polylog(n)) arms -- and O( ∑ᵢ₌₂ⁿ1/?²_[i] log n) arm pulls has to make ?(log n/loglog n) passes over the stream. We then show a nearly-matching algorithm that assuming the knowledge of ?_[2], finds the best arm with O( ∑ᵢ₌₂ⁿ1/?²_[i] log n) arm pulls and a *single arm* memory.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio