Improved Pure Exploration in Linear Bandits with No-Regret Learning

Mohammadi Zaki; Avi Mohan; Aditya Gopalan

2022 IJCAI IJCAI 2022

Improved Pure Exploration in Linear Bandits with No-Regret Learning

Abstract

We study the best arm identification problem in linear multi-armed bandits (LMAB) in the fixed confidence setting ; this is also the problem of optimizing an unknown linear function over a discrete domain with noisy, zeroth-order access. We propose an explicitly implementable and provably order-optimal sample-complexity algorithm for best arm identification. Existing approaches that achieve optimal sample complexity assume either access to a nontrivial minimax optimization oracle (e.g. RAGE and Lazy-TS) or knowledge of an upper-bound on the norm of the unknown parameter vector(e.g. LinGame). Our algorithm, which we call the Phased Elimination Linear Exploration Game (PELEG), maintains a high-probability confidence ellipsoid containing the unknown vector, and uses it to eliminate suboptimal arms in phases, where a minimax problem is essentially solved in each phase using two-player low regret learners. PELEG does not require to know a bound on norm of the unknown vector, and is asymptotically-optimal, settling an open problem. We show that the sample complexity of PELEG matches, up to order and in a non-asymptotic sense, an instance-dependent universal lower bound for linear bandits. PELEG is thus the first algorithm to achieve both order-optimal sample complexity and explicit implementability for this setting. We also provide numerical results for the proposed algorithm consistent with its theoretical guarantees.

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — confidence ellipsoid

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Mohammadi Zaki , Avi Mohan , Aditya Gopalan

Topics

Machine Learning > Optimization & Theory > Learning Theory Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

sample complexity best arm identification no-regret learning linear bandit confidence ellipsoid phased elimination

Download PDF

Related papers

Better Collective Decisions via Uncertainty Reduction 2022

Mixed Strategies for Security Games with General Defending Requirements 2022

Achieving Envy-Freeness with Limited Subsidies under Dichotomous Valuations 2022

Distortion in Voting with Top-t Preferences 2022

Let’s Agree to Agree: Targeting Consensus for Incomplete Preferences through Majority Dynamics 2022