Towards Instance Optimal Bounds for Best Arm Identification

Lijie Chen; Jian Li; Mingda Qiao

2017 COLT COLT 2017

Towards Instance Optimal Bounds for Best Arm Identification

Abstract

In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. Upon each play of an arm, we can get a reward sampled i.i.d. from its reward distribution. We would like to identify the arm with the largest mean with probability at least $1-δ$, using as few samples as possible. The problem has a long history and understanding its sample complexity has attracted significant attention since the last decade. However, the optimal sample complexity of the problem is still unknown. Recently, Chen and Li (2016) made an interesting conjecture, called gap-entropy conjecture, concerning the instance optimal sample complexity of Best-$1$-Arm. Given a Best-$1$-Arm instance $I$ (i.e., a set of arms), let $\mu_[i]$ denote the $i$th largest mean and $\Delta_[i]=\mu_[1]-\mu_[i]$ denote the corresponding gap. $H(I)=\sum_i=2^n\Delta_[i]^-2$ denotes the complexity of the instance. The gap-entropy conjecture states that for any instance $I$, $Ω\left(H(I)⋅\left(\lnδ^-1 + \mathsf{Ent}(I)\right)\right)$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and there is a $δ$-correct algorithm for Best-$1$-Arm with sample complexity $O\left(H(I)⋅\left(\lnδ^-1 + \mathsf{Ent}(I)\right)+\Delta_[2]^-2\ln\ln\Delta_[2]^-1\right)$. We note that $Θ\left(\Delta_[2]^-2\ln\ln\Delta_[2]^-1\right)$ is necessary and sufficient to solve the two-arm instance with the best and second best arms. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm (up to constant factors). In this paper, we make significant progress towards a complete resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O\left(H(I)⋅\left(\lnδ^-1 + \mathsf{Ent}(I)\right)+\Delta_[2]^-2\ln\ln\Delta_[2]^-1\mathrmpolylog(n,δ^-1)\right)\]samples in expectation for

🧭 Keyword Pioneer — gap-entropy conjecture

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Lijie Chen , Jian Li , Mingda Qiao

Topics

Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Optimization & Theory > Stochastic Processes

Keywords

sample complexity best arm identification stochastic bandit gap-entropy conjecture instance optimal

Download PDF

Related papers

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization 2017

Open Problem: First-Order Regret Bounds for Contextual Bandits 2017

Open Problem: Meeting Times for Learning Random Automata 2017

Corralling a Band of Bandit Algorithms 2017

Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons 2017