2017 COLT COLT 2017

Towards Instance Optimal Bounds for Best Arm Identification

Abstract

In the classical best arm identification (Best-$1$-Arm) problem, we are given $n$ stochastic bandit arms, each associated with a reward distribution with an unknown mean. Upon each play of an arm, we can get a reward sampled i.i.d. from its reward distribution. We would like to identify the arm with the largest mean with probability at least $1-δ$, using as few samples as possible. The problem has a long history and understanding its sample complexity has attracted significant attention since the last decade. However, the optimal sample complexity of the problem is still unknown. Recently, Chen and Li (2016) made an interesting conjecture, called gap-entropy conjecture, concerning the instance optimal sample complexity of Best-$1$-Arm. Given a Best-$1$-Arm instance $I$ (i.e., a set of arms), let $\mu_[i]$ denote the $i$th largest mean and $\Delta_[i]=\mu_[1]-\mu_[i]$ denote the corresponding gap. $H(I)=\sum_i=2^n\Delta_[i]^-2$ denotes the complexity of the instance. The gap-entropy conjecture states that for any instance $I$, $Ω\left(H(I)⋅\left(\lnδ^-1 + \mathsf{Ent}(I)\right)\right)$ is an instance lower bound, where $\mathsf{Ent}(I)$ is an entropy-like term determined by the gaps, and there is a $δ$-correct algorithm for Best-$1$-Arm with sample complexity $O\left(H(I)⋅\left(\lnδ^-1 + \mathsf{Ent}(I)\right)+\Delta_[2]^-2\ln\ln\Delta_[2]^-1\right)$. We note that $Θ\left(\Delta_[2]^-2\ln\ln\Delta_[2]^-1\right)$ is necessary and sufficient to solve the two-arm instance with the best and second best arms. If the conjecture is true, we would have a complete understanding of the instance-wise sample complexity of Best-$1$-Arm (up to constant factors). In this paper, we make significant progress towards a complete resolution of the gap-entropy conjecture. For the upper bound, we provide a highly nontrivial algorithm which requires \[O\left(H(I)⋅\left(\lnδ^-1 + \mathsf{Ent}(I)\right)+\Delta_[2]^-2\ln\ln\Delta_[2]^-1\mathrmpolylog(n,δ^-1)\right)\]samples in expectation for

🧭 Keyword Pioneer — gap-entropy conjecture
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio