NeurIPS 2019
Thresholding Bandit with Optimal Aggregate Regret
Abstract
We consider the thresholding bandit problem, whose goal is to identify the arms whose mean rewards are above a given threshold $\theta$, using a fixed budget of $T$ trials. We introduce LSA, a new, simple, anytime algorithm that aims to minimize the aggregate regret (the expected number of misclassified arms). We prove that our algorithm is instance-wise asymptotically optimal. We also provide comprehensive empirical results demonstrating the algorithm's superior performance over existing algorithms across a variety of scenarios.
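To make the objective concrete, the following minimal sketch estimates the aggregate regret by Monte Carlo simulation. It is not LSA: it uses a plain uniform (round-robin) sampling baseline, and the Bernoulli reward model, the tie rule (an empirical mean equal to $\theta$ counts as above), and the function names `uniform_thresholding` and `aggregate_regret` are all assumptions made for illustration.

```python
import random


def uniform_thresholding(means, theta, budget, rng):
    """Illustrative baseline (not LSA): pull arms round-robin for `budget`
    trials, then label each arm by comparing its empirical mean to `theta`."""
    k = len(means)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(budget):
        i = t % k
        # Assumption: rewards are Bernoulli with success probability means[i].
        reward = 1.0 if rng.random() < means[i] else 0.0
        pulls[i] += 1
        totals[i] += reward
    # An arm that was never pulled is labeled "below" by convention.
    return [pulls[i] > 0 and totals[i] / pulls[i] >= theta for i in range(k)]


def aggregate_regret(means, theta, budget, runs=2000, seed=0):
    """Monte Carlo estimate of the expected number of misclassified arms."""
    rng = random.Random(seed)
    truth = [m >= theta for m in means]
    mistakes = 0
    for _ in range(runs):
        predicted = uniform_thresholding(means, theta, budget, rng)
        mistakes += sum(p != t for p, t in zip(predicted, truth))
    return mistakes / runs


if __name__ == "__main__":
    arm_means = [0.45, 0.55, 0.2, 0.8]  # hypothetical instance
    for T in (20, 200, 2000):
        print(T, aggregate_regret(arm_means, theta=0.5, budget=T, runs=500))
```

The estimate always lies between 0 and the number of arms, and under this baseline it shrinks as the budget $T$ grows, with arms whose means are close to $\theta$ accounting for most of the remaining errors.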
Authors
Topics
Machine Learning > Learning Types > Active Learning
Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Learning Types > Online Learning
Machine Learning > Optimization & Theory > Online Algorithms
Machine Learning > Learning Types > Multi-Armed Bandits
Machine Learning > Learning Types > Exploration-Exploitation