2018
JMLR
JMLR 2018
Refining the Confidence Level for Optimistic Bandit Strategies
Abstract
This paper introduces the first strategy for stochastic bandits with unit variance Gaussian noise that is simultaneously minimax optimal up to constant factors, asymptotically optimal, and never worse than the classical upper confidence bound strategy up to universal constant factors. Preliminary empirical evidence is also promising. Besides this, a conjecture on the optimal form of the regret is shown to be false and a finite-time lower bound on the regret of any strategy is presented that very nearly matches the finite-time upper bound of the newly proposed strategy. [abs] [ pdf ][ bib ] © JMLR 2018. (edit, beta)
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Machine Learning and Mathematics & Optimization
🧭
Keyword Pioneer
— asymptotic optimal
🐝
Cross-Pollinator
— Artificial Intelligence, Data Science & Analytics, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics, Security & Privacy
🐣
Hot Topic Early Bird
— regret minimization