Refining the Confidence Level for Optimistic Bandit Strategies

Tor Lattimore

2018 JMLR JMLR 2018

Refining the Confidence Level for Optimistic Bandit Strategies

Abstract

This paper introduces the first strategy for stochastic bandits with unit variance Gaussian noise that is simultaneously minimax optimal up to constant factors, asymptotically optimal, and never worse than the classical upper confidence bound strategy up to universal constant factors. Preliminary empirical evidence is also promising. Besides this, a conjecture on the optimal form of the regret is shown to be false and a finite-time lower bound on the regret of any strategy is presented that very nearly matches the finite-time upper bound of the newly proposed strategy. [abs] [ pdf ][ bib ] © JMLR 2018. (edit, beta)

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — asymptotic optimal

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Reinforcement Learning, Robotics, Security & Privacy

🐣 Hot Topic Early Bird — regret minimization

Authors

Tor Lattimore

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Theory Mathematics & Optimization > Optimization > Online Algorithms Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

minimax optimality regret minimization minimax optimal asymptotic optimality multi-armed bandit upper confidence bound regret bound stochastic bandit asymptotic optimal

Download PDF

Related papers

Simple Classification Using Binary Data 2018

ELFI: Engine for Likelihood-Free Inference 2018

An Efficient and Effective Generic Agglomerative Hierarchical Clustering Approach 2018

Convergence of Unregularized Online Learning Algorithms 2018

Sparse Concordance-assisted Learning for Optimal Treatment Decision 2018