Lower Bounds and Selectivity of Weak-Consistent Policies in Stochastic Multi-Armed Bandit Problem

Antoine Salomon; Jean-Yves Audibert; Issam El Alaoui

JMLR 2013

Abstract

This paper is devoted to regret lower bounds in the classical model of the stochastic multi-armed bandit. A well-known result of Lai and Robbins, later extended by Burnetas and Katehakis, established a logarithmic lower bound for all consistent policies. We relax the notion of consistency and exhibit a generalisation of the bound. We also study the existence of logarithmic bounds in general and in the case of Hannan consistency. Moreover, we prove that it is impossible to design an adaptive policy that would select the best of two algorithms by taking advantage of the properties of the environment. To get these results, we study variants of popular Upper Confidence Bounds (UCB) policies.
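
For readers unfamiliar with the UCB family mentioned in the abstract, the following is a minimal sketch of the standard UCB1 index policy on Bernoulli arms. It is not one of the variants studied in the paper: the function name `ucb1`, the Bernoulli reward model, and the exploration constant 2 are assumptions chosen purely for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms (illustrative assumption).

    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls of each arm
    sums = [0.0] * n_arms   # cumulative reward collected from each arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialisation: pull every arm once.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus sqrt(2 log t / T_k).
            arm = max(
                range(n_arms),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        # Draw a Bernoulli reward with the chosen arm's mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the pulled arm's mean.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.8], horizon=2000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", regret)
```

In this sketch the suboptimal arm is pulled only occasionally, so the pseudo-regret grows slowly with the horizon; the lower bounds discussed in the abstract concern how slowly such growth can be for a given class of policies.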

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Stochastic Processes
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

stochastic process, policy selection, multi-armed bandit, upper confidence bound, regret bound
