Perturbed-History Exploration in Stochastic Multi-Armed Bandits

Branislav Kveton; Csaba Szepesvári; Mohammad Ghavamzadeh; Craig Boutilier

2019 IJCAI IJCAI 2019

Perturbed-History Exploration in Stochastic Multi-Armed Bandits

Abstract

We propose an online algorithm for cumulative regret minimization in a stochastic multi-armed bandit. The algorithm adds O(t) i.i.d. pseudo-rewards to its history in round t and then pulls the arm with the highest average reward in its perturbed history. Therefore, we call it perturbed-history exploration (PHE). The pseudo-rewards are carefully designed to offset potentially underestimated mean rewards of arms with a high probability. We derive near-optimal gap-dependent and gap-free bounds on the n-round regret of PHE. The key step in our analysis is a novel argument that shows that randomized Bernoulli rewards lead to optimism. Finally, we empirically evaluate PHE and show that it is competitive with state-of-the-art baselines.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Branislav Kveton , Csaba Szepesvári , Mohammad Ghavamzadeh , Craig Boutilier

Topics

Artificial Intelligence > Core AI > Agent Systems Machine Learning > Optimization & Theory > Learning Theory

Keywords

stochastic optimization regret minimization multi-armed bandit online algorithm exploration strategy

Download PDF

Related papers

Causal Embeddings for Recommendation: An Extended Abstract 2019

Pivotal Relationship Identification: The K-Truss Minimization Problem 2019

Portioning Using Ordinal Preferences: Fairness and Efficiency 2019

Probabilistic Strategy Logic 2019

Multi-Agent Pathfinding with Continuous Time 2019