An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

Yevgeny Seldin; Gabor Lugosi

COLT 2017

Abstract

We present a new strategy for gap estimation in randomized algorithms for multi-armed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime, the strategy reduces the dependence of regret on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime, the regret guarantee remains unchanged.
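To get a feel for the size of the two improvements named in the abstract, the sketch below evaluates only the orders of magnitude involved. It is an illustrative calculation, not code from the paper: it ignores all constants and the remaining terms of the regret bound, and the helper names `log_factor_ratio` and `removed_additive_term` are hypothetical.

```python
import math


def log_factor_ratio(t: int) -> float:
    """Ratio (ln t)^3 / (ln t)^2, i.e. the factor ln t saved in the horizon dependence."""
    lt = math.log(t)
    return lt ** 3 / lt ** 2


def removed_additive_term(gap: float) -> float:
    """Order-of-magnitude value of Delta * exp(1 / Delta^2), constants ignored."""
    return gap * math.exp(1.0 / gap ** 2)


# Horizon dependence: old (ln t)^3 versus new (ln t)^2.
for t in (10**3, 10**6):
    lt = math.log(t)
    print(f"t={t}: (ln t)^3 = {lt**3:.1f}, (ln t)^2 = {lt**2:.1f}, "
          f"ratio = {log_factor_ratio(t):.2f}")

# The eliminated additive term grows explosively as the minimal gap shrinks.
for gap in (0.5, 0.3, 0.2):
    print(f"Delta={gap}: Delta*exp(1/Delta^2) ~ {removed_additive_term(gap):.3g}")
```

The saving in the horizon dependence is a factor of $\ln t$ (about 13.8 at $t = 10^6$), while the eliminated additive term, although independent of $t$, becomes astronomically large for small gaps (already above $10^{10}$ at $\Delta = 0.2$).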

Topics

Machine Learning > Optimization & Theory > Learning Theory
Machine Learning > Optimization & Theory > Stochastic Methods

Keywords

multi-armed bandit, regret bound, stochastic bandit, adversarial bandit, EXP3++ algorithm

Related papers

Ignoring Is a Bliss: Learning with Large Noise Through Reweighting-Minimization 2017

Open Problem: First-Order Regret Bounds for Contextual Bandits 2017

Open Problem: Meeting Times for Learning Random Automata 2017

Corralling a Band of Bandit Algorithms 2017

Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons 2017