2017 COLT COLT 2017

An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits

Abstract

We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $∆e^1/∆^2$, where $∆$ is the minimal gap of a problem instance. In the adversarial regime regret guarantee remains unchanged.

🧭 Keyword Pioneer — exp3++ algorithm
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy