Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension

Sébastien Bubeck; Ofer Dekel; Tomer Koren; Yuval Peres

COLT 2015

Abstract

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is $\widetilde{\Theta}(\sqrt{T})$ and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
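
For orientation, the quantities named in the abstract can be written out schematically. This is a sketch in standard notation for the setting, not text taken from the paper: the domain $\mathcal{K}$, the losses $f_t$, the player's points $x_t$, the algorithm $\mathcal{A}$, and the prior $\pi$ are symbols assumed here for illustration.

```latex
% Assumed notation: K is a bounded interval of the real line, f_1, ..., f_T are
% convex losses on K, and x_t is the point played in round t after observing
% only the values f_1(x_1), ..., f_{t-1}(x_{t-1}) (bandit feedback).
\[
  R_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
\]
% Minimax regret: the best algorithm A against the worst loss sequence.
\[
  R_T^{\ast} \;=\; \inf_{\mathcal{A}} \; \sup_{f_1, \dots, f_T} \; \mathbb{E}\,[R_T]
\]
% Schematic form of the Bayesian reduction: a bound on the regret that holds
% for every prior \pi over loss sequences carries over to the minimax regret.
\[
  R_T^{\ast} \;\le\; \sup_{\pi} \; \inf_{\mathcal{A}} \;
  \mathbb{E}_{(f_1, \dots, f_T) \sim \pi} \, \mathbb{E}\,[R_T]
\]
```

Read this way, the result stated in the abstract is that $R_T^{\ast}$ grows as $\sqrt{T}$ up to logarithmic factors in one dimension. Because the bound is obtained through the last inequality, it shows that a good algorithm exists without exhibiting one.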

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Online Algorithms
Machine Learning > Optimization & Theory > Online Algorithms
Mathematics & Optimization > Optimization > Game Theory
Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning, regret minimization, Thompson sampling, minimax regret, regret bound, bandit convex optimization

Related papers

Open Problem: Restricted Eigenvalue Condition for Heavy Tailed Designs 2015

Open Problem: The Oracle Complexity of Smooth Convex Optimization in Nonstandard Settings 2015

Online Learning with Feedback Graphs: Beyond Bandits 2015

Learning Overcomplete Latent Variable Models through Tensor Methods 2015

Efficient Learning of Linear Separators under Bounded Noise 2015