Tight Bounds for Bandit Combinatorial Optimization

Alon Cohen; Tamir Hazan; Tomer Koren

COLT 2017


Abstract

We revisit the study of optimal regret rates in bandit combinatorial optimization, a fundamental framework for sequential decision making under uncertainty that abstracts numerous combinatorial prediction problems. We prove that the attainable regret in this setting grows as $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$, where $d$ is the dimension of the problem and $k$ is a bound on the maximal instantaneous loss. This disproves a conjecture of Audibert, Bubeck, and Lugosi (2013), who argued that the optimal rate should be of the form $\widetilde{\Theta}(k\sqrt{dT})$. Our bounds apply to several important instances of the framework, and in particular, imply a tight bound for the well-studied bandit shortest path problem. By that, we also resolve an open problem posed by Cesa-Bianchi and Lugosi (2012).
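To make the setting concrete, here is a minimal Python sketch of the bandit feedback protocol. It rests on several assumptions that do not come from the abstract. The action set is taken to be all 0/1 vectors in $\{0,1\}^d$ with exactly $k$ ones. Losses lie in $[0,1]^d$, so each instantaneous loss is at most $k$. The learner is a placeholder that picks actions uniformly at random and never learns, so it is not the authors' algorithm. The names `k_sets`, `play`, and `rate_gap` are hypothetical. The defining feature of bandit feedback is that the learner sees only the scalar loss of the action it played, never the loss vector itself. `rate_gap` makes the arithmetic in the abstract explicit: the proven rate exceeds the conjectured one by a factor of $\sqrt{k}$.

```python
import itertools
import math
import random


def k_sets(d, k):
    """Hypothetical action set: all 0/1 vectors in {0,1}^d with exactly k ones."""
    actions = []
    for idx in itertools.combinations(range(d), k):
        x = [0] * d
        for i in idx:
            x[i] = 1
        actions.append(tuple(x))
    return actions


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def play(d, k, T, seed=0):
    """Run T rounds of bandit feedback with a placeholder uniform-random learner.

    Returns (regret, feedback). feedback[t] is the only signal the learner gets
    in round t: the scalar loss of its chosen action, not the loss vector.
    """
    rng = random.Random(seed)
    actions = k_sets(d, k)
    losses = []    # adversary's loss vectors, hidden from the learner
    feedback = []  # scalar losses actually observed by the learner
    learner_total = 0.0
    for _ in range(T):
        loss = [rng.random() for _ in range(d)]  # assumed: i.i.d. uniform losses in [0,1]^d
        x = rng.choice(actions)                  # placeholder learner: ignores all feedback
        observed = dot(loss, x)                  # bandit feedback: one number in [0, k]
        losses.append(loss)
        feedback.append(observed)
        learner_total += observed
    # Regret is measured against the best single action in hindsight.
    best_fixed = min(sum(dot(loss, x) for loss in losses) for x in actions)
    return learner_total - best_fixed, feedback


def rate_gap(d, k, T):
    """Ratio of the proven rate k^{3/2} sqrt(dT) to the conjectured k sqrt(dT)."""
    proven = k ** 1.5 * math.sqrt(d * T)
    conjectured = k * math.sqrt(d * T)
    return proven / conjectured  # equals sqrt(k), independent of d and T
```

For example, `play(d=5, k=2, T=1000)` returns the regret of the non-learning baseline along with the sequence of scalar observations. A learner that attains the $\widetilde{\Theta}(k^{3/2}\sqrt{dT})$ rate would replace the `rng.choice(actions)` line with a sampling distribution that it updates using only `observed`.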

Keywords

sequential decision making, regret bound, shortest path problem, bandit combinatorial optimization