Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

Zeyuan Allen-Zhu; Sébastien Bubeck; Yuanzhi Li

2018 ICML ICML 2018

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits

Abstract

Regret bounds in online learning compare the player’s performance to $L*$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $T$. The more refined concept of first-order regret bound replaces this with a scaling $\sqrt{L*}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — first-order regret

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Zeyuan Allen-Zhu , Sébastien Bubeck , Yuanzhi Li

Topics

Artificial Intelligence > Core AI > Multi-Agent Systems Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Optimization & Theory > Learning Theory Machine Learning > Learning Types > Online Learning Machine Learning > Learning Types > Multi-Armed Bandits

Keywords

online learning regret bound contextual bandit first-order regret policy space

Download PDF

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018