Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

Dylan Foster; Alexander Rakhlin; David Simchi-Levi; Yunzong Xu

COLT 2021

Abstract

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits can be achieved for rich, general classes of policies. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms which adapt to the gap whenever possible, while also attaining the minimax rate in the worst case. Finally, we provide structural results that tie together a number of complexity measures previously proposed throughout contextual bandits, reinforcement learning, and active learning and elucidate their role in determining the optimal instance-dependent regret. In a large-scale empirical evaluation, we find that our approach often gives superior results for challenging exploration problems. Turning our focus to reinforcement learning with function approximation, we develop new oracle-efficient algorithms for reinforcement learning with rich observations that obtain optimal gap-dependent sample complexity.

Authors

Dylan Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Topics

Reinforcement Learning > Methods > Deep RL

Keywords

function approximation, contextual bandit, instance-dependent regret, gap-dependent complexity

Related papers

SGD Generalizes Better Than GD (And Regularization Doesn’t Help) 2021

Learning in Matrix Games can be Arbitrarily Complex 2021

Reconstructing weighted voting schemes from partial information about their power indices 2021

Online Learning from Optimal Actions 2021

Robust learning under clean-label attack 2021