Achieving Optimal Dynamic Regret for Non-stationary Bandits without Prior Information

Peter Auer; Yifang Chen; Pratik Gajane; Chung-Wei Lee; Haipeng Luo; Ronald Ortner; Chen-Yu Wei

COLT 2019

Abstract

This joint extended abstract introduces and compares the results of Auer et al. (2019) and Chen et al. (2019), both of which resolve the problem of achieving optimal dynamic regret for non-stationary bandits without prior information on the non-stationarity. Specifically, Auer et al. (2019) resolve the problem for the traditional multi-armed bandit setting, while Chen et al. (2019) give a solution for the more general contextual bandit setting. Both works extend the key idea of Auer et al. (2018), which was developed for a simpler two-armed setting.

Topics

Machine Learning > Optimization & Theory > Learning Theory
Mathematics & Optimization > Optimization > Online Algorithms

Keywords

dynamic regret, minimax regret, multi-armed bandit, contextual bandit, non-stationary bandit
