Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach

Chen-Yu Wei; Haipeng Luo

2021 COLT COLT 2021

Non-stationary Reinforcement Learning without Prior Knowledge: an Optimal Black-box Approach

Abstract

We propose a black-box reduction that turns a certain reinforcement learning algorithm with optimal regret in a (near-)stationary environment into another algorithm with optimal dynamic regret in a non-stationary environment, importantly without any prior knowledge on the degree of non-stationarity. By plugging different algorithms into our black-box, we provide a list of examples showing that our approach not only recovers recent results for (contextual) multi-armed bandits achieved by very specialized algorithms, but also significantly improves the state of the art for (generalzed) linear bandits, episodic MDPs, and infinite-horizon MDPs in various ways. Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{\frac{1}{3}}T^{\frac{2}{3}}\})$ where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world respectively, while previous works only obtain suboptimal bounds and/or require the knowledge of $L$ and $\Delta$.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Chen-Yu Wei , Haipeng Luo

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Optimization & Theory > Learning Theory Reinforcement Learning > Methods > Deep RL

Keywords

dynamic regret multi-armed bandit linear bandit non-stationary environment black-box reduction

Download PDF

Related papers

SGD Generalizes Better Than GD (And Regularization Doesn’t Help) 2021

Learning in Matrix Games can be Arbitrarily Complex 2021

Reconstructing weighted voting schemes from partial information about their power indices 2021

Online Learning from Optimal Actions 2021

Robust learning under clean-label attack 2021