Variational Regret Bounds for Reinforcement Learning

Ronald Ortner; Pratik Gajane; Peter Auer

UAI 2019

Abstract

We consider undiscounted reinforcement learning in Markov decision processes (MDPs) where both the reward functions and the state-transition probabilities may vary (gradually or abruptly) over time. For this problem setting, we propose an algorithm and provide performance guarantees for the regret evaluated against the optimal non-stationary policy. The upper bound on the regret is given in terms of the total variation in the MDP. This is the first variational regret bound for the general reinforcement learning setting.
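
The abstract measures non-stationarity by the total variation in the MDP. As a rough illustration only, the sketch below assumes one common way to quantify this: the reward variation sums, over consecutive time steps, the largest change in any mean reward r_t(s, a), and the transition variation sums the largest L1 change in any next-state distribution p_t(. | s, a). Regret is taken against a per-step benchmark rho*_t (the optimal average reward of the MDP in force at step t). The list-of-lists layout, these definitions, and the function names are hypothetical and are not the paper's algorithm or exact notation.

```python
from typing import Sequence

# Hypothetical layout (assumption, not from the paper):
#   rewards[t][s][a]        -> mean reward of action a in state s at step t
#   transitions[t][s][a][x] -> probability of moving to state x from (s, a) at step t
Rewards = Sequence[Sequence[Sequence[float]]]
Transitions = Sequence[Sequence[Sequence[Sequence[float]]]]


def reward_variation(rewards: Rewards) -> float:
    """Sum over consecutive steps of the largest change in any r_t(s, a)."""
    total = 0.0
    for prev, curr in zip(rewards, rewards[1:]):
        total += max(
            abs(curr[s][a] - prev[s][a])
            for s in range(len(prev))
            for a in range(len(prev[s]))
        )
    return total


def transition_variation(transitions: Transitions) -> float:
    """Sum over consecutive steps of the largest L1 change in any p_t(. | s, a)."""
    total = 0.0
    for prev, curr in zip(transitions, transitions[1:]):
        total += max(
            sum(abs(q - p) for p, q in zip(prev[s][a], curr[s][a]))
            for s in range(len(prev))
            for a in range(len(prev[s]))
        )
    return total


def dynamic_regret(optimal_avg_rewards: Sequence[float], collected: Sequence[float]) -> float:
    """Regret against a non-stationary benchmark: sum over t of (rho*_t - r_t)."""
    return sum(opt - got for opt, got in zip(optimal_avg_rewards, collected))


# Usage: two states, one action, three steps (toy numbers, not from the paper).
rewards = [
    [[0.5], [0.25]],   # t = 0
    [[0.5], [0.25]],   # t = 1: no change
    [[0.75], [0.25]],  # t = 2: abrupt change of 0.25 in state 0
]
transitions = [
    [[[0.5, 0.5]], [[1.0, 0.0]]],    # t = 0
    [[[0.75, 0.25]], [[1.0, 0.0]]],  # t = 1: L1 change of 0.5 in state 0
    [[[0.75, 0.25]], [[1.0, 0.0]]],  # t = 2: no change
]
print(reward_variation(rewards))                                  # 0.25
print(transition_variation(transitions))                          # 0.5
print(dynamic_regret([0.5, 0.5, 0.75], [0.25, 0.5, 0.5]))         # 0.5
```

An abrupt change and a slow drift of the same total size contribute equally to these sums, which is why a bound stated in terms of total variation can cover both the gradual and the abrupt cases mentioned above.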

Keywords

reinforcement learning, Markov decision process, regret bound, variational regret, non-stationary MDP
