A Theory of Regularized Markov Decision Processes

Matthieu Geist; Bruno Scherrer; Olivier Pietquin

2019 ICML ICML 2019

A Theory of Regularized Markov Decision Processes

Abstract

Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) classical algorithms such as Trust Region Policy Optimization, Soft Q-learning, Stochastic Actor Critic or Dynamic Policy Programming are special cases. This also draws connections to proximal convex optimization, especially to Mirror Descent.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐣 Hot Topic Early Bird — policy optimization

🐝 Cross-Pollinator — Artificial Intelligence, Data Science & Analytics, Deep Learning, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Matthieu Geist , Bruno Scherrer , Olivier Pietquin

Topics

Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Theory Reinforcement Learning > Methods > Policy Learning Mathematics & Optimization > Optimization > Optimization

Keywords

policy optimization convex optimization markov decision process value iteration kullback-leibler divergence policy iteration mirror descent bellman operator entropy regularization

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019