Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill; Omar Darwiche Domingues; Pierre Menard; Rémi Munos; Michal Valko

NeurIPS 2019

Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes (MDPs) and two-player games, given a generative model of the MDP. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
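The smoothness mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard log-sum-exp form of the entropy-regularized maximum with a temperature parameter `lam`. The function names `soft_max` and `soft_bellman_backup` and all parameters are hypothetical. The code shows a single regularized backup at one state, not the SmoothCruiser algorithm itself.

```python
import math


def soft_max(values, lam):
    """Entropy-regularized maximum: lam * log(sum(exp(v / lam))).

    Assumed form (log-sum-exp with temperature lam > 0). It is a smooth
    function of `values`, unlike the hard max, and satisfies
    max(values) <= soft_max(values, lam) <= max(values) + lam * log(K).
    """
    m = max(values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def soft_bellman_backup(rewards, next_values, gamma, lam):
    """One regularized backup at a single state (illustrative only).

    rewards[a]     -- reward for action a
    next_values[a] -- estimate of the expected next-state value under a
    gamma          -- discount factor
    """
    q = [r + gamma * v for r, v in zip(rewards, next_values)]
    return soft_max(q, lam)
```

As `lam` shrinks, `soft_max` approaches the ordinary maximum, which recovers the non-regularized backup; larger `lam` makes the operator smoother at the cost of a bias of at most `lam * log(K)` for `K` actions.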

Topics

Reinforcement Learning > Applications > Value Iteration; Machine Learning > Learning Types > Reinforcement Learning; Artificial Intelligence > Core AI > Game Theory; Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

game theory; sample complexity; Markov decision process; value iteration; Bellman operator; entropy regularization; planning algorithm; two-player game
