Policy gradients in linearly-solvable MDPs

Emanuel Todorov

2010 NIPS NeurIPS 2010

Policy gradients in linearly-solvable MDPs

Abstract

We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.

🧭 Keyword Pioneer — linearly-solvable mdps

🐣 Hot Topic Early Bird — reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization and Reinforcement Learning

📈 Trend Setter — Optimal Control

Authors

Emanuel Todorov

Topics

Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Machine Learning > Learning Types > Reinforcement Learning Mathematics & Optimization > Optimization > Optimal Control

Keywords

reinforcement learning policy gradient function approximation natural policy gradient markov decision process linearly-solvable mdps continuous-time stochastic systems function approximator compatible function approximator

Download PDF

Related papers

Link Discovery using Graph Feature Tracking 2010

Trading off Mistakes and Don't-Know Predictions 2010

A Novel Kernel for Learning a Neuron Model from Spike Train Data 2010

Decomposing Isotonic Regression for Efficiently Solving Large Problems 2010

Learning Kernels with Radiuses of Minimum Enclosing Balls 2010