Linearly-solvable Markov decision problems

Emanuel Todorov

NIPS (NeurIPS) 2006

Abstract

We introduce a class of MDPs that greatly simplify reinforcement learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing the KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without the need for state-action values; the new algorithm (Z-learning) outperforms Q-learning.

This work was supported by NSF grant ECS-0524761.
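The linearity claimed in the abstract can be illustrated with a small numerical sketch. It assumes a first-exit formulation and notation that the abstract itself does not spell out: `P` is the uncontrolled (passive) transition matrix, `q` is a per-state cost, `v` is the optimal value function, and `z = exp(-v)`. Under these assumptions the minimized Bellman equation becomes `z = exp(-q) * (P @ z)` on non-terminal states, and the optimal controlled transition probabilities are proportional to `P[i, j] * z[j]`. The function name, the 5-state chain, and the cost values below are hypothetical choices made only for illustration.

```python
import numpy as np

def solve_first_exit_lmdp(P, q, terminal):
    """Sketch: solve the linear Bellman equation of a first-exit problem.

    P        : (n, n) passive transition matrix (rows sum to 1);
               terminal states are assumed to be absorbing
    q        : (n,) state costs
    terminal : indices of terminal states
    Returns (v, z, U): value function, z = exp(-v), and the optimal
    controlled transition matrix.
    """
    n = P.shape[0]
    T = np.asarray(terminal, dtype=int)
    N = np.setdiff1d(np.arange(n), T)       # non-terminal states

    z = np.empty(n)
    z[T] = np.exp(-q[T])                    # boundary condition

    # On non-terminal states: z_N = G (P_NN z_N + P_NT z_T), G = diag(exp(-q_N))
    G = np.diag(np.exp(-q[N]))
    A = np.eye(len(N)) - G @ P[np.ix_(N, N)]
    b = G @ P[np.ix_(N, T)] @ z[T]
    z[N] = np.linalg.solve(A, b)

    # Optimal control rescales the passive probabilities by z and renormalizes
    U = P * z[None, :]
    U /= U.sum(axis=1, keepdims=True)
    return -np.log(z), z, U

# Hypothetical example: random walk on a 5-state chain, goal at the right end
n = 5
P = np.zeros((n, n))
P[0, 0] = P[0, 1] = 0.5
for i in range(1, n - 1):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[n - 1, n - 1] = 1.0                       # absorbing terminal state
q = np.full(n, 0.1)
q[-1] = 0.0

v, z, U = solve_first_exit_lmdp(P, q, terminal=[n - 1])
print("value function:", np.round(v, 3))
print("controlled transitions from state 2:", np.round(U[2], 3))
```

In this sketch the only numerical work is a single linear solve, with no iteration over actions. The controlled transitions are a reweighted version of the random walk, tilted toward states with lower cost-to-go.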

Topics

Artificial Intelligence > Core AI > Planning
Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Learning Types > Reinforcement Learning
Mathematics & Optimization > Optimization > Optimal Control

Keywords

reinforcement learning, policy optimization, Markov decision processes, convex optimization, optimal control, Bellman equation, value function, linearly-solvable MDPs
