Learning Continuous Control Policies by Stochastic Value Gradients

Nicolas Heess; Gregory Wayne; David Silver; Timothy Lillicrap; Tom Erez; Yuval Tassa

NeurIPS 2015 (NIPS 2015)

Abstract

We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
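The abstract's central idea, writing the noise in the Bellman backup as a fixed exogenous input so that the one-step value can be differentiated with respect to the policy parameters, can be illustrated on a toy 1-D problem. This is a hand-written sketch under stated assumptions, not the authors' implementation: the linear-Gaussian policy, linear dynamics model, quadratic reward, quadratic critic, and every name below (`ToySetup`, `infer_noise`, `one_step_value`, `svg1_gradient`, `finite_difference`) are hypothetical stand-ins for learned function approximators. In the spirit of SVG(1), it infers the policy and model noise that explain one observed environment transition, then backpropagates through r(s, a) + γV(s') with that noise held fixed, so the gradient is evaluated at real observations rather than along model-predicted trajectories.

```python
from dataclasses import dataclass


@dataclass
class ToySetup:
    """Hypothetical 1-D problem; every quantity here is an illustrative assumption."""
    A: float = 0.9        # assumed learned model: s' = A*s + B*a + eta
    B: float = 0.5
    sigma: float = 0.3    # assumed policy: a = theta*s + sigma*xi, xi ~ N(0, 1)
    c: float = 1.2        # assumed learned critic: V(s') = -c * s'**2
    gamma: float = 0.95   # discount factor


def reward(s, a):
    # Hypothetical quadratic cost expressed as a reward.
    return -(s ** 2 + a ** 2)


def infer_noise(p, theta, s, a, s_next):
    """Recover the exogenous noise that explains an observed real transition."""
    xi = (a - theta * s) / p.sigma          # policy noise
    eta = s_next - (p.A * s + p.B * a)      # model residual
    return xi, eta


def one_step_value(p, theta, s, xi, eta):
    """r(s, a) + gamma * V(s') as a function of theta, with the noise held fixed."""
    a = theta * s + p.sigma * xi
    s_next = p.A * s + p.B * a + eta
    return reward(s, a) + p.gamma * (-p.c * s_next ** 2)


def svg1_gradient(p, theta, s, a, s_next):
    """d/dtheta of the one-step backup, via the chain rule written out by hand."""
    xi, eta = infer_noise(p, theta, s, a, s_next)
    a_re = theta * s + p.sigma * xi          # reproduces the observed action
    s_next_re = p.A * s + p.B * a_re + eta   # reproduces the observed next state
    dr_da = -2.0 * a_re
    dv_ds = -2.0 * p.c * s_next_re
    da_dtheta = s
    return (dr_da + p.gamma * dv_ds * p.B) * da_dtheta


def finite_difference(p, theta, s, a, s_next, h=1e-6):
    """Central-difference check of svg1_gradient with the same inferred noise."""
    xi, eta = infer_noise(p, theta, s, a, s_next)

    def f(t):
        return one_step_value(p, t, s, xi, eta)

    return (f(theta + h) - f(theta - h)) / (2 * h)


if __name__ == "__main__":
    p = ToySetup()
    s, a, s_next = 1.0, -0.1, 0.7  # one made-up observed transition
    print(svg1_gradient(p, -0.4, s, a, s_next), finite_difference(p, -0.4, s, a, s_next))
```

Because the inferred noise reproduces the observed action and next state exactly, the gradient is taken at a real transition even though the model supplies the derivative ∂s'/∂a. The finite-difference check only confirms the hand-derived chain rule; a practical version would average over a batch of transitions and use automatic differentiation.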

Topics

Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Applications > Robotics
Deep Learning > Learning Types > Reinforcement Learning

Keywords

deep reinforcement learning, policy gradient, continuous control, model-based reinforcement learning, stochastic value gradient

Related papers

Data Generation as Sequential Decision Making 2015

A Recurrent Latent Variable Model for Sequential Data 2015

Combinatorial Cascading Bandits 2015

Accelerated Mirror Descent in Continuous and Discrete Time 2015

Matrix Completion with Noisy Side Information 2015