Faster Policy Learning with Continuous-Time Gradients

Samuel Ainsworth; Kendall Lowrey; John Thickstun; Zaïd Harchaoui; Siddhartha Srinivasa

L4DC 2021

Abstract

We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time (BPTT) estimator computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate continuous-time gradients in the original system. With the explicit goal of estimating continuous-time gradients, we are able to discretize adaptively and construct a more efficient policy gradient estimator, which we call the Continuous-Time Policy Gradient (CTPG). We show that replacing BPTT policy gradients with more efficient CTPG estimates results in faster and more robust learning in a variety of control tasks and simulators.
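The contrast drawn in the abstract can be illustrated with a minimal sketch on a hypothetical scalar linear-quadratic problem. Nothing below is taken from the paper: the constants, the function names, and the problem itself are assumptions chosen so that the true gradient has a closed form. The fixed-step Euler routine returns the exact gradient of a coarsely discretized system, which is the quantity a BPTT-style estimator computes. The adaptive routine instead integrates the continuous-time sensitivity equations with error-controlled steps. For brevity it uses forward sensitivities and step-doubling RK4 as stand-ins, which may differ from the solver and differentiation scheme the authors actually use.

```python
import math

# Hypothetical test problem (an assumption, not from the paper):
#   dynamics x' = A*x + B*u, policy u = -gain*x,
#   cost J(gain) = integral over [0, T] of (Q*x^2 + R*u^2) dt
A, B, Q, R, X0, T = 1.0, 1.0, 1.0, 1.0, 1.0, 2.0


def exact_gradient(gain):
    """Closed-form dJ/dgain (requires A - B*gain != 0)."""
    c = A - B * gain                     # closed loop: x(t) = X0 * exp(c*t)
    w = Q + R * gain * gain              # running-cost weight on x^2
    e = math.exp(2 * c * T)
    integral = (e - 1) / (2 * c)         # integral of exp(2*c*t) over [0, T]
    d_integral_dc = (2 * c * T * e - (e - 1)) / (2 * c * c)
    return X0 ** 2 * (2 * R * gain * integral - B * w * d_integral_dc)


def rhs(z, gain):
    """Augmented ODE: state x, sensitivity s = dx/dgain, gradient g = dJ/dgain."""
    x, s, _ = z
    c = A - B * gain
    w = Q + R * gain * gain
    return (c * x, c * s - B * x, 2 * R * gain * x * x + 2 * w * x * s)


def euler_gradient(gain, n_steps):
    """Exact gradient of the Euler-discretized system (the value BPTT would return)."""
    z, h = (X0, 0.0, 0.0), T / n_steps
    for _ in range(n_steps):
        z = tuple(zi + h * dzi for zi, dzi in zip(z, rhs(z, gain)))
    return z[2]


def rk4_step(z, h, gain):
    k1 = rhs(z, gain)
    k2 = rhs(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k1)), gain)
    k3 = rhs(tuple(zi + 0.5 * h * ki for zi, ki in zip(z, k2)), gain)
    k4 = rhs(tuple(zi + h * ki for zi, ki in zip(z, k3)), gain)
    return tuple(zi + h / 6 * (p + 2 * q + 2 * r + s)
                 for zi, p, q, r, s in zip(z, k1, k2, k3, k4))


def adaptive_gradient(gain, tol=1e-6):
    """Continuous-time gradient via step-doubling RK4 with error control."""
    z, t, h, evals = (X0, 0.0, 0.0), 0.0, 0.1, 0
    while T - t > 1e-12:
        h = min(h, T - t)
        full = rk4_step(z, h, gain)                              # one step of size h
        half = rk4_step(rk4_step(z, h / 2, gain), h / 2, gain)   # two steps of size h/2
        evals += 12                                              # 3 RK4 steps, 4 rhs calls each
        err = max(abs(u - v) for u, v in zip(full, half)) / 15
        if err <= tol:                                           # accept the finer solution
            z, t = half, t + h
        # Standard step-size update for a fourth-order method.
        h *= min(2.0, max(0.2, 0.9 * (tol / (err + 1e-16)) ** 0.2))
    return z[2], evals


gain = 2.0
reference = exact_gradient(gain)
adaptive, n_evals = adaptive_gradient(gain)
print("closed form      :", reference)
print("Euler, 10 steps  :", euler_gradient(gain, 10))
print("adaptive RK4     :", adaptive, "using", n_evals, "rhs evaluations")
```

In this toy setting the coarse Euler gradient is exact for the discretized problem yet far from the continuous-time gradient, while the adaptive integrator spends its function evaluations where the error estimate demands them. The example only illustrates the idea of targeting the continuous-time gradient directly and says nothing about the paper's reported results.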

Authors

Samuel Ainsworth, Kendall Lowrey, John Thickstun, Zaïd Harchaoui, Siddhartha Srinivasa

Topics

Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Methods
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Deep Learning > Optimization & Theory > Optimization

Keywords

policy gradient, policy learning, adaptive discretization, continuous-time system, continuous-time gradient, back-propagation through time

Related papers

Abstraction-based branch and bound approach to Q-learning for hybrid optimal control 2021

Data-driven design of switching reference governors for brake-by-wire applications 2021

Learning local modules in dynamic networks 2021

Certainty Equivalent Perception-Based Control 2021

Sample Complexity of Linear Quadratic Gaussian (LQG) Control for Output Feedback Systems 2021