Model-Free Trajectory Optimization for Reinforcement Learning

Riad Akrour; Gerhard Neumann; Hany Abdulsamad; Abbas Abdolmaleki

2016 ICML ICML 2016

Model-Free Trajectory Optimization for Reinforcement Learning

Abstract

Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

📈 Trend Setter — Value Iteration

🧭 Keyword Pioneer — policy update

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Deep Learning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Riad Akrour , Gerhard Neumann , Hany Abdulsamad , Abbas Abdolmaleki

Topics

Machine Learning > Optimization & Theory > Optimization Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Value Iteration Machine Learning > Learning Types > Reinforcement Learning

Keywords

trajectory optimization model-free reinforcement learning policy update model-free algorithm q-function approximation kl-divergence constraint

Download PDF

Related papers

Associative Long Short-Term Memory 2016

Recycling Randomness with Structure for Sublinear time Kernel Expansions 2016

Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues 2016

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization 2016

Hawkes Processes with Stochastic Excitations 2016