Trust Region Policy Optimization

John Schulman; Sergey Levine; Pieter Abbeel; Michael Jordan; Philipp Moritz

ICML 2015

Abstract

In this article, we describe a method for optimizing control policies with guaranteed monotonic improvement. By making several approximations to the theoretically justified scheme, we develop a practical algorithm called Trust Region Policy Optimization (TRPO). This algorithm is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits, and playing Atari games using images of the screen as input. Despite approximations that deviate from the theory, TRPO tends to give monotonic improvement with little tuning of hyperparameters.