Evolved Policy Gradients

Rein Houthooft; Yuhua Chen; Phillip Isola; Bradly Stadie; Filip Wolski; Jonathan Ho (OpenAI); Pieter Abbeel

NeurIPS 2018

Abstract

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function such that an agent that optimizes its policy to minimize this loss achieves high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test-time tasks, and that it exhibits qualitatively different behavior from other popular metalearning algorithms.
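The two-level structure described in the abstract (an outer loop that evolves the loss, an inner loop in which an agent does gradient descent on that loss) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the names `inner_loop` and `evolve_loss`, the one-dimensional task, the fixed policy standard deviation `SIGMA`, and the clipping of `theta`. In particular, the learned loss is reduced to a two-parameter weighting `phi` of the log-likelihood term, standing in for the temporal-convolution loss of the paper, and the outer loop is a basic evolution strategy chosen for simplicity.

```python
import numpy as np

SIGMA = 0.5  # fixed policy standard deviation (assumption for this toy)


def inner_loop(phi, target, rng, steps=20, batch=32, lr=0.1):
    """Train a 1-D Gaussian policy by gradient descent on the loss defined by phi.

    Toy loss (hypothetical, not the paper's temporal-convolution loss):
        L_phi(theta) = mean_i (phi[0] * r_i + phi[1]) * (-log pi_theta(a_i))
    Returns the reward of the final policy mean on the task `target`.
    """
    theta = 0.0
    for _ in range(steps):
        actions = theta + SIGMA * rng.standard_normal(batch)
        rewards = -(actions - target) ** 2
        weights = phi[0] * rewards + phi[1]
        # Analytic gradient of -log pi_theta(a) w.r.t. theta for a Gaussian mean.
        grad = np.mean(weights * (-(actions - theta) / SIGMA**2))
        # Clipping keeps the toy numerically tame when phi is a poor loss.
        theta = float(np.clip(theta - lr * grad, -5.0, 5.0))
    return -(theta - target) ** 2


def evolve_loss(generations=20, pop=16, noise=0.2, outer_lr=0.1, seed=0):
    """Outer loop: evolve loss parameters phi with a simple evolution strategy."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(2)
    for _ in range(generations):
        eps = rng.standard_normal((pop, 2))
        # Each perturbed loss trains a fresh agent on a randomly sampled task.
        returns = np.array(
            [inner_loop(phi + noise * e, rng.uniform(-1.0, 1.0), rng) for e in eps]
        )
        # Standardize returns so the update scale does not depend on reward units.
        adv = (returns - returns.mean()) / (returns.std() + 1e-8)
        phi = phi + outer_lr / (pop * noise) * (eps.T @ adv)
    return phi


if __name__ == "__main__":
    print("evolved loss parameters:", evolve_loss())
```

In this sketch, `phi = [1, 0]` corresponds to a REINFORCE-style surrogate loss without a baseline, so the outer loop is searching over a small family of losses that contains a standard policy gradient method. The agent in `inner_loop` never sees the return used for selection; it only follows the gradient of the evolved loss.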

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning
Reinforcement Learning > Methods > Deep RL
Artificial Intelligence > Core AI > Reinforcement Learning
Deep Learning > Learning Types > Meta-Learning

Keywords

reinforcement learning; policy gradient; out-of-distribution generalization; evolutionary algorithm; temporal convolution; differentiable loss; gradient-based reinforcement learning; differentiable loss function
