Recurrent World Models Facilitate Policy Evolution

David Ha; Jürgen Schmidhuber

NeurIPS 2018

Abstract

A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside an environment generated by its own internal world model, and transfer this policy back into the actual environment. An interactive version of this paper is available at https://worldmodels.github.io
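The idea of feeding world-model features into a compact policy trained by evolution can be illustrated with a small sketch. It rests on several assumptions that are not part of the abstract. The feature vectors `z` (compressed observation) and `h` (recurrent hidden state) are random stand-ins rather than outputs of a trained world model. The controller is a single linear layer over the concatenation `[z; h]`, following the linear controller described in the full paper, with a `tanh` squashing added here as an assumption. The fitness function is a hypothetical proxy for cumulative reward. The optimizer is a simple elitist Gaussian evolution strategy, swapped in for the CMA-ES used in the paper. All names (`controller`, `make_toy_fitness`, `evolve`) are illustrative only.

```python
import math
import random

# Toy sizes (hypothetical); real world-model features are much larger.
Z_DIM, H_DIM, A_DIM = 4, 6, 2
IN_DIM = Z_DIM + H_DIM
N_PARAMS = A_DIM * IN_DIM + A_DIM  # weight matrix W plus bias vector b


def controller(params, z, h):
    """Linear policy a = tanh(W [z; h] + b); params is a flat list (W row-major, then b)."""
    x = list(z) + list(h)
    actions = []
    for i in range(A_DIM):
        row = params[i * IN_DIM:(i + 1) * IN_DIM]
        bias = params[A_DIM * IN_DIM + i]
        actions.append(math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias))
    return actions


def make_toy_fitness(seed=0, n_samples=32):
    """Hypothetical stand-in for episode return: how closely the policy matches
    a hidden target policy on random (z, h) features. Higher is better; max is 0."""
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(N_PARAMS)]
    samples = [([rng.gauss(0.0, 1.0) for _ in range(Z_DIM)],
                [rng.gauss(0.0, 1.0) for _ in range(H_DIM)])
               for _ in range(n_samples)]
    targets = [controller(target, z, h) for z, h in samples]

    def fitness(params):
        err = 0.0
        for (z, h), t in zip(samples, targets):
            a = controller(params, z, h)
            err += sum((ai - ti) ** 2 for ai, ti in zip(a, t))
        return -err / len(samples)

    return fitness


def evolve(fitness, generations=60, pop_size=32, n_elite=8, sigma=0.3, seed=1):
    """Elitist Gaussian evolution strategy (a simplified stand-in for CMA-ES):
    sample candidates around a mean, then move the mean to the average of the best."""
    rng = random.Random(seed)
    mean = [0.0] * N_PARAMS
    best, best_fit = list(mean), fitness(mean)
    for _ in range(generations):
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(pop_size)]
        # Score every candidate once, best first.
        scored = sorted(((fitness(p), p) for p in population),
                        key=lambda fp: fp[0], reverse=True)
        elite = [p for _, p in scored[:n_elite]]
        mean = [sum(col) / n_elite for col in zip(*elite)]
        if scored[0][0] > best_fit:
            best_fit, best = scored[0]
        sigma *= 0.97  # slowly shrink the search radius
    return best, best_fit


if __name__ == "__main__":
    fitness = make_toy_fitness()
    params, score = evolve(fitness)
    print(f"{len(params)} parameters, fitness {score:.4f} (0 is a perfect match)")
```

The evolution loop only needs a function mapping a parameter vector to a scalar score, which is why a compact controller pairs naturally with gradient-free search. In the setting the abstract describes, that score would instead come from rollouts in the real environment, or in the environment generated by the world model itself.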

Topics

Machine Learning > Core Methods > Representation Learning
Deep Learning > Models > Generative Models
Reinforcement Learning > Methods > Deep RL
Mathematics & Optimization > Optimization > Evolutionary Algorithm
Machine Learning > Learning Types > Evolutionary Algorithm

Keywords

unsupervised learning, generative model, recurrent neural network, world model, evolution strategy, policy evolution
