Model-Based Reinforcement Learning via Meta-Policy Optimization

Ignasi Clavera; Jonas Rothfuss; John Schulman; Yasuhiro Fujita; Tamim Asfour; Pieter Abbeel

CoRL 2018

Abstract

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic performance as model-free methods. We propose Model-Based Meta-Policy-Optimization (MB-MPO), an approach that forgoes the strong reliance on accurate learned dynamics models. Using an ensemble of learned dynamics models, MB-MPO meta-learns a policy that can quickly adapt to any model in the ensemble with one policy gradient step. This steers the meta-policy towards internalizing consistent dynamics predictions among the ensemble while shifting the burden of behaving optimally w.r.t. the model discrepancies towards the adaptation step. Our experiments show that MB-MPO is more robust to model imperfections than previous model-based approaches. Finally, we demonstrate that our approach is able to match the asymptotic performance of model-free methods while requiring significantly less experience.
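
The adaptation scheme in the abstract (a meta-policy that is adapted to each model in the ensemble with one gradient step, then updated so that the adapted policies perform well) can be illustrated with a small toy problem. The sketch below is not the authors' implementation. It assumes scalar linear models s' = a*s + b*u, a linear policy u = theta*s, a reward of -(s')^2 with E[s^2] = 1, and exact gradients in place of sampled policy-gradient estimates. The ensemble values, step sizes, and all function names are hypothetical.

```python
# Toy sketch of a meta-learning loop over a model ensemble.
# Assumptions (not from the paper): scalar linear models s' = a*s + b*u,
# linear policy u = theta*s, reward -(s')^2 with E[s^2] = 1, and exact
# gradients standing in for sampled policy-gradient estimates.

ENSEMBLE = [(0.9, 1.0), (1.0, 1.1), (1.1, 0.9), (0.95, 1.05)]  # hypothetical (a, b) pairs
ALPHA = 0.1  # inner (adaptation) step size, hypothetical
BETA = 0.1   # outer (meta) step size, hypothetical


def expected_return(theta, model):
    """One-step expected return of policy u = theta*s under one model."""
    a, b = model
    return -(a + b * theta) ** 2


def return_grad(theta, model):
    """Exact derivative of expected_return with respect to theta."""
    a, b = model
    return -2.0 * b * (a + b * theta)


def adapt(theta, model, alpha=ALPHA):
    """One gradient-ascent step on a single model (the adaptation step)."""
    return theta + alpha * return_grad(theta, model)


def meta_objective(theta, ensemble=ENSEMBLE, alpha=ALPHA):
    """Average return of the adapted policies across the ensemble."""
    total = sum(expected_return(adapt(theta, m, alpha), m) for m in ensemble)
    return total / len(ensemble)


def meta_grad(theta, ensemble=ENSEMBLE, alpha=ALPHA):
    """Derivative of meta_objective, differentiating through adapt()."""
    total = 0.0
    for m in ensemble:
        _, b = m
        # d(adapted theta)/d(theta) = 1 + alpha * (second derivative of return)
        d_adapted = 1.0 + alpha * (-2.0 * b * b)
        total += return_grad(adapt(theta, m, alpha), m) * d_adapted
    return total / len(ensemble)


def train_meta_policy(theta=0.0, steps=200, beta=BETA):
    """Gradient ascent on the meta-objective, starting from theta."""
    for _ in range(steps):
        theta += beta * meta_grad(theta)
    return theta
```

Calling `train_meta_policy()` returns a single parameter that no individual model fully determines; `adapt(theta, model)` then specializes it to one ensemble member. In a setting like the one the abstract describes, the gradients would instead be estimated from trajectories simulated in the learned models, which this sketch deliberately leaves out.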

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Paradigms > Meta-Learning
Artificial Intelligence > Core AI > Robotics

Keywords

deep reinforcement learning, policy gradient, model-based reinforcement learning, model ensemble, dynamics model, neural network, meta-policy optimization
