Meta-Model-Based Meta-Policy Optimization

Takuya Hiraoka; Takahisa Imagawa; Voot Tangkaratt; Takayuki Osa; Takashi Onishi; Yoshimasa Tsuruoka

ACML 2021

Abstract

Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarantee of model-based meta-RL methods by extending the theorems proposed by Janner et al. (2019). On the basis of our theoretical results, we propose Meta-Model-Based Meta-Policy Optimization (M3PO), a model-based meta-RL method with a performance guarantee. We demonstrate that M3PO outperforms existing meta-RL methods in continuous-control benchmarks.

Topics

Artificial Intelligence > Learning Paradigms > Meta-Learning; Reinforcement Learning > Methods > Deep RL; Reinforcement Learning > Methods > Policy Learning

Keywords

multi-task learning; policy optimization; continuous control; model-based reinforcement learning; meta reinforcement learning
