Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

Chenyang Zhao; Timothy Hospedales

ACML 2021


Abstract

In reinforcement learning, domain randomisation is a popular technique for learning general policies that are robust to new environments and to domain shifts at deployment. However, naively aggregating information from randomised domains can lead to high variance in gradient estimation and to sub-optimal policies. To address this issue, we present a peer-to-peer online distillation strategy for reinforcement learning, termed P2PDRL, in which multiple learning agents are each assigned to a different environment and exchange knowledge through mutual regularisation based on the Kullback–Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and generalises more robustly to new environments at test time.
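The core mechanism described in the abstract can be sketched as follows: each agent is regularised towards its peers through a KL divergence. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes diagonal Gaussian policies (a common choice for continuous control), takes agent i's regulariser to be the average KL(π_j ‖ π_i) over peers j ≠ i on states drawn from agent i's environment, and adds it to the RL loss with a weight `alpha`. The names `kl_diag_gaussian`, `peer_distillation_loss` and `total_loss`, the data layout, and the default `alpha` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed policy output: (means, log standard deviations), one entry per action dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def kl_diag_gaussian(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) between two diagonal Gaussians, summed over action dimensions."""
    mu_p, log_std_p = p
    mu_q, log_std_q = q
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, log_std_p, mu_q, log_std_q):
        var_p = math.exp(2.0 * lp)
        var_q = math.exp(2.0 * lq)
        # log(sigma_q / sigma_p) + (sigma_p^2 + (mu_p - mu_q)^2) / (2 sigma_q^2) - 1/2
        total += (lq - lp) + (var_p + (mp - mq) ** 2) / (2.0 * var_q) - 0.5
    return total


def peer_distillation_loss(i: int, peer_outputs: List[List[Gaussian]]) -> float:
    """Hypothetical peer regulariser for agent i.

    peer_outputs[j][n] is agent j's action distribution on the n-th state
    sampled from agent i's environment. Returns the KL from each peer j != i
    to agent i, averaged over states and then over peers.
    """
    k = len(peer_outputs)
    own = peer_outputs[i]
    n_states = len(own)
    total = 0.0
    for j in range(k):
        if j == i:
            continue
        per_state = sum(kl_diag_gaussian(peer_outputs[j][n], own[n]) for n in range(n_states))
        total += per_state / n_states
    return total / (k - 1)


def total_loss(rl_loss: float, distill_loss: float, alpha: float = 0.5) -> float:
    """Agent i's objective: its own RL loss plus the weighted peer regulariser (alpha is assumed)."""
    return rl_loss + alpha * distill_loss


if __name__ == "__main__":
    # Three agents, one state from agent 0's environment, 1-D actions, unit variance.
    outputs = [
        [([0.0], [0.0])],   # agent 0 (the agent being updated)
        [([1.0], [0.0])],   # agent 1
        [([-1.0], [0.0])],  # agent 2
    ]
    reg = peer_distillation_loss(0, outputs)
    print(reg)                   # each peer contributes KL = 0.5, so the average is 0.5
    print(total_loss(1.0, reg))  # 1.0 + 0.5 * 0.5 = 1.25
```

In a gradient-based version, the peers' outputs would normally be treated as fixed targets (detached) when updating agent i, so that each agent's update changes only its own parameters. This pure-Python sketch computes only the scalar value of the regulariser.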



Topics

Machine Learning > Application Areas > Knowledge Distillation
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics

Keywords

reinforcement learning, policy learning, domain randomisation, peer-to-peer distillation, generalisation performance

