Constrained Contrastive Reinforcement Learning

Haoyu Wang; Xinrui Yang; Yuhang Wang; Lan Xuguang

ACML 2022

Abstract

Learning to control from complex observations remains a major challenge in the application of model-based reinforcement learning (MBRL). Existing MBRL methods apply contrastive learning to replace pixel-level reconstruction, improving the performance of the latent world model. However, previous contrastive learning approaches in MBRL fail to utilize task-relevant information, making it difficult to aggregate observations that share the same task-relevant information but differ in task-irrelevant information in the latent space. In this work, we first propose Constrained Contrastive Reinforcement Learning (C2RL), an MBRL method that learns a world model through a combination of two contrastive losses, based on latent dynamics and task-relevant state abstraction respectively, utilizing reward information to accelerate model learning. Then, we introduce a hyperparameter $\beta$ to balance the two contrastive losses and strengthen the representation ability of the latent dynamics. The experimental results show that our approach outperforms state-of-the-art methods in both the natural video and standard background settings on challenging DMControl tasks.
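The idea of balancing two contrastive losses with $\beta$ can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss definitions, so the InfoNCE form, the convex combination `beta * dyn + (1 - beta) * abs`, and the names `info_nce` and `combined_loss` are all assumptions made for illustration.

```python
import math

def info_nce(anchors, positives, temperature=1.0):
    """Generic InfoNCE loss over a batch of vectors (assumed form).

    The positive for anchors[i] is positives[i]; every other row of
    `positives` serves as a negative for that anchor.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Dot-product similarity between anchor i and every candidate.
        logits = [
            sum(a * p for a, p in zip(anchors[i], positives[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index as the target.
        total += log_sum_exp - logits[i]
    return total / n

def combined_loss(dynamics_loss, abstraction_loss, beta):
    """Hypothetical weighting: a convex combination controlled by beta.

    The paper's actual weighting scheme may differ.
    """
    return beta * dynamics_loss + (1.0 - beta) * abstraction_loss

# Toy latents (made up for illustration only).
predicted_next = [[1.0, 0.0], [0.0, 1.0]]   # latent-dynamics predictions
encoded_next = [[0.9, 0.1], [0.2, 0.8]]     # encoder outputs for next observations
same_task_state = [[1.0, 0.1], [0.1, 1.0]]  # hypothetical task-relevant positives

dyn = info_nce(predicted_next, encoded_next)
abs_ = info_nce(encoded_next, same_task_state)
loss = combined_loss(dyn, abs_, beta=0.5)
```

Under this assumed weighting, `beta = 1` trains on the latent-dynamics term alone and `beta = 0` on the state-abstraction term alone; intermediate values trade one off against the other.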

Topics

Machine Learning > Learning Types > Contrastive Learning
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics

Keywords

contrastive learning; state abstraction; model-based reinforcement learning; world model; latent dynamics
