Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Yue Wu; Shuangfei Zhai; Nitish Srivastava; Joshua M Susskind; Jian Zhang; Ruslan Salakhutdinov; Hanlin Goh

ICML 2021

Abstract

Offline reinforcement learning (RL) promises to learn effective policies from previously collected, static datasets without the need for exploration. However, existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states. We hypothesize that a key ingredient missing from existing methods is a proper treatment of uncertainty in the offline setting. We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution to the training objectives accordingly. For the implementation, we adopt a practical and effective dropout-based uncertainty estimation method that introduces very little overhead over existing RL algorithms. Empirically, we observe that UWAC substantially improves model stability during training. In addition, UWAC outperforms existing offline RL methods on a variety of competitive tasks, and achieves significant performance gains over the state-of-the-art baseline on datasets with sparse demonstrations collected from human experts.
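
The down-weighting idea in the abstract can be illustrated with a small sketch. The abstract does not give the weighting formula, so everything below is an assumption made for illustration: a toy linear Q-function with dropout on its inputs, a variance estimate from repeated stochastic forward passes (Monte Carlo dropout), and a clipped inverse-variance weight applied to a squared TD error. The names `q_with_dropout`, `mc_dropout_variance`, `uncertainty_weight`, `beta`, and `max_weight` are hypothetical and do not come from the paper.

```python
import random
import statistics


def q_with_dropout(features, weights, drop_prob, rng):
    """One stochastic forward pass of a toy linear Q-function.

    Dropout is applied to the inputs and kept active at evaluation time,
    so repeated calls return different Q-value samples.
    """
    keep = 1.0 - drop_prob
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:
            total += (x / keep) * w  # inverted-dropout rescaling
    return total


def mc_dropout_variance(features, weights, drop_prob=0.1, passes=100, seed=0):
    """Estimate predictive variance from several dropout forward passes."""
    rng = random.Random(seed)
    samples = [q_with_dropout(features, weights, drop_prob, rng)
               for _ in range(passes)]
    return statistics.pvariance(samples)


def uncertainty_weight(variance, beta=1.0, max_weight=1.0):
    """Assumed weighting rule: inverse variance, clipped to max_weight."""
    return min(beta / max(variance, 1e-8), max_weight)


def weighted_td_loss(q_pred, td_target, weight):
    """Squared TD error scaled by the uncertainty weight."""
    return weight * (q_pred - td_target) ** 2


# Hypothetical inputs: one state-action pair resembling the dataset,
# and one far outside it (much larger feature magnitudes).
q_weights = [1.0, -0.5, 0.8]
in_dist = [0.1, 0.2, -0.1]
ood = [5.0, -4.0, 6.0]

w_in = uncertainty_weight(mc_dropout_variance(in_dist, q_weights))
w_ood = uncertainty_weight(mc_dropout_variance(ood, q_weights))

# The same TD error contributes less to the loss for the OOD pair.
loss_in = weighted_td_loss(q_pred=1.0, td_target=0.0, weight=w_in)
loss_ood = weighted_td_loss(q_pred=1.0, td_target=0.0, weight=w_ood)
```

In this toy setup the large-magnitude input produces a higher dropout variance, so its weight falls below that of the in-distribution input and its TD error contributes less to the loss. A real critic would be a neural network with dropout layers, and the weighting rule used by UWAC should be taken from the paper itself rather than from this sketch.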

Topics

Machine Learning > Optimization & Theory > Bayesian Inference; Reinforcement Learning > Methods > Offline RL; Machine Learning > Learning Types > Uncertainty Quantification

Keywords

offline reinforcement learning; uncertainty quantification; uncertainty estimation
