Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

Yunhao Tang; Alp Kucukelbir

AISTATS 2021

Abstract

We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.
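
The two steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a small tabular chain environment, relabels each trajectory with its final achieved state (one common hindsight strategy), and fits a tabular softmax policy by maximum likelihood. The names `relabel_in_hindsight` and `m_step`, the learning rate, and the step count are all hypothetical choices made for illustration.

```python
import numpy as np

def relabel_in_hindsight(trajectory):
    """E-step sketch (assumption): treat the final achieved state as the goal.

    trajectory: list of (state, action, next_state) tuples.
    Returns (state, goal, action) tuples for a goal that was actually reached.
    """
    hindsight_goal = trajectory[-1][2]
    return [(s, hindsight_goal, a) for (s, a, _next) in trajectory]

def m_step(logits, batch, lr=0.5, n_steps=200):
    """M-step sketch (assumption): supervised maximum-likelihood update.

    logits has shape (n_states, n_goals, n_actions) and parameterizes a
    tabular softmax policy pi(a | s, g).
    """
    for _ in range(n_steps):
        grad = np.zeros_like(logits)
        for s, g, a in batch:
            z = logits[s, g] - logits[s, g].max()  # stabilize the softmax
            probs = np.exp(z) / np.exp(z).sum()
            # Gradient of log pi(a | s, g) w.r.t. logits: one_hot(a) - probs
            grad[s, g] -= probs
            grad[s, g, a] += 1.0
        logits = logits + lr * grad / len(batch)
    return logits

# Toy chain with states 0..3; action 0 moves left, action 1 moves right.
trajectory = [(0, 1, 1), (1, 1, 2), (2, 1, 3)]
batch = relabel_in_hindsight(trajectory)
logits = m_step(np.zeros((4, 4, 2)), batch)
policy = logits.argmax(axis=-1)  # greedy action for each (state, goal) pair
```

Even if the original goal was never reached, the relabeled batch gives the policy a dense supervised signal: every transition becomes a positive example for the goal that was achieved in hindsight.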

Topics

Machine Learning > Learning Types > Self-Supervised Learning

Reinforcement Learning > Methods > Deep RL

Reinforcement Learning > Applications > Robotics

Machine Learning > Learning Types > Multi-Task Learning

Keywords

policy optimization, expectation maximization, sparse reward, goal-conditioned reinforcement learning, hindsight experience replay, hindsight replay

Related papers

Linear Regression Games: Convergence Guarantees to Approximate Out-of-Distribution Solutions 2021

Semi-Supervised Learning with Meta-Gradient 2021

Accelerating Metropolis-Hastings with Lightweight Inference Compilation 2021

When MAML Can Adapt Fast and How to Assist When It Cannot 2021

On the convergence of the Metropolis algorithm with fixed-order updates for multivariate binary probability distributions 2021