USHER: Unbiased Sampling for Hindsight Experience Replay

Liam Schramm; Yunfu Deng; Edgar Granados; Abdeslam Boularias

CoRL 2022

Abstract

Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This provides both a minimum density of reward and generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.
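
The relabeling idea the abstract attributes to HER can be illustrated with a minimal sketch. It is not the USHER algorithm and contains no importance-sampling correction; it only shows how a trajectory that failed to reach its original goal can be stored a second time with the state it actually reached as the goal. The 1-D toy states, the 0/-1 reward convention, and the names `sparse_reward` and `relabel_with_final_state` are assumptions made for illustration.

```python
from typing import List, Tuple

# (state, action, next_state) in a hypothetical 1-D toy environment
Transition = Tuple[int, int, int]
# (state, action, next_state, goal, reward)
GoalTransition = Tuple[int, int, int, int, float]


def sparse_reward(achieved: int, goal: int) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(
    trajectory: List[Transition], original_goal: int
) -> List[GoalTransition]:
    """Store each transition twice: once with the original goal and once
    with the trajectory's final achieved state as a hindsight goal."""
    hindsight_goal = trajectory[-1][2]  # the state the agent actually ended in
    replay: List[GoalTransition] = []
    for state, action, next_state in trajectory:
        for goal in (original_goal, hindsight_goal):
            reward = sparse_reward(next_state, goal)
            replay.append((state, action, next_state, goal, reward))
    return replay


# The agent aimed for state 5 but only reached state 3.
trajectory = [(0, 1, 1), (1, 1, 2), (2, 1, 3)]
replay = relabel_with_final_state(trajectory, original_goal=5)

original_rewards = [r for (_, _, _, g, r) in replay if g == 5]
hindsight_rewards = [r for (_, _, _, g, r) in replay if g == 3]
print(original_rewards)   # every step fails under the original goal
print(hindsight_rewards)  # the last step succeeds under the hindsight goal
```

In this sketch the hindsight goal is always the outcome that actually occurred, so relabeled data never shows the agent missing its hindsight goal. In a stochastic environment this is the kind of selection effect behind the bias described in the abstract.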

Topics

Machine Learning > Optimization & Theory > Stochastic Processes
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning

Keywords

importance sampling, value function, sparse reward, hindsight experience replay, stochastic environment

Related papers

One-Shot Transfer of Affordance Regions? AffCorrs! 2022

RoboTube: Learning Household Manipulation from Human Videos with Simulated Twin Environments 2022

Training Robots to Evaluate Robots: Example-Based Interactive Reward Functions for Policy Learning 2022

Watch and Match: Supercharging Imitation with Regularized Optimal Transport 2022

Offline Reinforcement Learning for Visual Navigation 2022