Policy Continuation with Hindsight Inverse Dynamics

Hao Sun; Zhizhong Li; Xiaotong Liu; Bolei Zhou; Dahua Lin

NeurIPS 2019

Abstract

Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). This approach learns from Hindsight Inverse Dynamics based on Hindsight Experience Replay, enabling the learning process to proceed in a self-imitated manner so that the policy can be trained with supervised learning. This work also extends the approach to multi-step settings with Policy Continuation. The proposed method is general: it can work in isolation or be combined with other on-policy and off-policy algorithms. On two multi-goal tasks, GridWorld and FetchReach, PCHID significantly improves the sample efficiency as well as the final performance.
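
The abstract compresses the core idea into a sentence: relabel each transition with the goal it actually achieved, then treat the resulting (state, goal) to action pairs as supervised data. The following is a minimal Python sketch of that idea on a toy deterministic grid, extended to k steps in the spirit of Policy Continuation. Everything here is an assumption for illustration, not the authors' implementation: the grid; the names `step`, `rollout`, `TablePolicy`, `reaches_within`, `hindsight_samples`, and `train_pchid`; the majority-vote table standing in for a network trained with supervised learning; and the specific acceptance test (keep a k-step sample only if the policy learned so far cannot already reach the relabelled goal within k - 1 steps).

```python
import random
from collections import Counter, defaultdict

# Hypothetical deterministic GridWorld: cells (x, y) with 0 <= x, y < SIZE.
SIZE = 5
ACTIONS = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}  # up, down, left, right


def step(state, action):
    """Deterministic transition; a move that would leave the grid is a no-op."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    return (x, y) if 0 <= x < SIZE and 0 <= y < SIZE else state


def rollout(start, length, rng):
    """Collect one trajectory with a uniformly random behaviour policy."""
    states, actions = [start], []
    for _ in range(length):
        a = rng.choice(list(ACTIONS))
        actions.append(a)
        states.append(step(states[-1], a))
    return states, actions


class TablePolicy:
    """Hypothetical stand-in for a supervised learner: majority vote per (state, goal)."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    def fit(self, samples):
        for (s, g), a in samples:
            self.votes[(s, g)][a] += 1

    def act(self, s, g):
        counts = self.votes.get((s, g))
        return counts.most_common(1)[0][0] if counts else None


def reaches_within(policy, s, g, max_steps):
    """Acceptance test: does following `policy` take s to g in <= max_steps steps?"""
    for _ in range(max_steps):
        if s == g:
            return True
        a = policy.act(s, g)
        if a is None:
            return False
        s = step(s, a)
    return s == g


def hindsight_samples(states, actions, k, policy):
    """k-step hindsight relabelling: goal := state reached k steps later.

    (s_t, s_{t+k}) -> a_t is kept only when `policy` (trained on shorter
    horizons) cannot already reach s_{t+k} from s_t within k - 1 steps.
    For k = 1 this just drops no-op transitions where s_{t+1} == s_t.
    """
    out = []
    for t in range(len(actions) - k + 1):
        s, g = states[t], states[t + k]
        if not reaches_within(policy, s, g, k - 1):
            out.append(((s, g), actions[t]))
    return out


def train_pchid(num_trajectories=200, length=20, max_k=3, seed=0):
    """Policy continuation sketch: k-step data is filtered by the policy from 1..k-1."""
    rng = random.Random(seed)
    trajs = [rollout((rng.randrange(SIZE), rng.randrange(SIZE)), length, rng)
             for _ in range(num_trajectories)]
    policy = TablePolicy()
    for k in range(1, max_k + 1):
        # Relabel with the current (frozen) policy first, then fit on the new samples.
        new = [smp for states, actions in trajs
               for smp in hindsight_samples(states, actions, k, policy)]
        policy.fit(new)
    return policy
```

For example, `train_pchid().act((0, 0), (2, 1))` returns an action index from `ACTIONS`, or `None` if that (state, goal) pair was never accepted as a training sample. The lookup table keeps the sketch dependency-free; a practical version would swap in a classifier over (state, goal) inputs. Note that the acceptance test only rejects detours the current policy already knows how to shortcut, so with sparse coverage some suboptimal samples can still slip through.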

Topics

Reinforcement Learning; Reinforcement Learning > Methods > Policy Learning; Reinforcement Learning > Applications > Robotics; Machine Learning > Learning Types > Reinforcement Learning; Machine Learning > Learning Types > Multi-Agent Systems

Keywords

reinforcement learning, supervised learning, policy learning, inverse dynamics, sparse reward, hindsight experience replay, goal-oriented task, policy continuation
