Policy Optimization with Demonstrations

Bingyi Kang; Zequn Jie; Jiashi Feng

2018 ICML ICML 2018

Policy Optimization with Demonstrations

Abstract

Exploration remains a significant challenge to reinforcement learning methods, especially in environments where reward signals are sparse. Recent methods of learning from demonstrations have shown to be promising in overcoming exploration difficulties but typically require considerable high-quality demonstrations that are difficult to collect. We propose to effectively leverage available demonstrations to guide exploration through enforcing occupancy measure matching between the learned policy and current demonstrations, and develop a novel Policy Optimization from Demonstration (POfD) method. We show that POfD induces implicit dynamic reward shaping and brings provable benefits for policy improvement. Furthermore, it can be combined with policy gradient methods to produce state-of-the-art results, as demonstrated experimentally on a range of popular benchmark sparse-reward tasks, even when the demonstrations are few and imperfect.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — reward shaping

🐣 Hot Topic Early Bird — policy optimization

🐝 Cross-Pollinator — Artificial Intelligence, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

Authors

Bingyi Kang , Zequn Jie , Jiashi Feng

Topics

Machine Learning > Learning Types > Weakly Supervised Learning Machine Learning > Application Areas > Data Augmentation Reinforcement Learning > Methods > Policy Learning Reinforcement Learning > Applications > Robotics Machine Learning > Learning Types > Reinforcement Learning Machine Learning > Learning Types > Imitation Learning

Keywords

reinforcement learning imitation learning policy optimization learning from demonstration sparse reward reward shaping occupancy measure matching occupancy measure

Download PDF

Related papers

Rectify Heterogeneous Models with Semantic Mapping 2018

Bayesian Optimization of Combinatorial Structures 2018

The Well-Tempered Lasso 2018

Approximation Algorithms for Cascading Prediction Models 2018

Classification from Pairwise Similarity and Unlabeled Data 2018