Generalizable Imitation Learning from Observation via Inferring Goal Proximity

Youngwoon Lee; Andrew Szot; Shao-Hua Sun; Joseph J. Lim

NeurIPS 2021

Abstract

Task progress is an intuitive and readily available form of task information that can guide an agent closer to the desired goal. Furthermore, a task progress estimator can generalize to new situations. Building on this intuition, we propose a simple yet effective method for imitation learning from observation on goal-directed tasks: it uses a learned goal proximity function as a task progress estimator, which improves generalization to unseen states and goals. We obtain this goal proximity function from expert demonstrations and online agent experience, and then use the learned goal proximity as a dense reward for policy training. We demonstrate that the proposed method generalizes more robustly than prior imitation learning methods on a set of goal-directed tasks in navigation, locomotion, and robotic manipulation, even when the demonstrations cover only part of the state space.
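To make the idea of a goal proximity signal and a dense reward concrete, the following is a minimal sketch rather than the authors' implementation. It assumes, purely for illustration, that proximity decays exponentially with the number of steps remaining in a demonstration and that the reward is the increase in proximity between consecutive states. The function names, the `discount` parameter, and its value are hypothetical; the abstract does not specify any of these details, and in the method itself the proximity would be predicted by a learned function rather than computed from step counts.

```python
import numpy as np


def proximity_labels(num_steps, discount=0.95):
    """Assign each state of one demonstration a goal proximity in (0, 1].

    Assumption (not stated in the abstract): proximity decays exponentially
    with the number of steps remaining until the final state, so the last
    state of the demonstration is labeled 1.0.
    """
    steps_to_goal = np.arange(num_steps - 1, -1, -1)
    return discount ** steps_to_goal


def proximity_reward(prox_now, prox_next):
    """Dense reward, assumed here to be the gain in proximity per step."""
    return prox_next - prox_now


# Toy demonstration with five states; in practice a learned model would
# predict proximity for states visited by the agent.
labels = proximity_labels(5, discount=0.5)
rewards = proximity_reward(labels[:-1], labels[1:])
print(labels)
print(rewards)
```

Under these assumptions every step that moves toward the goal receives a positive reward, and the rewards along a trajectory sum to the total change in proximity, which is what makes the signal dense rather than sparse.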

Topics

Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Applications > Robotics
Artificial Intelligence > Core AI > Reinforcement Learning
Deep Learning > Learning Types > Imitation Learning
Machine Learning > Learning Paradigms > Imitation Learning

Keywords

imitation learning, policy optimization, reward function, policy training, goal inference, dense reward, task progress, goal proximity
