Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback

Xiaofei Wang; Kimin Lee; Kourosh Hakhamaneshi; Pieter Abbeel; Michael Laskin

CoRL 2021

Abstract

A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. However, such generative models inherit the biases of the underlying data and yield poor, unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent, we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also uses human feedback to solve downstream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms both prior leading RL algorithms that use human preferences and leading skill extraction algorithms that do not.
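
The abstract does not spell out the form of the preference model. As a rough illustration only, the sketch below assumes a Bradley-Terry style pairwise model, a common choice in preference-based RL, and a simple threshold rule for keeping preferred segments from an offline dataset. The function names, the scalar scores, and the threshold are all hypothetical and are not taken from the paper.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that segment A is preferred over segment B.

    Assumption: each segment has already been mapped to a scalar score
    by some learned model (not shown here).
    """
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def preference_loss(score_a: float, score_b: float, label: float) -> float:
    """Cross-entropy loss for one human label.

    label is 1.0 if the human preferred A, 0.0 if the human preferred B.
    """
    p = preference_probability(score_a, score_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def select_preferred(segments, scores, threshold=0.0):
    """Keep only segments whose score exceeds a (hypothetical) threshold."""
    return [seg for seg, s in zip(segments, scores) if s > threshold]


if __name__ == "__main__":
    # Toy usage: three offline segments with made-up scores.
    segments = ["open_microwave", "random_flailing", "slide_cabinet"]
    scores = [1.0, -1.0, 0.5]
    print(select_preferred(segments, scores))
    print(preference_probability(2.0, 0.0))
```

In this sketch the filtered segments would then be the data a generative skill model is fit to; how SkiP actually combines the preference model with skill extraction is described in the paper itself.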

Topics

Artificial Intelligence > Core AI > Agent Systems

Machine Learning > Learning Types > Reinforcement Learning

Keywords

reinforcement learning, preference learning, human feedback, generative model, skill extraction
