Teaching Inverse Reinforcement Learners via Features and Demonstrations

Luis Haug; Sebastian Tschiatschek; Adish Singla

2018 NIPS NeurIPS 2018

Teaching Inverse Reinforcement Learners via Features and Demonstrations

Abstract

Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption that the learner knows the features that the true reward function depends on. In this paper, we study the problem of learning from demonstrations in the setting where this is not the case, i.e., where there is a mismatch between the worldviews of the learner and the expert. We introduce a natural quantity, the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms based on inverse reinforcement learning. Based on these findings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimal policy.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — teaching risk

🐣 Hot Topic Early Bird — inverse reinforcement learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Luis Haug , Sebastian Tschiatschek , Adish Singla

Topics

Artificial Intelligence > Core AI > Agent Systems Artificial Intelligence > Learning Paradigms > Transfer Learning Machine Learning > Learning Types > Imitation Learning Artificial Intelligence > Core AI > Reinforcement Learning Machine Learning > Learning Types > Inverse Reinforcement Learning

Keywords

feature learning imitation learning policy learning inverse reinforcement learning reward function expert demonstration teaching risk feature mismatch

Download PDF

Related papers

Maximum Causal Tsallis Entropy Imitation Learning 2018

Recurrent World Models Facilitate Policy Evolution 2018

Bandit Learning in Concave N-Person Games 2018

Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation 2018

PAC-Bayes bounds for stable algorithms with instance-dependent priors 2018