Inverse Reinforcement Learning through Structured Classification

Edouard Klein; Matthieu Geist; Bilal Piot; Olivier Pietquin

NIPS (NeurIPS) 2012

Abstract

This paper addresses the inverse reinforcement learning (IRL) problem, that is, inferring a reward for which a demonstrated expert behavior is optimal. We introduce a new algorithm, SCIRL, whose principle is to use the so-called feature expectation of the expert as the parameterization of the score function of a multi-class classifier. This approach produces a reward function for which the expert policy is provably near-optimal. Unlike most existing IRL algorithms, SCIRL does not require solving the direct RL problem. Moreover, with an appropriate heuristic, it can succeed with only trajectories sampled according to the expert behavior. This is illustrated on a car driving simulator.
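The principle stated in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the two-feature driving toy, the discount `GAMMA`, the helper names, and the perceptron-style update standing in for the multi-class classifier are all assumptions made for illustration. The sketch also assumes that feature expectations are available for every action, and it reads the learned weights as a linear reward over state features.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def discounted_feature_sum(features: Sequence[Vector], gamma: float) -> Vector:
    """Sum gamma^t * features[t] over one trajectory (a one-sample estimate
    of a feature expectation)."""
    total = [0.0] * len(features[0])
    discount = 1.0
    for phi in features:
        for i, value in enumerate(phi):
            total[i] += discount * value
        discount *= gamma
    return total


def score(theta: Vector, mu: Vector) -> float:
    """Linear score: dot product of the weights with a feature expectation."""
    return sum(t * m for t, m in zip(theta, mu))


def predict(theta: Vector, mu_by_action: Dict[str, Vector]) -> str:
    """Classifier decision rule: pick the action with the highest score."""
    return max(mu_by_action, key=lambda a: score(theta, mu_by_action[a]))


def fit_perceptron(
    samples: List[Tuple[Dict[str, Vector], str]], dim: int, epochs: int = 10
) -> Vector:
    """Perceptron-style training (an assumed stand-in classifier): move theta
    toward the expert action whenever a rival action scores at least as high."""
    theta = [0.0] * dim
    for _ in range(epochs):
        for mu_by_action, expert_action in samples:
            rivals = [a for a in mu_by_action if a != expert_action]
            rival = max(rivals, key=lambda a: score(theta, mu_by_action[a]))
            mu_e, mu_r = mu_by_action[expert_action], mu_by_action[rival]
            if score(theta, mu_e) <= score(theta, mu_r):
                theta = [t + e - r for t, e, r in zip(theta, mu_e, mu_r)]
    return theta


# Hypothetical toy: feature 0 means "on the road", feature 1 means "off the road".
GAMMA = 0.9
ON_ROAD, OFF_ROAD = [1.0, 0.0], [0.0, 1.0]

# Assumed feature expectations for one state, one per action. Both start in
# the same on-road state; "swerve" leaves the road for one step.
mu_by_action = {
    "swerve": discounted_feature_sum([ON_ROAD, OFF_ROAD, ON_ROAD], GAMMA),
    "stay": discounted_feature_sum([ON_ROAD, ON_ROAD, ON_ROAD], GAMMA),
}

# The expert chose "stay" in this state.
theta = fit_perceptron([(mu_by_action, "stay")], dim=2)

# Assumption of this sketch: theta doubles as the weights of a linear reward.
reward_on_road = score(theta, ON_ROAD)
reward_off_road = score(theta, OFF_ROAD)
```

In this toy, the trained classifier selects the expert action, and the same weights assign a higher reward to the on-road feature than to the off-road one. No forward RL problem is solved at any point; only a classifier is trained.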

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Multi-Agent Systems
Machine Learning > Core Methods > Classification
Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning
Machine Learning > Learning Types > Imitation Learning
Artificial Intelligence > Core AI > Reinforcement Learning

Keywords

imitation learning; policy optimization; inverse reinforcement learning; multi-class classification; apprenticeship learning; reward function; feature expectations; reward learning; structured classification; expert policy; multi-class classifier
