Learning from a Learner

Alexis Jacq; Matthieu Geist; Ana Paiva; Olivier Pietquin

2019 ICML ICML 2019

Learning from a Learner

Abstract

In this paper, we propose a novel setting for Inverse Reinforcement Learning (IRL), namely "Learning from a Learner" (LfL). As opposed to standard IRL, it does not consist in learning a reward by observing an optimal agent but from observations of another learning (and thus sub-optimal) agent. To do so, we leverage the fact that the observed agent’s policy is assumed to improve over time. The ultimate goal of this approach is to recover the actual environment’s reward and to allow the observer to outperform the learner. To recover that reward in practice, we propose methods based on the entropy-regularized policy iteration framework. We discuss different approaches to learn solely from trajectories in the state-action space. We demonstrate the genericity of our method by observing agents implementing various reinforcement learning algorithms. Finally, we show that, on both discrete and continuous state/action tasks, the observer’s performance (that optimizes the recovered reward) can surpass those of the observed agent.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — sub-optimal agent

🐝 Cross-Pollinator — Artificial Intelligence, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics

🐣 Hot Topic Early Bird — entropy regularization

Authors

Alexis Jacq , Matthieu Geist , Ana Paiva , Olivier Pietquin

Topics

Machine Learning > Learning Types > Self-Supervised Learning Reinforcement Learning > Methods > Policy Learning

Keywords

inverse reinforcement learning policy iteration reward function entropy regularization sub-optimal agent

Download PDF

Related papers

Bayesian leave-one-out cross-validation for large data 2019

A Block Coordinate Descent Proximal Method for Simultaneous Filtering and Parameter Estimation 2019

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks 2019

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously 2019

Improved Convergence for $\ell_1$ and $\ell_∞$ Regression via Iteratively Reweighted Least Squares 2019