LobsDICE: Offline Learning from Observation via Stationary Distribution Correction Estimation

Geon-hyeong Kim; Jongmin Lee; Youngsoo Jang; Hongseok Yang; Kee-eung Kim

NeurIPS 2022

Abstract

We consider the problem of learning from observation (LfO), in which the agent aims to mimic the expert's behavior from state-only expert demonstrations. We additionally assume that the agent cannot interact with the environment but has access to action-labeled transition data collected by agents of unknown quality. This offline setting for LfO is appealing in many real-world scenarios where the ground-truth expert actions are inaccessible and arbitrary environment interactions are costly or risky. In this paper, we present LobsDICE, an offline LfO algorithm that learns to imitate the expert policy via optimization in the space of stationary distributions. Our algorithm solves a single convex minimization problem, which minimizes the divergence between the two state-transition distributions induced by the expert and the agent policy. Through an extensive set of offline LfO tasks, we show that LobsDICE outperforms strong baseline methods.
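To make the quantity being minimized more concrete, here is a minimal tabular sketch, not the authors' implementation, of the divergence between the state-transition distributions d(s, s') induced by two policies. It assumes a small, fully known MDP, a discounted occupancy with initial distribution p0 and discount gamma, and the KL divergence KL(d_agent || d_expert) as the divergence; the names state_occupancy, transition_occupancy and kl are hypothetical. LobsDICE itself works offline from sampled data and optimizes over stationary distributions rather than evaluating a known model, so this only illustrates why state-only (s, s') pairs are enough to compare an agent against an expert.

```python
"""Hypothetical tabular illustration (not LobsDICE itself): compare two
policies through the state-transition occupancies d(s, s') they induce."""
import math


def state_occupancy(policy, T, p0, gamma, iters=2000):
    """Discounted state occupancy d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s).

    Found by fixed-point iteration on the Bellman flow equation
    d(s') = (1 - gamma) p0(s') + gamma * sum_{s,a} d(s) pi(a|s) T(s'|s,a).
    policy[s][a] = pi(a|s); T[s][a][s2] = T(s2 | s, a).
    """
    n = len(p0)
    d = list(p0)
    for _ in range(iters):
        new = [(1 - gamma) * p0[s] for s in range(n)]
        for s in range(n):
            for a, pa in enumerate(policy[s]):
                for s2, pt in enumerate(T[s][a]):
                    new[s2] += gamma * d[s] * pa * pt
        d = new
    return d


def transition_occupancy(policy, T, p0, gamma):
    """State-transition occupancy d(s, s') = sum_a d(s) pi(a|s) T(s'|s, a).

    Actions are marginalized out, so only (s, s') pairs remain, which is
    exactly what a state-only demonstration can provide.
    """
    d = state_occupancy(policy, T, p0, gamma)
    n = len(p0)
    out = [[0.0] * n for _ in range(n)]
    for s in range(n):
        for a, pa in enumerate(policy[s]):
            for s2, pt in enumerate(T[s][a]):
                out[s][s2] += d[s] * pa * pt
    return out


def kl(p, q, eps=1e-12):
    """KL(p || q) over all (s, s') pairs; eps guards against log(x / 0)."""
    total = 0.0
    for row_p, row_q in zip(p, q):
        for x, y in zip(row_p, row_q):
            if x > 0:
                total += x * math.log(x / max(y, eps))
    return total


if __name__ == "__main__":
    # Two states; action 0 stays in place, action 1 switches state.
    T = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
    p0, gamma = [1.0, 0.0], 0.9
    expert = [[0.2, 0.8], [0.2, 0.8]]   # mostly switches
    uniform = [[0.5, 0.5], [0.5, 0.5]]  # picks actions at random
    closer = [[0.3, 0.7], [0.3, 0.7]]   # closer to the expert
    d_expert = transition_occupancy(expert, T, p0, gamma)
    for name, pi in [("uniform", uniform), ("closer", closer)]:
        d_agent = transition_occupancy(pi, T, p0, gamma)
        print(name, kl(d_agent, d_expert))
```

A policy whose (s, s') occupancy matches the expert's drives this divergence to zero even though the expert's actions are never observed; the offline, model-free machinery that LobsDICE uses to optimize such an objective is beyond this sketch.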

Keywords

reinforcement learning, offline reinforcement learning, imitation learning, divergence minimization, policy learning, offline learning, stationary distribution, learning from observation, distribution correction
