Exponentially Weighted Imitation Learning for Batched Historical Data

Qing Wang; Jiechao Xiong; Lei Han; Peng Sun; Han Liu; Tong Zhang

NeurIPS 2018

Abstract

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings. To solve this problem, we propose a monotonic advantage reweighted imitation learning strategy that is applicable to problems with complex nonlinear function approximation and works well with hybrid (discrete and continuous) action spaces. The method does not rely on knowledge of the behavior policy and thus can be used to learn from data generated by an unknown policy. Under mild conditions, our algorithm, though surprisingly simple, has a policy improvement bound and empirically outperforms most competing methods. Thorough numerical results are also provided to demonstrate the efficacy of the proposed methodology.
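The abstract describes the method only in words. Below is a minimal Python sketch of advantage-reweighted imitation under stated assumptions: each logged action's log-likelihood is weighted by exp(beta * A / c), matching the "exponentially weighted" and "advantage reweighted" wording. The advantage estimator A = R_t - V(s_t) (discounted return minus a learned value baseline), the root-mean-square normalizer c, and all function names are illustrative choices of ours, not the paper's specification. The policy network is left out; only its log-probabilities of the logged actions enter the loss.

```python
import math
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return-to-go R_t = sum_k gamma**k * r_{t+k} for one logged trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def rms_scale(advantages: Sequence[float], eps: float = 1e-8) -> float:
    """Hypothetical normalizer c: root-mean-square advantage in the batch."""
    mean_sq = sum(a * a for a in advantages) / len(advantages)
    return max(math.sqrt(mean_sq), eps)


def advantage_weights(returns: Sequence[float], values: Sequence[float],
                      beta: float, scale: float = 1.0) -> List[float]:
    """Per-sample weights exp(beta * A / c), with the assumed estimator A = R - V."""
    return [math.exp(beta * (r - v) / scale) for r, v in zip(returns, values)]


def weighted_imitation_loss(log_probs: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Batch mean of -w * log pi(a|s) over logged (state, action) pairs.

    log_probs come from the policy being trained; the weights are treated
    as constants (no gradient would flow through them in a real trainer).
    """
    total = sum(w * lp for w, lp in zip(weights, log_probs))
    return -total / len(log_probs)


if __name__ == "__main__":
    rewards = [0.0, 0.0, 1.0]
    values = [0.25, 0.5, 0.5]          # made-up critic outputs, for illustration only
    log_probs = [-1.0, -2.0, -3.0]     # made-up log pi(a_t|s_t) of the logged actions
    returns = discounted_returns(rewards, gamma=0.5)
    advs = [r - v for r, v in zip(returns, values)]
    w = advantage_weights(returns, values, beta=1.0, scale=rms_scale(advs))
    print(returns, w, weighted_imitation_loss(log_probs, w))
```

With beta = 0 every weight is 1 and the loss reduces to plain behavior cloning; larger beta shifts the imitation toward logged actions with higher estimated advantage. The weights are computed from logged rewards and a value estimate only, never from the behavior policy's action probabilities, which is in line with the abstract's point that the behavior policy need not be known. A practical implementation would likely also clip the exponent to avoid overflow.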

Topics

Reinforcement Learning > Methods > Offline RL
Reinforcement Learning > Methods > Policy Learning
Deep Learning > Learning Types > Imitation Learning

Keywords

offline reinforcement learning, imitation learning, policy learning, batch learning, policy improvement, nonlinear function approximation, advantage reweighting, batch data, batched historical data

Related papers

Maximum Causal Tsallis Entropy Imitation Learning 2018

Recurrent World Models Facilitate Policy Evolution 2018

Bandit Learning in Concave N-Person Games 2018

Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation 2018

PAC-Bayes bounds for stable algorithms with instance-dependent priors 2018