BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning

Xinyue Chen; Zijian Zhou; Zheng Wang; Che Wang; Yanqiu Wu; Keith Ross

NeurIPS 2020

Abstract

There has recently been a surge in research in batch Deep Reinforcement Learning (DRL), which aims to learn a high-performing policy from a given dataset without additional interactions with the environment. We propose a new algorithm, Best-Action Imitation Learning (BAIL), which strives for both simplicity and performance. BAIL learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network with imitation learning. For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes across a large variety of batch datasets. Our experiments show that BAIL performs much better than the other schemes and is also computationally much faster than the batch Q-learning schemes.
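The three steps named in the abstract (learn a V function, select the actions judged best, imitate them) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how V is fit or how actions are selected, so the code assumes a least-squares fit to Monte Carlo returns in place of a neural network, and a hypothetical rule that keeps the top `keep_fraction` of pairs ranked by return minus V(s). All function names and parameters here are assumptions made for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo return G_t for every step of a single episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def fit_linear(inputs, targets):
    """Least-squares fit with a bias term (assumed stand-in for a network)."""
    design = np.hstack([inputs, np.ones((len(inputs), 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def predict_linear(weights, inputs):
    """Evaluate a model produced by fit_linear."""
    design = np.hstack([inputs, np.ones((len(inputs), 1))])
    return design @ weights


def select_best_actions(states, actions, returns, keep_fraction=0.25):
    """Steps 1 and 2: fit V(s), then keep the pairs whose return most exceeds it.

    The ranking rule is an assumption made for this sketch.
    """
    v_weights = fit_linear(states, returns)
    advantage = returns - predict_linear(v_weights, states)
    n_keep = max(1, int(keep_fraction * len(returns)))
    keep = np.argsort(advantage)[-n_keep:]
    return states[keep], actions[keep]


def bail_sketch(states, actions, returns, keep_fraction=0.25):
    """Step 3: imitation learning (behavior cloning) on the selected pairs."""
    best_states, best_actions = select_best_actions(
        states, actions, returns, keep_fraction
    )
    return fit_linear(best_states, best_actions)
```

Given a batch of `states` (shape `n x state_dim`), `actions` (shape `n x action_dim`), and per-step `returns` computed with `discounted_returns`, calling `bail_sketch(states, actions, returns)` yields the weights of a linear policy that can be evaluated with `predict_linear`. Only the selection step uses V; the policy itself is trained purely by supervised regression, which is consistent with the abstract's claim that the approach is simple and cheap relative to batch Q-learning.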

Topics

Reinforcement Learning > Methods > Deep RL; Reinforcement Learning > Methods > Offline RL; Reinforcement Learning > Methods > Policy Learning

Keywords

deep reinforcement learning; imitation learning; policy learning; value function; batch reinforcement learning
