Provably Efficient Neural GTD for Off-Policy Learning

Hoi-To Wai; Zhuoran Yang; Zhaoran Wang; Mingyi Hong

2020 NIPS NeurIPS 2020

Provably Efficient Neural GTD for Off-Policy Learning

Abstract

This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE). For off-policy learning, we show that the minimum MSBE problem can be recast into a min-max optimization involving a pair of over-parameterized primal-dual NNs. The resultant formulation can then be tackled using a neural GTD algorithm. We analyze the convergence of the proposed algorithm with a 2-layer ReLU NN architecture using $m$ neurons and prove that it computes an approximate optimal solution to the minimum MSBE problem as $m \rightarrow \infty$.

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — gradient temporal difference

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Hoi-To Wai , Zhuoran Yang , Zhaoran Wang , Mingyi Hong

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Reinforcement Learning > Methods > Offline RL

Keywords

off-policy learning bellman error mean squared error function approximator neural network gradient temporal difference

Download PDF

Related papers

Higher-Order Spectral Clustering of Directed Graphs 2020

Self-Supervised MultiModal Versatile Networks 2020

Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates 2020

Causal Intervention for Weakly-Supervised Semantic Segmentation 2020

Taming Discrete Integration via the Boon of Dimensionality 2020