The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Shuze Daniel Liu; Shuhang Chen; Shangtong Zhang

2025 JMLR JMLR 2025

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Abstract

Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the Martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially in those off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a form of the law of the iterated logarithm. [abs] [ pdf ][ bib ] © JMLR 2025. (edit, beta)

🌉 Interdisciplinary Bridge — Machine Learning and Reinforcement Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy

Authors

Shuze Daniel Liu , Shuhang Chen , Shangtong Zhang

Topics

Machine Learning > Optimization & Theory > Stochastic Processes Reinforcement Learning > Methods > Policy Learning

Keywords

temporal difference learning function approximation stability analysis markov decision process stochastic approximation markovian noise

Download PDF

Related papers

On the Natural Gradient of the Evidence Lower Bound 2025

Four Axiomatic Characterizations of the Integrated Gradients Attribution Method 2025

Extending Temperature Scaling with Homogenizing Maps 2025

Ontolearn---A Framework for Large-scale OWL Class Expression Learning in Python 2025

An Axiomatic Definition of Hierarchical Clustering 2025