q-Learning in Continuous Time

Yanwei Jia; Xun Yu Zhou

JMLR 2023

Abstract

We study the continuous-time counterpart of Q-learning for reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation introduced by Wang et al. (2020). As the conventional (big) Q-function collapses in continuous time, we consider its first-order approximation and coin the term "(little) q-function". This function is related to the instantaneous advantage rate function as well as the Hamiltonian. We develop a "q-learning" theory around the q-function that is independent of time discretization. Given a stochastic policy, we jointly characterize the associated q-function and value function by martingale conditions of certain stochastic processes, in both on-policy and off-policy settings. We then apply the theory to devise different actor-critic algorithms for solving underlying RL problems, depending on whether or not the density function of the Gibbs measure generated from the q-function can be computed explicitly. One of our algorithms interprets the well-known Q-learning algorithm SARSA, and another recovers a policy gradient (PG) based continuous-time algorithm proposed in Jia and Zhou (2022b). Finally, we conduct simulation experiments to compare the performance of our algorithms with those of PG-based algorithms in Jia and Zhou (2022b) and time-discretized conventional Q-learning algorithms.
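The abstract distinguishes cases by whether the density of the Gibbs measure generated from the q-function can be computed explicitly. As a rough illustration only, the sketch below normalizes a Gibbs-type density numerically on a one-dimensional action grid. It assumes the density is proportional to exp(q / temperature) at a fixed time and state; the quadratic `q_values`, the grid bounds, the `temperature` value, and the function name `gibbs_density` are all hypothetical choices made for this example and are not taken from the paper.

```python
import numpy as np


def gibbs_density(q_values, actions, temperature):
    """Normalize exp(q / temperature) into a density over a 1-D action grid.

    Assumption for illustration: the policy density is proportional to
    exp(q / temperature) at a fixed time and state.
    """
    logits = np.asarray(q_values, dtype=float) / temperature
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    # Trapezoidal rule for the normalizing constant over the grid.
    widths = np.diff(actions)
    normalizer = np.sum(0.5 * (weights[1:] + weights[:-1]) * widths)
    return weights / normalizer


# Hypothetical quadratic q-function in the action, peaked at a = 1.
actions = np.linspace(-3.0, 3.0, 601)
q_values = -0.5 * (actions - 1.0) ** 2
density = gibbs_density(q_values, actions, temperature=0.5)
```

With a quadratic q-function the resulting density is a Gaussian truncated to the grid, which is an example of the explicitly computable case; for a general q-function the normalizing constant may have no closed form, and a grid-based normalization like this is only practical in low-dimensional action spaces.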

Topics

Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Mathematics & Optimization > Optimization > Optimization

Keywords

reinforcement learning, diffusion process, continuous time, entropy regularization

Related papers

Flexible Model Aggregation for Quantile Regression 2023

Efficient Computation of Rankings from Pairwise Comparisons 2023

Efficient Structure-preserving Support Tensor Train Machine 2023

Attacks against Federated Learning Defense Systems and their Mitigation 2023

How Do You Want Your Greedy: Simultaneous or Repeated? 2023