Fast Approximate Dynamic Programming for Infinite-Horizon Markov Decision Processes

Mohamad Amin Sharifi Kolarijani; Gyula Max; Peyman Mohajerin Esfahani

NeurIPS 2021

Abstract

In this study, we consider the infinite-horizon, discounted-cost optimal control of stochastic nonlinear systems with separable cost and constraints in the state and input variables. Using the linear-time Legendre transform, we propose a novel numerical scheme for implementing the corresponding value iteration (VI) algorithm in the conjugate domain. Detailed analyses of the convergence, time complexity, and error of the proposed algorithm are provided. In particular, with discretizations of size $X$ and $U$ for the state and input spaces, respectively, the proposed approach reduces the time complexity of each iteration of the VI algorithm from $O(XU)$ to $O(X+U)$ by replacing the minimization operation in the primal domain with a simple addition in the conjugate domain.
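
The replacement of minimization by addition rests on a standard convex-analysis identity: the Legendre conjugate of an infimal convolution $h(z) = \min_{x+y=z} [f(x) + g(y)]$ equals the sum of the conjugates, $h^*(s) = f^*(s) + g^*(s)$. The following is a minimal sketch that checks this identity numerically on small integer grids. It is not the paper's algorithm: the helper names `conjugate` and `inf_convolution`, the grids, and the test functions are assumptions made for illustration, and the transforms here are brute-force rather than linear-time.

```python
def conjugate(points, values, slopes):
    """Brute-force discrete Legendre transform: f*(s) = max_x [s*x - f(x)]."""
    return [max(s * p - v for p, v in zip(points, values)) for s in slopes]


def inf_convolution(x, f, y, g):
    """h(z) = min over all pairs with x + y = z of f(x) + g(y)."""
    h = {}
    for xi, fi in zip(x, f):
        for yj, gj in zip(y, g):
            z = xi + yj
            h[z] = min(h.get(z, float("inf")), fi + gj)
    zs = sorted(h)
    return zs, [h[z] for z in zs]


# Hypothetical example data: a quadratic and an absolute-value cost.
x = list(range(-5, 6))
y = list(range(-3, 4))
f = [0.5 * xi**2 for xi in x]
g = [float(abs(yj)) for yj in y]
slopes = [0.5 * k for k in range(-4, 5)]  # slopes from -2.0 to 2.0

# Primal route: a minimization over every (x, y) pair, then one transform.
z, h = inf_convolution(x, f, y, g)
lhs = conjugate(z, h, slopes)

# Conjugate route: transform each function once, then add pointwise.
rhs = [a + b for a, b in zip(conjugate(x, f, slopes), conjugate(y, g, slopes))]

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Here `max_gap` measures the largest difference between the two routes over the slope grid. The primal route visits every pair of grid points, whereas the conjugate route combines the two transformed functions with a single pointwise addition, which is the source of the per-iteration saving described in the abstract once a linear-time transform is used.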

Topics

Machine Learning > Optimization & Theory > Optimization
Reinforcement Learning > Applications > Value Iteration
Reinforcement Learning > Methods > Value Iteration
Mathematics & Optimization > Optimization > Optimal Control

Keywords

Markov decision process, optimal control, dynamic programming, value iteration, infinite horizon, conjugate domain, Legendre transform
