Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures

Umit Köse; Andrzej Ruszczyński

2021 JMLR JMLR 2021

Risk-Averse Learning by Temporal Difference Methods with Markov Risk Measures

Abstract

We propose a novel reinforcement learning methodology where the system performance is evaluated by a Markov coherent dynamic risk measure with the use of linear value function approximations. We construct projected risk-averse dynamic programming equations and study their properties. We propose new risk-averse counterparts of the basic and multi-step methods of temporal differences and we prove their convergence with probability one. We also perform an empirical study on a complex transportation problem. [abs] [ pdf ][ bib ] © JMLR 2021. (edit, beta)

🧭 Keyword Pioneer — temporal difference method

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Umit Köse , Andrzej Ruszczyński

Topics

Reinforcement Learning > Methods > Deep RL Reinforcement Learning > Methods > Policy Learning

Keywords

dynamic programming value function approximation risk-averse learning temporal difference method markov risk measure

Download PDF

Related papers

Optimal Feedback Law Recovery by Gradient-Augmented Sparse Polynomial Regression 2021

Normalizing Flows for Probabilistic Modeling and Inference 2021

Determining the Number of Communities in Degree-corrected Stochastic Block Models 2021

Guided Visual Exploration of Relations in Data Sets 2021

Safe Policy Iteration: A Monotonically Improving Approximate Policy Iteration Approach 2021