A policy gradient approach for optimization of smooth risk measures

Nithia Vijayan; L. A. Prashanth

UAI 2023

Abstract

We propose policy gradient algorithms for risk-sensitive reinforcement learning (RL) in both on-policy and off-policy settings. We consider episodic Markov decision processes and model risk using the broad class of smooth risk measures of the cumulative discounted reward. We propose two template policy gradient algorithms that optimize a smooth risk measure, one for the on-policy RL setting and one for the off-policy RL setting. We derive non-asymptotic bounds that quantify the rate of convergence of these algorithms to a stationary point of the smooth risk measure. As special cases, we establish that our algorithms apply to the optimization of mean-variance and distortion risk measures.
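To make the objects named in the abstract concrete, the following minimal sketch estimates the cumulative discounted reward of sampled episodes and evaluates two risk measures on those returns. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the mean-minus-weighted-variance form of the mean-variance objective, the order-statistic estimator of the distortion risk measure, and the toy reward sequences are all hypothetical choices made here.

```python
from statistics import fmean, pvariance


def discounted_return(rewards, gamma):
    """Cumulative discounted reward of one episode: sum over t of gamma**t * r_t."""
    total = 0.0
    for r in reversed(rewards):  # Horner-style accumulation from the last step back
        total = r + gamma * total
    return total


def mean_variance(returns, lam):
    """Assumed form: empirical mean minus lam times the empirical (population) variance."""
    return fmean(returns) - lam * pvariance(returns)


def distortion_risk(returns, g):
    """Order-statistic estimate of a distortion risk measure (an assumed estimator).

    g is a distortion function on [0, 1] with g(0) = 0 and g(1) = 1.
    The i-th smallest return gets weight g((n - i + 1) / n) - g((n - i) / n),
    so the weights sum to one and g(u) = u recovers the plain sample mean.
    """
    xs = sorted(returns)
    n = len(xs)
    return sum(
        x * (g((n - i + 1) / n) - g((n - i) / n))
        for i, x in enumerate(xs, start=1)
    )


# Hypothetical reward sequences from three episodes.
episodes = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 2.0, 0.0]]
returns = [discounted_return(ep, gamma=0.5) for ep in episodes]

mv_value = mean_variance(returns, lam=0.5)
neutral_value = distortion_risk(returns, lambda u: u)         # equals the sample mean
pessimistic_value = distortion_risk(returns, lambda u: u * u)  # more weight on low returns
```

In a policy gradient method of the kind the abstract describes, a quantity like `mv_value` or `pessimistic_value` would play the role of the objective whose gradient with respect to the policy parameters is estimated from sampled episodes; that gradient estimation step is not shown here.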

Topics

Machine Learning > Optimization & Theory > Bayesian Inference
Machine Learning > Application Areas > Risk Management
Reinforcement Learning > Methods > Policy Learning

Keywords

policy gradient, off-policy learning, risk-sensitive reinforcement learning, distortion risk, smooth risk measure
