Algorithms for CVaR Optimization in MDPs

Yinlam Chow; Mohammad Ghavamzadeh

NIPS 2014 (NeurIPS 2014)

Abstract

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in costs in addition to minimizing a standard criterion. Conditional value-at-risk (CVaR) is a relatively new risk measure that addresses some of the shortcomings of the well-known variance-related risk measures, and because of its computational efficiency it has gained popularity in finance and operations research. In this paper, we consider the mean-CVaR optimization problem in MDPs. We first derive a formula for computing the gradient of this risk-sensitive objective function. We then devise policy gradient and actor-critic algorithms, each of which uses a specific method to estimate this gradient and updates the policy parameters in the descent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in an optimal stopping problem.
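As a rough illustration of the risk measure named in the abstract, the sketch below estimates CVaR from a sample of costs using the standard Rockafellar-Uryasev representation, CVaR_alpha(Z) = min over nu of nu + E[(Z - nu)^+] / (1 - alpha). This is not the paper's algorithm: the function names `ru_objective` and `empirical_cvar`, the confidence level `alpha`, and the choice to search for the minimizer only among the sample points are assumptions made for this example.

```python
import numpy as np


def ru_objective(costs, nu, alpha):
    """Rockafellar-Uryasev objective: nu + E[(Z - nu)^+] / (1 - alpha)."""
    excess = np.maximum(costs - nu, 0.0)  # cost above the threshold nu
    return nu + np.mean(excess) / (1.0 - alpha)


def empirical_cvar(costs, alpha):
    """Return (nu_star, cvar) for sampled costs at confidence level alpha.

    Hypothetical helper for illustration. The objective is convex and
    piecewise linear in nu, with kinks only at the sample points, so a
    minimizer can be found by scanning the samples themselves.
    """
    costs = np.asarray(costs, dtype=float)
    values = [ru_objective(costs, nu, alpha) for nu in costs]
    best = int(np.argmin(values))
    # nu_star is an empirical value-at-risk (an alpha-quantile of the costs)
    return costs[best], values[best]


if __name__ == "__main__":
    sample_costs = np.arange(1.0, 9.0)  # costs 1, 2, ..., 8
    nu_star, cvar = empirical_cvar(sample_costs, alpha=0.75)
    # Here CVaR equals the mean of the worst 25% of costs: (7 + 8) / 2 = 7.5
    print(nu_star, cvar)
```

For a sample like this one, where (1 - alpha) times the sample size is an integer, the estimate coincides with the average of the worst (1 - alpha) fraction of costs, which is the usual informal reading of CVaR for cost distributions.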

Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Risk Management
Reinforcement Learning > Methods > Deep RL
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Reinforcement Learning
Artificial Intelligence > Core AI > Risk Management
Machine Learning > Learning Types > Risk Management

Keywords

reinforcement learning, markov decision processes, policy gradient, gradient estimation, markov decision process, actor-critic algorithm, conditional value-at-risk, risk-sensitive optimization
