Temporal Difference Methods for the Variance of the Reward To Go

Aviv Tamar; Dotan Di Castro; Shie Mannor

ICML 2013

Abstract

In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose variants of both TD(0) and LSTD(λ) with linear function approximation, prove their convergence, and demonstrate their utility in a 4-dimensional continuous state space problem.
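
To make the idea concrete, here is a minimal sketch of how a TD(0)-style update can track the variance of the reward to go. It is not the authors' implementation. It assumes an undiscounted episodic setting, uses the same linear features for both estimates, and relies on the standard identity that the second moment M of the reward to go satisfies M(x) = r(x)^2 + 2 r(x) E[J(x')] + E[M(x')], where J is the expected reward to go; the variance is then M - J^2. The function names, the step-size schedule, and the three-state toy problem are all hypothetical and chosen only for illustration.

```python
import numpy as np


def td0_mean_and_second_moment(episodes, n_features, phi, step_size):
    """TD(0)-style estimates of the mean J and second moment M of the return.

    episodes  : iterable of episodes, each a list of (x, r, x_next) tuples
    phi       : feature map; must return a zero vector for the terminal state
    step_size : function mapping the episode index to a step size
    """
    w_j = np.zeros(n_features)  # weights for the mean J
    w_m = np.zeros(n_features)  # weights for the second moment M
    for k, episode in enumerate(episodes):
        alpha = step_size(k)
        for x, r, x_next in episode:
            f, f_next = phi(x), phi(x_next)
            j_next = f_next @ w_j
            # Temporal difference for the mean: J(x) = r + E[J(x')]
            delta_j = r + j_next - f @ w_j
            # Temporal difference for the second moment:
            # M(x) = r^2 + 2 r E[J(x')] + E[M(x')]
            delta_m = r ** 2 + 2.0 * r * j_next + f_next @ w_m - f @ w_m
            w_j += alpha * delta_j * f
            w_m += alpha * delta_m * f
    return w_j, w_m


def one_hot(x, n=3):
    """Tabular features; None denotes the terminal state (all zeros)."""
    v = np.zeros(n)
    if x is not None:
        v[x] = 1.0
    return v


def sample_episode(rng):
    """Toy problem: state 0 moves to state 1 (reward +1) or state 2 (reward -1)
    with equal probability, and the episode then terminates."""
    nxt = int(rng.integers(1, 3))  # 1 or 2
    reward = 1.0 if nxt == 1 else -1.0
    return [(0, 0.0, nxt), (nxt, reward, None)]


rng = np.random.default_rng(0)
episodes = [sample_episode(rng) for _ in range(5000)]
w_j, w_m = td0_mean_and_second_moment(
    episodes, 3, one_hot, lambda k: 0.5 / (1.0 + 0.01 * k)
)
# With one-hot features the weights are the per-state estimates.
variance = w_m - w_j ** 2
```

In this toy problem the true values are J(0) = 0 and Var(0) = 1, while states 1 and 2 have zero variance because their outcome is deterministic. All of the risk sits in the transition out of state 0, which is the kind of information a mean-only evaluation does not expose.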

Topics

Machine Learning > Application Areas > Risk Management

Reinforcement Learning > Methods > Policy Learning

Mathematics & Optimization > Optimization > Stochastic Methods

Keywords

reinforcement learning, temporal difference learning, policy evaluation, risk management, function approximation

Related papers

Convex Adversarial Collective Classification 2013

Gaussian Process Vine Copulas for Multivariate Dependence 2013

Stochastic Simultaneous Optimistic Optimization 2013

Generic Exploration and K-armed Voting Bandits 2013

Robust Structural Metric Learning 2013