2022
NIPS
NeurIPS 2022
Regret Bounds for Risk-Sensitive Reinforcement Learning
Abstract
In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.
🌉
Interdisciplinary Bridge
— Artificial Intelligence and Machine Learning and Mathematics & Optimization and Reinforcement Learning
🧭
Keyword Pioneer
— tail outcome
🐝
Cross-Pollinator
— Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy