Budgeted Reinforcement Learning in Continuous State Space

Nicolas Carrara; Edouard Leurent; Romain Laroche; Tanguy Urvoy; Odalric-Ambrym Maillard; Olivier Pietquin

NeurIPS 2019

Abstract

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of an upper bound on a constraint violation signal that -- importantly -- can be modified in real time. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to environments with continuous state spaces and unknown dynamics. We show that the solution to a BMDP is the fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
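
To make the role of the budget concrete, the following toy sketch may help. It is not the paper's algorithm and does not implement the Budgeted Bellman Optimality operator; it only illustrates how an upper bound on expected cost, changed at run time, can change which behaviour is preferred. The `Candidate` type, the `best_under_budget` helper, the fallback used when no candidate fits the budget, and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """A hypothetical behaviour with made-up reward and cost estimates."""
    name: str
    expected_reward: float
    expected_cost: float  # stands in for the constraint violation signal


def best_under_budget(candidates: Sequence[Candidate], budget: float) -> Candidate:
    """Return the highest-reward candidate whose expected cost fits the budget.

    Assumption for this sketch: if nothing fits, fall back to the
    lowest-cost candidate rather than raising an error.
    """
    feasible = [c for c in candidates if c.expected_cost <= budget]
    if not feasible:
        return min(candidates, key=lambda c: c.expected_cost)
    return max(feasible, key=lambda c: c.expected_reward)


# Illustrative values only; they do not come from the paper.
CANDIDATES = [
    Candidate("cautious", expected_reward=1.0, expected_cost=0.0),
    Candidate("balanced", expected_reward=2.0, expected_cost=0.3),
    Candidate("aggressive", expected_reward=3.0, expected_cost=0.8),
]

if __name__ == "__main__":
    # The same candidates yield different choices as the budget changes.
    for budget in (0.0, 0.5, 1.0):
        print(budget, best_under_budget(CANDIDATES, budget).name)
```

In this sketch the budget is just a function argument, so tightening or relaxing it requires no recomputation of the candidates themselves, which is the property the abstract highlights when it says the bound can be modified in real time.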

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Applications > Robotics
Reinforcement Learning > Applications > Value Iteration
Machine Learning > Learning Types > Reinforcement Learning
Artificial Intelligence > Core AI > Risk Management

Keywords

deep reinforcement learning, risk management, continuous state space, safe reinforcement learning, safety constraint, budgeted Markov decision process
