Reinforcement Learning with Almost Sure Constraints

Agustin Castellano; Hancheng Min; Enrique Mallada; Juan Andrés Bazerque

L4DC 2022

Abstract

In this work we address the problem of finding feasible policies for Constrained Markov Decision Processes under probability-one constraints. We argue that stationary policies are not sufficient for solving this problem, and that a rich class of policies can be found by endowing the controller with a scalar quantity, the so-called budget, that tracks how close the agent is to violating the constraint. We show that the minimal budget required to act safely can be obtained as the smallest fixed point of a Bellman-like operator, whose convergence properties we analyze. We also show how to learn this quantity when the true kernel of the Markov decision process is not known, and we provide sample-complexity bounds. The utility of knowing this minimal budget lies in the fact that it can aid the search for optimal or near-optimal policies by shrinking the region of the state space the agent must navigate. Simulations illustrate how probability-one constraints differ in nature from the typically used constraints in expectation.
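To make the fixed-point idea concrete, here is a minimal sketch of how a smallest fixed point can be computed by iterating a Bellman-like operator from zero. The operator below is an assumption chosen for illustration, not the one defined in the paper: it takes a finite MDP in which only the support of the transition kernel is known, a nonnegative integer damage signal per state-action pair, and a budget that must cover the worst-case reachable successor. The names `minimal_budget`, `succ`, `damage`, and `cap` are all hypothetical.

```python
import math

def minimal_budget(succ, damage, cap=10, max_iters=10_000):
    """Iterate an assumed Bellman-like operator from B = 0 until it stops changing.

    succ[s][a]   : list of states reachable with positive probability from (s, a)
    damage[s][a] : nonnegative integer damage incurred by taking a in s
    cap          : budgets above this value are treated as infinite (assumption)
    """
    n = len(succ)
    B = [0.0] * n  # starting from zero, the iterates increase toward the smallest fixed point
    for _ in range(max_iters):
        new_B = []
        for s in range(n):
            best = math.inf
            for a, next_states in enumerate(succ[s]):
                # Probability-one safety: the budget must cover the worst successor.
                worst = max(B[t] for t in next_states)
                best = min(best, damage[s][a] + worst)
            new_B.append(math.inf if best > cap else best)
        if new_B == B:
            return B
        B = new_B
    raise RuntimeError("no fixed point reached within max_iters")

# Toy example (hypothetical): state 0 is safe, state 2 incurs damage forever.
# From state 1, action 0 is free but may land in state 2; action 1 costs 1 and is safe.
succ = [[[0]], [[0, 2], [0]], [[2]]]
damage = [[0], [0, 1], [1]]
budgets = minimal_budget(succ, damage)
print(budgets)
```

In this toy example, a state with infinite minimal budget is one the agent can never afford to enter, which illustrates how knowing the minimal budget shrinks the region of the state space that must be explored.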

Topics

Reinforcement Learning > Methods > Deep RL
Reinforcement Learning > Methods > Policy Learning

Keywords

reinforcement learning, policy learning, constrained Markov decision process, stationary policy, almost sure constraint, sample-complexity bound
