Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Xiaoyu Chen; Jiachen Hu; Lihong Li; Liwei Wang

ICLR 2021

Abstract

We study reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs). We propose an algorithm called FMDP-BF, which leverages the factorization structure of FMDPs. The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and it improves on the best previous result for FMDPs~\citep{osband2014near} by a factor of $\sqrt{nH|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace, $H$ is the planning horizon, and $n$ is the number of factored transitions. To show the optimality of our bounds, we also provide a lower bound for FMDPs, which indicates that our algorithm is near-optimal w.r.t. timestep $T$, horizon $H$, and factored state-action subspace cardinality. Finally, as an application, we study a new formulation of constrained RL, known as RL with knapsack constraints (RLwK), and provide the first sample-efficient algorithm for it, based on FMDP-BF.
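To make the phrase "factorization structure" concrete, the following is a minimal sketch of a factored transition model, not the FMDP-BF algorithm itself. It assumes the standard FMDP setup in which each of the $n$ state components is sampled from its own small table that depends only on a few parent components and the action. The names `make_random_factor`, `factored_step`, and `count_table_entries`, and the simplification that every component has the same number of values and the same number of parents, are hypothetical choices made for illustration only.

```python
import itertools
import random


def make_random_factor(scope, size, num_actions, rng):
    """Build one hypothetical factor: a table mapping (parent values, action)
    to a probability vector over the `size` values of a single component."""
    table = {}
    for parents in itertools.product(range(size), repeat=len(scope)):
        for action in range(num_actions):
            weights = [rng.random() + 1e-9 for _ in range(size)]
            total = sum(weights)
            table[parents + (action,)] = [w / total for w in weights]
    return scope, table


def factored_step(state, action, factors, rng):
    """Sample a next state one component at a time. Component i looks only at
    the state components listed in its scope, plus the action."""
    next_state = []
    for scope, table in factors:
        parents = tuple(state[j] for j in scope)
        probs = table[parents + (action,)]
        next_state.append(rng.choices(range(len(probs)), weights=probs)[0])
    return tuple(next_state)


def count_table_entries(n, size, num_actions, scope_size):
    """Compare the number of transition-table entries in a flat model with
    the number in a factored model (assumption: equal sizes and scopes)."""
    flat = (size ** n) * num_actions * (size ** n)
    factored = n * (size ** scope_size) * num_actions * size
    return flat, factored
```

Under these assumptions, the flat table grows exponentially in $n$ because the joint state space has $|\mathcal{S}_i|^n$ elements, while the factored tables grow only linearly in $n$. This count illustrates model size only; the regret bounds stated in the abstract are a separate result of the paper.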
