Approximate Relative Value Learning for Average-reward Continuous State MDPs

Hiteshi Sharma; Mehdi Jafarnia-Jahromi; Rahul Jain

UAI 2019

Abstract

In this paper, we propose an approximate relative value learning (ARVL) algorithm for non-parametric MDPs with a continuous state space, finitely many actions, and the average-reward criterion. It is a sampling-based algorithm that combines kernel density estimation with function approximation via nearest neighbors. The theoretical analysis uses a random contraction operator framework and a stochastic dominance argument. To the best of our knowledge, this is the first algorithm for continuous state space MDPs under the average-reward criterion that has these provable properties and does not require any discretization of the state space. We then evaluate the proposed algorithm numerically on a benchmark problem.
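
The abstract names the ingredients (sampling, nearest-neighbor function approximation, relative value learning) but not the update itself. Purely as an illustration, the sketch below shows a generic sampled relative value iteration on a one-dimensional state space. It uses the standard relative value iteration step h_{k+1}(s) = (T h_k)(s) - (T h_k)(s_ref), where (T h)(s) = max_a [ r(s,a) + E h(s') ]. The expectation is replaced by an average over successors drawn from a simulator, and h is evaluated at those off-grid successors by 1-nearest-neighbor lookup over a fixed set of anchor states. This is not the authors' ARVL algorithm: the kernel density estimation step is omitted, and the names `sampled_rvi` and `nn_value`, the anchor grid, the choice of reference state, and the toy problem are all hypothetical assumptions made for this example.

```python
import random

def nn_value(anchors, values, s):
    """1-nearest-neighbor estimate of h(s): the value stored at the closest anchor state."""
    i = min(range(len(anchors)), key=lambda j: abs(anchors[j] - s))
    return values[i]

def sampled_rvi(sample_next, reward, actions, anchors, ref_idx=0,
                n_next=200, n_iters=15, seed=0):
    """Sampled relative value iteration on a 1-D state space (illustrative sketch only).

    Returns (h, gain): relative values at the anchor states, normalized so that
    h[ref_idx] == 0, and the final estimate of the optimal average reward.
    """
    rng = random.Random(seed)
    h = [0.0] * len(anchors)
    gain = 0.0
    for _ in range(n_iters):
        t = []
        for s in anchors:
            q = []
            for a in actions:
                # Monte Carlo estimate of E[h(s') | s, a] from n_next sampled successors,
                # with h evaluated off-grid by nearest-neighbor lookup.
                est = sum(nn_value(anchors, h, sample_next(s, a, rng))
                          for _ in range(n_next)) / n_next
                q.append(reward(s, a) + est)
            t.append(max(q))  # empirical Bellman backup (T h)(s)
        gain = t[ref_idx]  # (T h)(s_ref) serves as the average-reward estimate
        h = [v - gain for v in t]  # relative step: pin h(s_ref) to 0
    return h, gain

# Hypothetical toy problem: s' ~ Uniform[0, 1] regardless of (s, a);
# action 1 earns s and action 0 earns 0.5, so the optimal gain is E[max(U, 0.5)] = 0.625.
anchors = [i / 20 for i in range(21)]
h, gain = sampled_rvi(
    sample_next=lambda s, a, rng: rng.random(),
    reward=lambda s, a: s if a == 1 else 0.5,
    actions=[0, 1],
    anchors=anchors,
)
print(round(gain, 3))  # approximately 0.625, up to Monte Carlo error
```

Subtracting the reference-state value is what separates relative value iteration from plain undiscounted value iteration: without it, the iterates grow roughly linearly at the rate of the average reward. In a higher-dimensional state space, the `abs` distance would be replaced by a norm, and a spatial index would replace the linear nearest-neighbor scan.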

Keywords

reinforcement learning, function approximation, value iteration, kernel density estimation, continuous state space, average reward MDP, relative value learning