Learning Factored Markov Decision Processes with Unawareness

Craig Innes; Alex Lascarides

UAI 2019

Abstract

Methods for learning and planning in sequential decision problems often assume the learner is aware of all possible states and actions in advance. This assumption is sometimes untenable. In this paper, we give a method to learn factored Markov decision problems from both domain exploration and expert assistance, which guarantees convergence to near-optimal behaviour, even when the agent begins unaware of factors critical to success. Our experiments show that our agent learns optimal behaviour on both small and large problems, and that conserving information on discovering new possibilities results in faster convergence.

Topics

Artificial Intelligence > Core AI > Planning

Reinforcement Learning > Methods > Deep RL

Reinforcement Learning > Methods > Policy Learning

Machine Learning > Learning Types > Reinforcement Learning

Machine Learning > Learning Types > Multi-Agent Systems

Keywords

reinforcement learning, model-based learning, Markov decision process, sequential decision, partial observability, factored MDP, factored model, near-optimal behaviour
