POPCORN: Partially Observed Prediction Constrained Reinforcement Learning

Joseph Futoma; Michael Hughes; Finale Doshi-Velez

AISTATS 2020

Abstract

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in the batch off-policy settings typical of healthcare, where only retrospective data are available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.

Topics

Reinforcement Learning > Methods > Offline RL
Reinforcement Learning > Applications
Reinforcement Learning > Applications > Robotics
Machine Learning > Learning Types > Reinforcement Learning
Machine Learning > Learning Types > Offline RL
Healthcare & Medicine > Clinical > Medical AI

Keywords

reinforcement learning; batch learning; off-policy learning; medical decision-making; partially observed Markov decision process
