Timing as an Action: Learning When to Observe and Act

Helen Zhou; Audrey Huang; Kamyar Azizzadenesheli; David Childers; Zachary Lipton

AISTATS 2024

Abstract

In standard reinforcement learning setups, the agent receives observations and performs actions at evenly spaced intervals. However, in many real-world settings, observations are expensive, forcing agents to commit to courses of action for designated periods of time. Consider that doctors, after each visit, typically set not only a treatment plan but also a follow-up date at which that plan might be revised. In this work, we formalize the setup of timing-as-an-action. Through theoretical analysis in the tabular setting, we show that while the choice of delay intervals could be naively folded in as part of a composite action, these actions have a special structure, and handling them intelligently yields statistical advantages. From a model-based perspective, these gains arise because delay actions do not add any parameters to the underlying model. For model estimation, we provide provable sample-efficiency improvements, and our experiments demonstrate empirical improvements in both healthcare simulators and classical reinforcement learning environments.
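
The claim that delay actions add no parameters can be illustrated with a minimal tabular sketch. It assumes that committing to an action for a delay of k steps means repeating that action k times under a fixed one-step transition matrix, so the k-step transition is the k-th matrix power of the one-step matrix. The function names, the array layout, and the toy numbers below are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def k_step_transition(P, action, delay):
    """Return the `delay`-step transition matrix for a repeated action.

    P has shape (A, S, S); P[a, s, s2] is the one-step probability of
    moving from state s to state s2 under action a (assumed layout).
    """
    return np.linalg.matrix_power(P[action], delay)


def parameter_counts(n_states, n_actions, max_delay):
    """Count transition-table entries under two modelling choices.

    naive:  one separate table per (action, delay) composite action.
    shared: a single one-step table per action, reused for every delay.
    """
    naive = n_states * n_actions * max_delay * n_states
    shared = n_states * n_actions * n_states
    return naive, shared


# Toy model with 2 actions and 2 states (made-up numbers).
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.0, 1.0]],  # action 1
])

# Transition distribution after holding action 0 for 3 steps.
P3 = k_step_transition(P, action=0, delay=3)

# Table sizes when delays of 1, 2, or 3 steps are allowed.
naive, shared = parameter_counts(n_states=2, n_actions=2, max_delay=3)
```

Under this assumption, the shared table has the same size no matter how many delays are allowed, whereas the composite-action table grows linearly with the maximum delay.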

Topics

Artificial Intelligence > Core AI > Planning
Machine Learning > Application Areas > Efficient Computing
Reinforcement Learning > Methods > Policy Learning
Reinforcement Learning > Applications > Robotics
Machine Learning > Learning Types > Reinforcement Learning

Keywords

sample complexity; optimal control; model-based reinforcement learning; sampling efficiency; healthcare application; timing as action; observation cost
