Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes

Shun Zhang; Edmund Durfee; Satinder Singh

AAAI 2020

Abstract

An autonomous agent acting on behalf of a human user can cause side-effects that surprise the user in unsafe ways. When the agent cannot formulate a policy using only side-effects it knows are safe, it needs to selectively query the user about whether other useful side-effects are safe. Our goal is an algorithm that queries about as few potential side-effects as possible to find a safe policy, or to prove that none exists. We extend prior work on irreducible infeasible sets to also handle our problem's complication that a constraint to avoid a side-effect cannot be relaxed without user permission. By proving that our objectives are also adaptive submodular, we devise a querying algorithm that we empirically show finds nearly optimal queries with much less computation than a guaranteed-optimal approach, and outperforms competing approximate approaches.
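
The query-until-resolved loop described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the scoring rule is a simple one-step greedy heuristic chosen for illustration, with no optimality guarantee, and every name in it (`find_safe_policy`, `policies`, `p_safe`, `ask`) is hypothetical. It assumes each candidate policy is summarized by the set of unknown-safety features it would change, and that the user answers each query truthfully.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple


def find_safe_policy(
    policies: Dict[str, FrozenSet[str]],  # hypothetical: policy name -> features it changes
    p_safe: Dict[str, float],             # hypothetical: prior probability each feature is safe
    ask: Callable[[str], bool],           # hypothetical user oracle: True means "safe to change"
) -> Tuple[Optional[str], List[str]]:
    """Query features until a safe policy is found or all policies are ruled out."""
    safe: Set[str] = set()
    unsafe: Set[str] = set()
    asked: List[str] = []
    while True:
        # A policy stays viable while none of its features is known to be unsafe.
        viable = {n: f for n, f in policies.items() if not (f & unsafe)}
        if not viable:
            return None, asked  # every policy changes a known-unsafe feature
        for name in sorted(viable):
            if viable[name] <= safe:
                return name, asked  # all of this policy's side-effects are known safe
        unknown = set().union(*viable.values()) - safe

        def score(feat: str) -> float:
            # Expected number of viable policies resolved by asking about `feat`:
            # ruled out if the answer is "unsafe", certified if it is "safe".
            ruled_out = sum(1 for f in viable.values() if feat in f)
            certified = sum(1 for f in viable.values() if f - safe == {feat})
            return p_safe[feat] * certified + (1.0 - p_safe[feat]) * ruled_out

        query = max(sorted(unknown), key=score)  # sorted() makes tie-breaking deterministic
        asked.append(query)
        (safe if ask(query) else unsafe).add(query)


# Toy instance (made-up data): two candidate policies over three features.
policies = {"a": frozenset({"door", "carpet"}), "b": frozenset({"vase"})}
p_safe = {"door": 0.9, "carpet": 0.9, "vase": 0.5}
truth = {"door": True, "carpet": True, "vase": False}
policy, asked = find_safe_policy(policies, p_safe, truth.__getitem__)
print(policy, asked)
```

The loop stops under exactly the two conditions the abstract names: some policy has only known-safe side-effects, or every policy is blocked by a known-unsafe one. Each iteration queries a feature not asked before, so the number of queries is bounded by the number of features.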

Topics

Artificial Intelligence > Core AI > Agent Systems; Artificial Intelligence > Core AI > AI Safety; Artificial Intelligence > Core AI > Planning; Machine Learning > Optimization & Theory > Optimization; Reinforcement Learning; Reinforcement Learning > Methods > Deep RL; Machine Learning > Learning Types > Reinforcement Learning; Artificial Intelligence > Learning Paradigms > Active Learning; Artificial Intelligence > Core AI > Decision Making; Artificial Intelligence > Core AI > Safety

Keywords

submodular optimization; active learning; Markov decision process; constraint satisfaction; autonomous agent; safety constraint; submodular function; active querying; safe policy; uncertain safety constraint; adaptive submodular; querying algorithm; query-based learning
