Human Control: Definitions and Algorithms

Ryan Carey; Tom Everitt

UAI 2023

Abstract

How can humans stay in control of advanced artificial intelligence systems? One proposal is corrigibility, which requires the agent to follow the instructions of a human overseer, without inappropriately influencing them. In this paper, we formally define a variant of corrigibility called shutdown instructability, and show that it implies appropriate shutdown behavior, retention of human autonomy, and avoidance of user harm. We also analyse the related concepts of non-obstruction and shutdown alignment, three previously proposed algorithms for human control, and one new algorithm.

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > AI Safety

Keywords

agent system, human oversight, shutdown behavior, human autonomy
