Leveraging Human Input to Enable Robust, Interactive, and Aligned AI Systems

Daniel S. Brown

2025 AAAI AAAI 2025

Leveraging Human Input to Enable Robust, Interactive, and Aligned AI Systems

Abstract

Abstract Ensuring that AI systems do what we, as humans, actually want them to do, is one of the biggest open research challenges in AI alignment and safety. My research seeks to directly address this challenge by enabling AI systems to interact with humans to learn aligned and robust behaviors. The way in which robots and other AI systems behave is often the result of optimizing a reward function. However, manually designing good reward functions is highly challenging and error prone, even for domain experts. Consider trying to write down a reward function that describes good driving behavior or how you like your bed made in the morning. While reward functions for these tasks are difficult to manually specify, human feedback in the form of demonstrations or preferences are often much easier to obtain. However, human data is often difficult to interpret, due to ambiguity and noise. Thus, it is critical that AI systems take into account epistemic uncertainty over the human's true intent. My talk will give an overview of my lab's progress along the following fundamental research areas: (1) efficiently maintaining uncertainty over human intent, (2) directly optimizing behavior to be robust to uncertainty over human intent, and (3) actively querying for additional human input to reduce uncertainty over human intent.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Reinforcement Learning

🧭 Keyword Pioneer — human intent

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Daniel S. Brown

Topics

Artificial Intelligence > Core AI > AI Safety Artificial Intelligence > Core AI > Human-AI Interaction Reinforcement Learning > Methods > Policy Learning Machine Learning > Bayesian & Probabilistic > Bayesian Inference

Keywords

epistemic uncertainty preference learning inverse reinforcement learning reward function human feedback ai alignment human intent

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025