Teach AI What It Doesn’t Know

Xuefeng Du

AAAI 2026

Abstract

This talk surveys my research journey toward building reliable machine learning systems that behave safely and predictably in the open world. While modern machine learning models, including foundation models (FMs), have demonstrated unprecedented capabilities, they often suffer from reliability failures under distribution shift, leading to overconfident mispredictions, hallucinated generations, or susceptibility to adversarial prompts. My research treats reliability not as an afterthought but as a first-class algorithmic principle, to be optimized alongside accuracy with minimal human supervision. The talk is organized around three key threads. To fit the allotted 20-30 minutes, the first and second threads will be discussed only briefly.

1. Unknown-Aware Learning via Outlier Synthesis. I introduce a class of learning algorithms that synthesize “virtual outliers” in representation or pixel space to explicitly teach models what they don’t know. This includes the VOS, NPOS, and Dream-OOD frameworks, which shape the energy landscape around in-distribution data to avoid overconfidence on out-of-distribution (OOD) inputs.

2. Learning in the Wild with Unlabeled Data. I present theoretical insights and practical algorithms for leveraging unlabeled in-the-wild data to improve reliability. This includes the SAL framework, which uses a gradient-based spectral method to separate potential outliers, and SCONE, which handles semantic and covariate shifts via constrained optimization. These results turn the contamination in unlabeled data into a learning signal.

3. Reliable Foundation Models. I explore reliability failures in LLMs and multimodal systems. I introduce HaloScope, which detects hallucinations via subspace separation on LLM representations, and TSV, which steers LLM latent representations to improve hallucination detection. I will also briefly cover LLM security and alignment, which includes VLMGuard for detecting malicious prompts in vision-language models and a data-centric paradigm for AI alignment t
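To make the phrase "energy landscape" concrete, here is a minimal sketch of the free-energy score commonly used in energy-based OOD detection, E(x) = -T * log(sum_k exp(f_k(x) / T)), computed from a classifier's logits. This is an illustration under assumptions, not the VOS, NPOS, or Dream-OOD training procedure: the abstract does not give a formula, and the function names, the example logits, and the threshold are all hypothetical.

```python
import math
from typing import Sequence


def energy_score(logits: Sequence[float], temperature: float = 1.0) -> float:
    """Free energy E(x) = -T * log(sum_k exp(f_k(x) / T)).

    Lower energy means the logits are large and peaked (typical of
    in-distribution inputs); higher energy suggests an unfamiliar input.
    """
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


def is_flagged_ood(logits: Sequence[float], threshold: float,
                   temperature: float = 1.0) -> bool:
    """Flag an input as OOD when its energy exceeds a chosen threshold.

    The threshold is an assumption here; in practice it would be
    calibrated on held-out in-distribution data.
    """
    return energy_score(logits, temperature) > threshold


# Hypothetical logits: one confident prediction, one nearly flat one.
confident_logits = [9.0, 0.5, -1.0]
flat_logits = [0.2, 0.1, 0.0]
illustrative_threshold = -5.0

print(energy_score(confident_logits))
print(energy_score(flat_logits))
print(is_flagged_ood(confident_logits, illustrative_threshold))
print(is_flagged_ood(flat_logits, illustrative_threshold))
```

In this sketch the flat logits receive the higher energy and are the ones flagged. Outlier-synthesis methods of the kind described above aim to widen that gap during training, so that inputs far from the in-distribution data land on the high-energy side of the threshold.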

Authors

Xuefeng Du

Topics

Machine Learning > Learning Types > Self-Supervised Learning

Keywords

uncertainty quantification, foundation model, out-of-distribution detection, semantic shift, hallucination detection
