Hallucinations at the Firewall

Woo Jon Hou Ainsley

AAAI 2026

Abstract

Generative AI shows strong capabilities in language, reasoning, and code, but it remains prone to hallucinations: outputs that are fluent yet incorrect. In cybersecurity, such errors pose serious risks, ranging from misleading analysts to enabling adversarial exploitation. This project investigates hallucinations along three directions: (1) creating benchmarks and interpretability tools to characterize them in security contexts; (2) developing mitigation strategies such as retrieval-augmented generation, symbolic-neural hybrids, and uncertainty-aware decoding; and (3) integrating these methods into real-world workflows such as vulnerability assessment, malware analysis, and penetration testing, while also exploring how attackers might exploit hallucinations. Evaluation will combine accuracy metrics, human-in-the-loop studies, and red-team simulations. By bridging theory and applied system design, the work aims to advance the understanding of hallucinations and improve the reliability of AI in cybersecurity, with broader implications for other high-stakes domains such as healthcare and law.
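The abstract names uncertainty-aware decoding as one mitigation but does not specify a method. As a minimal sketch, assuming a simple entropy-thresholding variant (a common baseline, not necessarily the approach this project will adopt), a generator can withhold its answer when any next-token distribution is too flat and defer to a human analyst instead. The function names, the input format, and the 1.0-nat threshold are all hypothetical.

```python
import math
from typing import Optional, Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def answer_or_abstain(
    tokens: Sequence[str],
    step_probs: Sequence[Sequence[float]],
    threshold: float = 1.0,  # hypothetical cutoff in nats
) -> Optional[str]:
    """Return the generated text, or None if any decoding step is too uncertain.

    tokens: the token chosen at each decoding step (hypothetical input).
    step_probs: the model's full next-token distribution at each step.
    """
    if len(tokens) != len(step_probs) or not tokens:
        raise ValueError("need one probability distribution per generated token")
    # The most uncertain step decides: one flat distribution is enough to abstain.
    max_entropy = max(token_entropy(p) for p in step_probs)
    if max_entropy > threshold:
        return None  # defer to an analyst rather than emit a possibly hallucinated answer
    return " ".join(tokens)


# Usage: the second step of `uncertain` is uniform over 4 tokens (entropy ln 4 ~ 1.39 nats).
confident = [[0.97, 0.02, 0.01], [0.90, 0.05, 0.05]]
uncertain = [[0.97, 0.02, 0.01], [0.25, 0.25, 0.25, 0.25]]
print(answer_or_abstain(["patch", "required"], confident))
print(answer_or_abstain(["patch", "required"], uncertain))
```

Taking the maximum over steps is a deliberately conservative choice for security workflows: a single low-confidence token (for example, a CVE identifier or a version number) can be enough to make an otherwise fluent answer wrong. Averaging entropy across steps would be more permissive.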

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Application Areas > Risk Management

Keywords

hallucination mitigation, retrieval-augmented generation, vulnerability assessment, uncertainty-aware decoding