AI Safety

2972 directly classified papers

Papers

ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations EACL 2026

Beyond Prompting: AI Safety Education in the Generative AI Era AAAI 2026

Co-Designing Unplugged Learning Activities with K-2 Teachers for Early AI Literacy Education AAAI 2026

Formal Verification of Neural ODE for Safety Evaluation in Autonomous Vehicles AAAI 2026

Optimisation Problems in Constrained Machine Learning AAAI 2026

How Reasoning Influences Intersectional Biases in Vision–Language Models (Student Abstract) AAAI 2026

Obedience or Vigilance? How Large Language Models React to Malicious Multiple-Choice Options (Student Abstract) AAAI 2026

CAPO: A Unified Policy Gradient Approach for Reward and Cost Optimization in Safe Reinforcement Learning (Student Abstract) AAAI 2026

Towards Capable and Secure Autonomous Computer-Use Agents (Student Abstract) AAAI 2026

Distractor-Based Jailbreaking Attacks in Language Models and Associated Changes in Chain-of-Thought Content (Student Abstract) AAAI 2026

Error Correction in Radiology Reports: A Knowledge Distillation-Based Multi-Stage Framework AAAI 2026

LLM Safety in Judicial AI: A Stress Test of Social Media Influence on Real-World Judgments AAAI 2026

Implications for AI Research: Applying Lessons from the Expert Systems Boom and Bust to the Current Large-Language Model Boom AAAI 2026

Scaling Up AI Alignment AAAI 2026

Safe Reinforcement Learning for Trustworthy AI: Theory, Algorithms, and Applications AAAI 2026

Artificial Immune System of Secure Face Recognition Against Adversarial Attacks (Abstract Reprint) AAAI 2026

Scalable Synthesis of Formally Verified Neural Value Function for Hamilton-Jacobi Reachability Analysis (Abstract Reprint) AAAI 2026

InfrastructureSentinel: Policy Enforced Guardrails for Secure MCP-driven Infrastructure Agents AAAI 2026

Moral Change or Noise? On Problems of Aligning AI with Temporally Unstable Human Feedback AAAI 2026

Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment AAAI 2026

Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment AAAI 2026

MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control AAAI 2026

Mitigating Self-Preference by Authorship Obfuscation AAAI 2026

STACK: Adversarial Attacks on LLM Safeguard Pipelines AAAI 2026

Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping AAAI 2026