Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
EACL 2026
Beyond Prompting: AI Safety Education in the Generative AI Era
AAAI 2026
Co-Designing Unplugged Learning Activities with K-2 Teachers for Early AI Literacy Education
AAAI 2026
Formal Verification of Neural ODE for Safety Evaluation in Autonomous Vehicles
AAAI 2026
Optimisation Problems in Constrained Machine Learning
AAAI 2026
How Reasoning Influences Intersectional Biases in Vision–Language Models (Student Abstract)
AAAI 2026
Obedience or Vigilance? How Large Language Models React to Malicious Multiple-Choice Options (Student Abstract)
AAAI 2026
CAPO: A Unified Policy Gradient Approach for Reward and Cost Optimization in Safe Reinforcement Learning (Student Abstract)
AAAI 2026
Towards Capable and Secure Autonomous Computer-Use Agents (Student Abstract)
AAAI 2026
Distractor-Based Jailbreaking Attacks in Language Models and Associated Changes in Chain-of-Thought Content (Student Abstract)
AAAI 2026
Error Correction in Radiology Reports: A Knowledge Distillation-Based Multi-Stage Framework
AAAI 2026
LLM Safety in Judicial AI: A Stress Test of Social Media Influence on Real-World Judgments
AAAI 2026
Implications for AI Research: Applying Lessons from the Expert Systems Boom and Bust to the Current Large-Language Model Boom
AAAI 2026
Scaling Up AI Alignment
AAAI 2026
Safe Reinforcement Learning for Trustworthy AI: Theory, Algorithms, and Applications
AAAI 2026
Artificial Immune System of Secure Face Recognition Against Adversarial Attacks (Abstract Reprint)
AAAI 2026
Scalable Synthesis of Formally Verified Neural Value Function for Hamilton-Jacobi Reachability Analysis (Abstract Reprint)
AAAI 2026
InfrastructureSentinel: Policy Enforced Guardrails for Secure MCP-driven Infrastructure Agents
AAAI 2026
Moral Change or Noise? On Problems of Aligning AI with Temporally Unstable Human Feedback
AAAI 2026
Cost-Minimized Label-Flipping Poisoning Attack to LLM Alignment
AAAI 2026
Dropouts in Confidence: Moral Uncertainty in Human-LLM Alignment
AAAI 2026
MobileSafetyBench: Evaluating Safety of Autonomous Agents in Mobile Device Control
AAAI 2026
Mitigating Self-Preference by Authorship Obfuscation
AAAI 2026
STACK: Adversarial Attacks on LLM Safeguard Pipelines
AAAI 2026
Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping
AAAI 2026