AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Failures to Surface Harmful Contents in Video Large Language Models
AAAI 2026
Reference Recommendation Based Membership Inference Attack Against Hybrid-Based Recommender Systems
AAAI 2026
Activation Manipulation Attack: Penetrating and Harmful Jailbreak Attack Against Large Vision-Language Models
AAAI 2026
FILTER: A Framework for Defending Against Backdoor Attacks in Vertical Federated Learning
AAAI 2026
Higher-Order Responsibility
AAAI 2026
SceneJailEval: A Scenario-Adaptive Multi-Dimensional Framework for Jailbreak Evaluation
AAAI 2026
Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection
AAAI 2026
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
AAAI 2026
Learning Vision-Based Neural Network Controllers with Semi-Probabilistic Safety Guarantees
AAAI 2026
Dynamic Deep Prompt Optimization for Defending Against Jailbreak Attacks on LLMs
AAAI 2026
Efficient Verification and Falsification of ReLU Neural Barrier Certificates
AAAI 2026
Probing Semantic Insensitivity for Inference-Time Backdoor Defense in Multimodal Large Language Model
AAAI 2026
Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
AAAI 2026
MCPTox: A Benchmark for Tool Poisoning on Real-World MCP Servers
AAAI 2026
ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
AAAI 2026
MPMA: Preference Manipulation Attack Against Model Context Protocol
AAAI 2026
AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs
AAAI 2026
Efficient Hallucination Detection: Adaptive Bayesian Estimation of Semantic Entropy with Guided Semantic Exploration
AAAI 2026
SafetyReminder: Reviving Delayed Safety Awareness of Vision-Language Models to Defend Against Jailbreak Attacks
AAAI 2026
Mitigating Content Effects on Reasoning in Language Models Through Fine-Grained Activation Steering
AAAI 2026
When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models
AAAI 2026
Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape
AAAI 2026
Multi-Value Alignment for LLMs via Value Decorrelation and Extrapolation
AAAI 2026
Benchmarking and Enhancing Rule Knowledge-Driven Reasoning of Large Language Models
AAAI 2026
Test-time Prompt Intervention
AAAI 2026