Research Explorer
AI Safety
2972 directly classified papers
Papers per year
2002: 1
2006: 1
2007: 1
2012: 4
2013: 1
2015: 5
2016: 1
2017: 13
2018: 40
2019: 91
2020: 111
2021: 181
2022: 204
2023: 333
2024: 642
2025: 1031
2026: 312
Papers
Do Large Language Models Reflect Demographic Pluralism in Safety?
EACL 2026
Persistent Personas? Role-Playing, Instruction Following, and Safety in Extended Interactions
EACL 2026
Antisocial Behavior Prediction: A Survey and Practical Guide
EACL 2026
Repairing Regex Vulnerabilities via Localization-Guided Instructions
EACL 2026
Jailbreaking Safeguarded Text-to-Image Models via Large Language Models
EACL 2026
Breach in the Shield: Unveiling the Vulnerabilities of Large Language Models
EACL 2026
BAFLE-DCT: Bypassing Adversarial Filters via Frequency-Selective Embedding in the DCT Domain
WACV 2026
UltraClean: A Simple Framework to Train Robust Neural Networks against Backdoor Attacks
WACV 2026
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
EACL 2026
When Can We Trust LLMs in Mental Health? Large-Scale Benchmarks for Reliable LLM Evaluation
EACL 2026
Detection of Adversarial Prompts with Model Predictive Entropy
EACL 2026
A Simple and Efficient Learning-Style Prompting for LLM Jailbreaking
EACL 2026
Process Evaluation for Agentic Systems
EACL 2026
Code-Switching as a Safety Failure Mode in Large Language Models: An Empirical Study of Roman Urdu across English, Mixed, and Transliteration-Only Inputs
EACL 2026
Position: Biomedical NLP Demands Specialization, Not Generalization
EACL 2026
Rethinking the Evaluation of Alignment Methods: Insights into Diversity, Generalisation, and Safety
EACL 2026
Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment
EACL 2026
MedRiskEval: Medical Risk Evaluation Benchmark of Language Models, On the Importance of User Perspectives in Healthcare Settings
EACL 2026
Being Kind Isn’t Always Being Safe: Diagnosing Affective Hallucination in LLMs
EACL 2026
VortexPIA: Indirect Prompt Injection Attack against LLMs for Efficient Extraction of User Privacy
EACL 2026
The Unintended Trade-off of AI Alignment: Balancing Hallucination Mitigation and Safety in LLMs
EACL 2026
Open-Domain Safety Policy Construction
EACL 2026
Safeguarding Language Models via Self-Destruct Trapdoor
EACL 2026
PATCH: Mitigating PII Leakage in Language Models with Privacy-Aware Targeted Circuit PatcHing
EACL 2026
DuFFin: A Dual-Level Fingerprinting Framework for LLMs IP Protection
EACL 2026