Research Explorer
Artificial Intelligence › Core AI
Adversarial Learning
1235 directly classified papers
Papers per year
2009: 1
2010: 1
2011: 1
2013: 1
2014: 1
2016: 1
2017: 7
2018: 35
2019: 86
2020: 130
2021: 166
2022: 188
2023: 166
2024: 185
2025: 264
2026: 2
Papers
Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
EMNLP 2025
MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework
CVPR 2025
Instant Adversarial Purification with Adversarial Consistency Distillation
CVPR 2025
Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game
CVPR 2025
Nullspace Disentanglement for Red Teaming Language Models
EMNLP 2025
Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs
ACL 2025
Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems
ACL 2025
Using Humor to Bypass Safety Guardrails in Large Language Models
ACL 2025
NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping
ICCV 2025
Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
ACL 2025
SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs
EMNLP 2025
Pre-Trained Multiple Latent Variable Generative Models are Good Defenders Against Adversarial Attacks
WACV 2025
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift
CVPR 2025
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization
CVPR 2025
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
ACL 2025
Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications
ACL 2025
Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems
ACL 2025
Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech?
ACL 2025
Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region
ACL 2025
Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding
EMNLP 2025
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
ACL 2025
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
ACL 2025
NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks
EMNLP 2025
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
CVPR 2025
Saliuitl: Ensemble Salience Guided Recovery of Adversarial Patches against CNNs
CVPR 2025