Adversarial Learning

1235 directly classified papers

Papers

Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment EMNLP 2025

MOS-Attack: A Scalable Multi-objective Adversarial Attack Framework CVPR 2025

Instant Adversarial Purification with Adversarial Consistency Distillation CVPR 2025

Rethinking the Adversarial Robustness of Multi-Exit Neural Networks in an Attack-Defense Game CVPR 2025

Nullspace Disentanglement for Red Teaming Language Models EMNLP 2025

Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs ACL 2025

Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems ACL 2025

Using Humor to Bypass Safety Guardrails in Large Language Models ACL 2025

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping ICCV 2025

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation ACL 2025

SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs EMNLP 2025

Pre-Trained Multiple Latent Variable Generative Models are Good Defenders Against Adversarial Attacks WACV 2025

Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift CVPR 2025

Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization CVPR 2025

Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack ACL 2025

Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications ACL 2025

Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems ACL 2025

Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech? ACL 2025

Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region ACL 2025

Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding EMNLP 2025

Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates ACL 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts ACL 2025

NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks EMNLP 2025

Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation CVPR 2025

Saliuitl: Ensemble Salience Guided Recovery of Adversarial Patches against CNNs CVPR 2025