Adversarial Learning

1235 directly classified papers

Papers

TombRaider: Entering the Vault of History to Jailbreak Large Language Models EMNLP 2025

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection EMNLP 2025

Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding EMNLP 2025

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping ICCV 2025

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors EMNLP 2025

RedHerring Attack: Testing the Reliability of Attack Detection EMNLP 2025

SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs EMNLP 2025

Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems ACL 2025

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation ACL 2025

Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region ACL 2025

SPIRIT: Patching Speech Language Models against Jailbreak Attacks EMNLP 2025

Jailbreak LLMs through Internal Stance Manipulation EMNLP 2025

AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt EMNLP 2025

Power of Diversity: Enhancing Data-Free Black-Box Attack with Domain-Augmented Learning AAAI 2025

Defense Against Prompt Injection Attack by Leveraging Attack Techniques ACL 2025

Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates ACL 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts ACL 2025

Cross-Modal Stealth: A Coarse-to-Fine Attack Framework for RGB-T Tracker AAAI 2025

Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack ACL 2025

Using Humor to Bypass Safety Guardrails in Large Language Models ACL 2025

RedHit: Adaptive Red-Teaming of Large Language Models via Search, Reasoning, and Preference Optimization ACL 2025

Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach ACL 2025

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models EMNLP 2025

Nullspace Disentanglement for Red Teaming Language Models EMNLP 2025

Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions EMNLP 2025