Adversarial Learning

1235 directly classified papers

Papers

Unnoticed Yet Effective: A Hybrid Physical Camouflage Framework Against DNNs and Human Perception AAAI 2026

Mitigating Backdoor Attacks via Trigger Reconstruction and Model Hardening WACV 2026

SUA: Stealthy Multimodal Large Language Model Unlearning Attack EMNLP 2025

TombRaider: Entering the Vault of History to Jailbreak Large Language Models EMNLP 2025

RedHit: Adaptive Red-Teaming of Large Language Models via Search, Reasoning, and Preference Optimization ACL 2025

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection EMNLP 2025

Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs ACL 2025

Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems ACL 2025

LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts ACL 2025

Defense Against Prompt Injection Attack by Leveraging Attack Techniques ACL 2025

Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack ACL 2025

Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment EMNLP 2025

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors EMNLP 2025

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping ICCV 2025

FaceShield: Defending Facial Image against Deepfake Threats ICCV 2025

Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates ACL 2025

Using Humor to Bypass Safety Guardrails in Large Language Models ACL 2025

Bypassing LLM Guardrails: An Empirical Analysis of Evasion Attacks against Prompt Injection and Jailbreak Detection Systems ACL 2025

Power of Diversity: Enhancing Data-Free Black-Box Attack with Domain-Augmented Learning AAAI 2025

Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region ACL 2025

Cross-Modal Stealth: A Coarse-to-Fine Attack Framework for RGB-T Tracker AAAI 2025

Sheep’s Skin, Wolf’s Deeds: Are LLMs Ready for Metaphorical Implicit Hate Speech? ACL 2025

Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach ACL 2025

Guardrails and Security for LLMs: Safe, Secure and Controllable Steering of LLM Applications ACL 2025

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation ACL 2025