Adversarial Learning

1235 directly classified papers

Papers

Bias in the Mirror: Are LLMs opinions robust to their own adversarial attacks ACL 2025

TombRaider: Entering the Vault of History to Jailbreak Large Language Models EMNLP 2025

RP-PGD: Boosting Segmentation Robustness with a Region-and-Prototype Based Adversarial Attack AAAI 2025

VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models EMNLP 2025

Attacking Vision-Language Computer Agents via Pop-ups ACL 2025

Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors EMNLP 2025

Towards Robust Defense against Customization via Protective Perturbation Resistant to Diffusion-based Purification ICCV 2025

Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection EMNLP 2025

AGD: Adversarial Game Defense Against Jailbreak Attacks in Large Language Models ACL 2025

SUA: Stealthy Multimodal Large Language Model Unlearning Attack EMNLP 2025

PoolAtnRes: Towards Generalisable Differential Morphing Attack Detection WACV 2025

RedHerring Attack: Testing the Reliability of Attack Detection EMNLP 2025

When Visual State Space Model Meets Backdoor Attacks WACV 2025

SPIRIT: Patching Speech Language Models against Jailbreak Attacks EMNLP 2025

NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping ICCV 2025

Jailbreak LLMs through Internal Stance Manipulation EMNLP 2025

Adversarial Training for Probabilistic Robustness ICCV 2025

AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt EMNLP 2025

Integrating Argumentation Features for Enhanced Propaganda Detection in Arabic Narratives on the Israeli War on Gaza COLING 2025

SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection EMNLP 2025

SilverSpeak: Evading AI-Generated Text Detectors using Homoglyphs COLING 2025

Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time EMNLP 2025

PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection ICCV 2025

TrojanWave: Exploiting Prompt Learning for Stealthy Backdoor Attacks on Large Audio-Language Models EMNLP 2025

InfAL: Inference Time Adversarial Learning for Improving Research Ideation EMNLP 2025