Adversarial Learning

2063 directly classified papers

Papers

sudo rm -rf agentic_security ACL 2025

Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks ACL 2025

Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models ACL 2025

Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking ACL 2025

PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment ACL 2025

Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates ACL 2025

M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs ACL 2025

Can Indirect Prompt Injection Attacks Be Detected and Removed? ACL 2025

DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising ACL 2025

PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization ACL 2025

SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods ACL 2025

Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach ACL 2025

Jailbreaking? One Step Is Enough! ACL 2025

Attacking Vision-Language Computer Agents via Pop-ups ACL 2025

Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities ACL 2025

from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors ACL 2025

Stepwise Reasoning Disruption Attack of LLMs ACL 2025

People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text ACL 2025

GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization ACL 2025

Root Defense Strategies: Ensuring Safety of LLM at the Decoding Level ACL 2025

CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations ACL 2025

What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs ACL 2025

SDD: Self-Degraded Defense against Malicious Fine-tuning ACL 2025

Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models NAACL 2025

Misalignment Attack on Text-to-Image Models via Text Embedding Optimization and Inversion EMNLP 2025