Research Explorer
Adversarial Learning
2063 directly classified papers
Papers per year
2010: 2
2014: 1
2015: 2
2016: 6
2017: 34
2018: 132
2019: 216
2020: 301
2021: 296
2022: 301
2023: 239
2024: 276
2025: 254
2026: 3
Papers
sudo rm -rf agentic_security
ACL 2025
Defensive Prompt Patch: A Robust and Generalizable Defense of Large Language Models against Jailbreak Attacks
ACL 2025
Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models
ACL 2025
Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking
ACL 2025
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
ACL 2025
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
ACL 2025
M2S: Multi-turn to Single-turn jailbreak in Red Teaming for LLMs
ACL 2025
Can Indirect Prompt Injection Attacks Be Detected and Removed?
ACL 2025
DiffuseDef: Improved Robustness to Adversarial Attacks via Iterative Denoising
ACL 2025
PIG: Privacy Jailbreak Attack on LLMs via Gradient-based Iterative In-Context Optimization
ACL 2025
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
ACL 2025
Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach
ACL 2025
Jailbreaking? One Step Is Enough!
ACL 2025
Attacking Vision-Language Computer Agents via Pop-ups
ACL 2025
Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities
ACL 2025
from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors
ACL 2025
Stepwise Reasoning Disruption Attack of LLMs
ACL 2025
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text
ACL 2025
GAPO: Learning Preferential Prompt through Generative Adversarial Policy Optimization
ACL 2025
Root Defense Strategies: Ensuring Safety of LLM at the Decoding Level
ACL 2025
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations
ACL 2025
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
ACL 2025
SDD: Self-Degraded Defense against Malicious Fine-tuning
ACL 2025
Battling Misinformation: An Empirical Study on Adversarial Factuality in Open-Source Large Language Models
NAACL 2025
Misalignment Attack on Text-to-Image Models via Text Embedding Optimization and Inversion
EMNLP 2025