Research Explorer
Adversarial Learning
1235 directly classified papers
Papers per year
2009: 1
2010: 1
2011: 1
2013: 1
2014: 1
2016: 1
2017: 7
2018: 35
2019: 86
2020: 130
2021: 166
2022: 188
2023: 166
2024: 185
2025: 264
2026: 2
Papers
TombRaider: Entering the Vault of History to Jailbreak Large Language Models
EMNLP 2025
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
EMNLP 2025
Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding
EMNLP 2025
NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping
ICCV 2025
Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors
EMNLP 2025
RedHerring Attack: Testing the Reliability of Attack Detection
EMNLP 2025
SAFENUDGE: Safeguarding Large Language Models in Real-time with Tunable Safety-Performance Trade-offs
EMNLP 2025
Evading Toxicity Detection with ASCII-art: A Benchmark of Spatial Attacks on Moderation Systems
ACL 2025
Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
ACL 2025
Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region
ACL 2025
SPIRIT: Patching Speech Language Models against Jailbreak Attacks
EMNLP 2025
Jailbreak LLMs through Internal Stance Manipulation
EMNLP 2025
AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
EMNLP 2025
Power of Diversity: Enhancing Data-Free Black-Box Attack with Domain-Augmented Learning
AAAI 2025
Defense Against Prompt Injection Attack by Leveraging Attack Techniques
ACL 2025
Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates
ACL 2025
LLMs know their vulnerabilities: Uncover Safety Gaps through Natural Distribution Shifts
ACL 2025
Cross-Modal Stealth: A Coarse-to-Fine Attack Framework for RGB-T Tracker
AAAI 2025
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
ACL 2025
Using Humor to Bypass Safety Guardrails in Large Language Models
ACL 2025
RedHit: Adaptive Red-Teaming of Large Language Models via Search, Reasoning, and Preference Optimization
ACL 2025
Bridging Robustness and Generalization Against Word Substitution Attacks in NLP via the Growth Bound Matrix Approach
ACL 2025
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
EMNLP 2025
Nullspace Disentanglement for Red Teaming Language Models
EMNLP 2025
Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions
EMNLP 2025