Adversarial Decoding: Generating Readable Documents for Adversarial Objectives

Collin Zhang; Tingwei Zhang; Vitaly Shmatikov

EACL 2026

Abstract

We design, implement, and evaluate adversarial decoding, a new, generic text generation technique that produces readable documents for adversarial objectives such as RAG poisoning, jailbreaking, and evasion of defensive filters. Prior generation methods either produce easily detectable gibberish (even methods that optimize for low perplexity) or cannot handle objectives that include embedding similarity. In particular, they cannot produce readable adversarial documents that (1) are retrieved by RAG systems in response to broad classes of queries, and (2) adversarially influence subsequent generation. We measure the effectiveness of adversarial decoding for different objectives and demonstrate that it outperforms existing methods while producing adversarial documents that cannot be automatically distinguished from natural documents on the basis of fluency and readability.

Topics

Machine Learning > Learning Types > Adversarial Learning
Natural Language Processing > Generation > Text Generation

Keywords

adversarial learning, text generation, adversarial decoding, readable document, RAG poisoning
