Controlled Generation for Private Synthetic Text

Zihao Zhao; Anjalie Field

2025 EMNLP EMNLP 2025

Controlled Generation for Private Synthetic Text

Abstract

AbstractText anonymization is essential for responsibly developing and deploying AI in high-stakes domains such as healthcare, social services, and law. In this work, we propose a novel methodology for privacy-preserving synthetic text generation that leverages the principles of de-identification and the Hiding In Plain Sight (HIPS) theory. Our approach introduces entity-aware control codes to guide controllable generation using either in-context learning (ICL) or prefix tuning. The ICL variant ensures privacy levels consistent with the underlying de-identification system, while the prefix tuning variant incorporates a custom masking strategy and loss function to support scalable, high-quality generation. Experiments on legal and clinical datasets demonstrate that our method achieves a strong balance between privacy protection and utility, offering a practical and effective solution for synthetic text generation in sensitive domains.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning and Natural Language Processing and Security & Privacy

🧭 Keyword Pioneer — privacy-preserving generation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy, Speech & Audio

Authors

Zihao Zhao , Anjalie Field

Topics

Artificial Intelligence > Core AI > Responsible AI Machine Learning > Application Areas > Privacy Natural Language Processing > Generation > Text Generation Security & Privacy > Privacy Deep Learning > Learning Types > Generative Models

Keywords

controllable generation synthetic text generation controlled generation text anonymization synthetic text privacy-preserving generation entity-aware control

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025