Improving Authorship Privacy: Adaptive Obfuscation with the Dynamic Selection of Techniques

Hemanth Kandula; Damianos Karakos; Haoling Qiu; Brian Ulicny

ACL 2024

Abstract

Authorship obfuscation, the task of rewriting text to protect the original author’s identity, is becoming increasingly important due to the rise of advanced NLP tools for authorship attribution. Traditional methods for authorship obfuscation face significant challenges in balancing content preservation, fluency, and style concealment. This paper introduces a novel approach, the Obfuscation Strategy Optimizer (OSO), which dynamically selects the optimal obfuscation technique based on a combination of metrics including embedding distance, meaning similarity, and fluency. By leveraging an ensemble of language models, OSO achieves superior performance in preserving the original content’s meaning and grammatical fluency while effectively concealing the author’s unique writing style. Experimental results demonstrate that OSO outperforms existing methods and approaches the performance of larger language models. Our evaluation framework incorporates adversarial testing against state-of-the-art attribution systems to validate the robustness of the obfuscation techniques. We release our code publicly at https://github.com/BBN-E/ObfuscationStrategyOptimizer
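The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Candidate` class, the `combined_score` and `select_best` functions, the linear weighting, the technique names, and all metric values are hypothetical. It also assumes that a larger embedding distance from the original text means better style concealment, and that all three metrics are already scaled to [0, 1].

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    """One rewrite of the input text produced by one obfuscation technique."""
    technique: str             # hypothetical label for the technique used
    text: str                  # the rewritten text
    embedding_distance: float  # assumed: higher = style further from the original
    meaning_similarity: float  # assumed: higher = content better preserved
    fluency: float             # assumed: higher = more grammatical


def combined_score(c: Candidate, weights: Dict[str, float]) -> float:
    """Weighted sum of the three metrics (an assumed combination rule)."""
    return (weights["embedding_distance"] * c.embedding_distance
            + weights["meaning_similarity"] * c.meaning_similarity
            + weights["fluency"] * c.fluency)


def select_best(candidates: List[Candidate], weights: Dict[str, float]) -> Candidate:
    """Return the candidate rewrite with the highest combined score."""
    if not candidates:
        raise ValueError("need at least one candidate rewrite")
    return max(candidates, key=lambda c: combined_score(c, weights))


# Illustrative, made-up metric values for three hypothetical techniques.
candidates = [
    Candidate("paraphrase", "rewrite A", 0.40, 0.90, 0.85),
    Candidate("back_translation", "rewrite B", 0.70, 0.80, 0.80),
    Candidate("synonym_swap", "rewrite C", 0.90, 0.50, 0.60),
]
equal_weights = {"embedding_distance": 1.0, "meaning_similarity": 1.0, "fluency": 1.0}
best = select_best(candidates, equal_weights)
print(best.technique)
```

Because the choice is made per input text, a different technique can win for each document. Changing the weights shifts the trade-off between style concealment and content preservation.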

Authors

Hemanth Kandula, Damianos Karakos, Haoling Qiu, Brian Ulicny

Topics

Machine Learning > Application Areas > Privacy
Natural Language Processing > Generation > Text Generation
Artificial Intelligence > Core AI > Privacy
Natural Language Processing > Applications > Text Processing

Keywords

ensemble learning, style transfer, text generation, authorship attribution, language model, writing style, authorship obfuscation, text rewriting
