LAMP: Extracting Text from Gradients with Language Model Priors

Mislav Balunovic; Dimitar Dimitrov; Nikola Jovanović; Martin Vechev

NeurIPS 2022

Abstract

Recent work shows that sensitive user data can be reconstructed from gradient updates, breaking the key privacy promise of federated learning. While success was demonstrated primarily on image data, these methods do not directly transfer to other domains such as text. In this work, we propose LAMP, a novel attack tailored to textual data that successfully reconstructs original text from gradients. Our attack is based on two key insights: (i) modelling the prior probability of text via an auxiliary language model, which guides the search towards more natural text, and (ii) alternating continuous and discrete optimization, which minimizes the reconstruction loss on embeddings while avoiding local minima via discrete text transformations. Our experiments demonstrate that LAMP is significantly more effective than prior work: it reconstructs 5x more bigrams and $23\%$ longer subsequences on average. Moreover, we are the first to recover inputs from batch sizes larger than 1 for textual models. These findings indicate that gradient updates of models operating on textual data leak more information than previously thought.
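The two insights in the abstract can be illustrated with a toy sketch. This is not the LAMP implementation: the vocabulary, the 2-d embeddings, the bigram table standing in for the auxiliary language model, the position features, the linear "gradient" function, and the weight ALPHA are all hypothetical and chosen only to keep the example self-contained. The sketch alternates gradient-descent steps on the embeddings (continuous) with a search over pairwise position swaps scored by reconstruction loss plus a prior term (discrete).

```python
import itertools
import random

# Hypothetical toy vocabulary with 2-d embeddings (illustrative, not from the paper).
VOCAB = {"the": (1.0, 0.0), "cat": (0.0, 1.0), "sat": (-1.0, 0.0), "down": (0.0, -1.0)}
# Hypothetical bigram log-probabilities standing in for the auxiliary language model.
BIGRAM_LOGP = {("the", "cat"): -0.1, ("cat", "sat"): -0.1, ("sat", "down"): -0.1}
UNSEEN_LOGP = -3.0
# Assumed position features (one per token, so sequences have exactly 4 tokens).
POS = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
ALPHA = 0.1  # assumed weight of the prior relative to the reconstruction loss


def toy_gradient(x):
    """Stand-in for a model gradient: G[a][k] = sum_i POS[i][a] * x[i][k]."""
    return [[sum(POS[i][a] * x[i][k] for i in range(len(x))) for k in range(2)]
            for a in range(2)]


def recon_loss(x, target):
    """Squared distance between the candidate's gradient and the observed one."""
    g = toy_gradient(x)
    return sum((g[a][k] - target[a][k]) ** 2 for a in range(2) for k in range(2))


def project(x):
    """Map each embedding to its nearest vocabulary token."""
    def nearest(v):
        return min(VOCAB, key=lambda t: sum((v[k] - VOCAB[t][k]) ** 2 for k in range(2)))
    return [nearest(v) for v in x]


def prior_cost(tokens):
    """Negative log-likelihood of the token sequence under the bigram prior."""
    return -sum(BIGRAM_LOGP.get(pair, UNSEEN_LOGP) for pair in zip(tokens, tokens[1:]))


def objective(x, target):
    return recon_loss(x, target) + ALPHA * prior_cost(project(x))


def continuous_step(x, target, lr=0.1):
    """One gradient-descent step on the embeddings for the reconstruction loss."""
    g = toy_gradient(x)
    err = [[g[a][k] - target[a][k] for k in range(2)] for a in range(2)]
    return [[x[i][k] - lr * 2.0 * sum(POS[i][a] * err[a][k] for a in range(2))
             for k in range(2)] for i in range(len(x))]


def discrete_step(x, target):
    """Try every pairwise position swap and keep the best-scoring sequence."""
    candidates = [x]  # keeping the current sequence means the score never gets worse
    for i, j in itertools.combinations(range(len(x)), 2):
        swapped = list(x)
        swapped[i], swapped[j] = swapped[j], swapped[i]
        candidates.append(swapped)
    return min(candidates, key=lambda c: objective(c, target))


def attack(target, rounds=5, inner_steps=20, seed=0):
    """Alternate continuous and discrete optimization from a random start."""
    rng = random.Random(seed)
    x = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(len(POS))]
    for _ in range(rounds):
        for _ in range(inner_steps):
            x = continuous_step(x, target)
        x = discrete_step(x, target)
    return x


secret = ["the", "cat", "sat", "down"]
observed = toy_gradient([VOCAB[t] for t in secret])  # what the attacker sees
guess = attack(observed)
print(project(guess), recon_loss(guess, observed))
```

In this toy setting the observed gradient has fewer entries than the unknown embeddings, so the reconstruction loss alone does not pin down the text; the prior term is what lets the discrete step prefer one ordering over another. Whether the secret sentence is recovered depends on the assumed constants and the random start.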

Topics

Artificial Intelligence > Learning Paradigms > Federated Learning
Machine Learning > Application Areas > Privacy
Security & Privacy > Privacy
Machine Learning > Learning Paradigms > Federated Learning
Deep Learning > Learning Types > Federated Learning

Keywords

federated learning, privacy attack, text reconstruction, language model, gradient inversion attack, gradient leakage
