InstaHide: Instance-hiding Schemes for Private Distributed Learning

Yangsibo Huang; Zhao Song; Kai Li; Sanjeev Arora

2020 ICML ICML 2020

InstaHide: Instance-hiding Schemes for Private Distributed Learning

Abstract

How can multiple distributed entities train a shared deep net on their private data while protecting data privacy? This paper introduces InstaHide, a simple encryption of training images. Encrypted images can be used in standard deep learning pipelines (PyTorch, Federated Learning etc.) with no additional setup or infrastructure. The encryption has a minor effect on test accuracy (unlike differential privacy). Encryption consists of mixing the image with a set of other images (in the sense of Mixup data augmentation technique (Zhang et al., 2018)) followed by applying a random pixel-wise mask on the mixed image. Other contributions of this paper are: (a) Use of large public dataset of images (e.g. ImageNet) for mixing during encryption; this improves security. (b) Experiments demonstrating effectiveness in protecting privacy against known attacks while preserving model accuracy. (c) Theoretical analysis showing that successfully attacking privacy requires attackers to solve a difficult computational problem. (d) Demonstration that Mixup alone is insecure as (contrary to recent proposals), by showing some efficient attacks. (e) Release of a challenge dataset to allow design of new attacks.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🧭 Keyword Pioneer — mixup augmentation

🐣 Hot Topic Early Bird — distributed learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Security & Privacy

Authors

Yangsibo Huang , Zhao Song , Kai Li , Sanjeev Arora

Topics

Artificial Intelligence > Learning Paradigms > Federated Learning Machine Learning > Application Areas > Privacy Machine Learning > Learning Types > Federated Learning Artificial Intelligence > Core AI > Privacy

Keywords

privacy-preserving machine learning federated learning distributed learning data privacy mixup augmentation data encryption private distributed learning instance hiding

Download PDF

Related papers

Correlation Clustering with Asymmetric Classification Errors 2020

Learning Portable Representations for High-Level Planning 2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need 2020

Minimax Pareto Fairness: A Multi Objective Perspective 2020

DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training 2020