Certified Defenses for Data Poisoning Attacks

Jacob Steinhardt; Pang Wei W Koh; Percy Liang

NIPS (NeurIPS) 2017

Abstract

Machine learning systems trained on user-provided data are susceptible to data poisoning attacks, whereby malicious users inject false training data with the aim of corrupting the learned model. While recent work has proposed a number of attacks and defenses, little is understood about the worst-case loss of a defense in the face of a determined attacker. We address this by constructing approximate upper bounds on the loss across a broad family of attacks, for defenders that first perform outlier removal followed by empirical risk minimization. Our approximation relies on two assumptions: (1) that the dataset is large enough for statistical concentration between train and test error to hold, and (2) that outliers within the clean (non-poisoned) data do not have a strong effect on the model. Our bound comes paired with a candidate attack that often nearly matches the upper bound, giving us a powerful tool for quickly assessing defenses on a given dataset. Empirically, we find that even under a simple defense, the MNIST-1-7 and Dogfish datasets are resilient to attack, while in contrast the IMDB sentiment dataset can be driven from 12% to 23% test error by adding only 3% poisoned data.
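The defender pipeline described in the abstract (outlier removal followed by empirical risk minimization) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the outlier rule (drop points farther than a fixed radius from their class centroid), the hinge-loss linear model, the toy 2-D data, and the names `remove_outliers` and `train_hinge`.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def remove_outliers(X, y, radius):
    """Drop points farther than `radius` from their own class centroid.

    Assumed rule for illustration only. Centroids are computed on the
    (possibly poisoned) training set itself.
    """
    cents = {c: centroid([x for x, l in zip(X, y) if l == c]) for c in set(y)}
    kept = [(x, l) for x, l in zip(X, y) if math.dist(x, cents[l]) <= radius]
    return [x for x, _ in kept], [l for _, l in kept]


def train_hinge(X, y, epochs=200, lr=0.1, reg=0.01):
    """ERM step: subgradient descent on the L2-regularized mean hinge loss."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [reg * wi for wi in w], 0.0
        for x, l in zip(X, y):
            margin = l * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violations contribute a subgradient
                for i in range(dim):
                    gw[i] -= l * x[i] / n
                gb -= l / n
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy data (hypothetical): two well-separated classes plus 3 poisoned points
# that carry the +1 label but sit deep on the -1 side.
rng = random.Random(0)
clean_X = [[c + rng.gauss(0, 0.5), c + rng.gauss(0, 0.5)]
           for c in (2.0, -2.0) for _ in range(40)]
clean_y = [1] * 40 + [-1] * 40
poison_X, poison_y = [[-10.0, -10.0]] * 3, [1] * 3

X_f, y_f = remove_outliers(clean_X + poison_X, clean_y + poison_y, radius=5.0)
w, b = train_hinge(X_f, y_f)
clean_acc = sum(predict(w, b, x) == l for x, l in zip(clean_X, clean_y)) / 80
```

In this toy setting the poisoned points lie far from the +1 centroid and are filtered out before training. An attacker who knows the rule would instead place points just inside the radius, which is the kind of worst case the paper's upper bound is meant to cover.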

Topics

Machine Learning > Core Methods > Classification

Machine Learning > Learning Types > Adversarial Learning

Machine Learning > Optimization & Theory > Statistical Learning

Machine Learning > Application Areas > Privacy

Machine Learning > Application Areas > Risk Management

Machine Learning > Learning Types > Robustness

Keywords

adversarial robustness; data poisoning; model robustness; empirical risk minimization; outlier removal; adversarial defense; certified defense
