Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent

Peva Blanchard; El Mahdi El Mhamdi; Rachid Guerraoui; Julien Stainer

NeurIPS 2017 (NIPS 2017)

Abstract

We study the resilience to Byzantine failures of distributed implementations of Stochastic Gradient Descent (SGD). So far, distributed machine learning frameworks have largely ignored the possibility of failures, especially arbitrary (i.e., Byzantine) ones. Causes of failures include software bugs, network asynchrony, biases in local datasets, and attackers trying to compromise the entire system. Assuming a set of $n$ workers, up to $f$ of which are Byzantine, we ask how resilient SGD can be without limiting the dimension or the size of the parameter space. We first show that no gradient aggregation rule based on a linear combination of the vectors proposed by the workers (i.e., current approaches) tolerates a single Byzantine failure. We then formulate a resilience property of the aggregation rule that captures the basic requirements to guarantee convergence despite $f$ Byzantine workers. We propose Krum, an aggregation rule that satisfies our resilience property, which we argue is the first provably Byzantine-resilient algorithm for distributed SGD. We also report on experimental evaluations of Krum.
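The abstract names Krum but does not define it. As a rough illustration, the sketch below follows the selection rule as it is usually stated for this paper: each proposed vector is scored by the sum of squared distances to its $n - f - 2$ nearest neighbours, and the vector with the lowest score is returned. It also contrasts that with plain averaging, a linear combination that a single outlier can drag arbitrarily far. The function names, the toy gradients, and the pure-Python list representation are assumptions made for this example, not the authors' code.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def average(vectors):
    """Plain averaging: a linear combination of the proposed vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def krum(vectors, f):
    """Sketch of a Krum-style selection rule (assumed form, see lead-in).

    Returns the proposed vector whose n - f - 2 nearest neighbours
    are closest to it in total squared distance.
    """
    n = len(vectors)
    if n <= 2 * f + 2:
        raise ValueError("Krum requires n > 2f + 2")
    scores = []
    for i, v in enumerate(vectors):
        # Distances from v to every other proposal, smallest first.
        dists = sorted(
            squared_distance(v, w) for j, w in enumerate(vectors) if j != i
        )
        # Score: sum over the n - f - 2 closest proposals.
        scores.append(sum(dists[: n - f - 2]))
    best = min(range(n), key=lambda i: scores[i])
    return vectors[best]


# Toy data (hypothetical): five honest gradients near (1, 1), one Byzantine.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0]]
byzantine = [[-1000.0, 1000.0]]
proposals = honest + byzantine

avg = average(proposals)       # pulled far from the honest gradients
chosen = krum(proposals, f=1)  # one of the honest gradients
```

In this toy run the average lands far from every honest gradient, while the Krum-style rule returns one of the honest proposals. The example only illustrates the mechanism; it says nothing about the convergence guarantee, which the paper establishes separately.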

Topics

Machine Learning > Learning Types > Adversarial Learning
Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Optimization
Computer Science > Applications > Cybersecurity
Artificial Intelligence > Core AI > Adversarial Learning

Keywords

stochastic gradient descent, adversarial learning, gradient aggregation, distributed learning, fault tolerance, distributed machine learning, convergence guarantee, Byzantine resilience, Byzantine failure
