Accumulated Gradient Normalization

Joeri R. Hermans; Gerasimos Spanakis; Rico Möckel

ACML 2017

Abstract

This work addresses the instability of asynchronous data-parallel optimization. It introduces a novel distributed optimizer that can efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. As a result, the magnitude of a worker delta is smaller than that of an accumulated gradient, and the delta provides a better direction towards a minimum than a single first-order gradient. This in turn forces possible implicit momentum fluctuations to be more aligned, under the assumption that all workers contribute towards a single minimum. Because staleness in asynchrony induces (implicit) momentum, our approach mitigates the parameter staleness problem more effectively and, as we show empirically, achieves a better convergence rate than other optimizers such as asynchronous EASGD and DynSGD.
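The worker-side step described in the abstract can be illustrated with a minimal sketch. It assumes that "normalized" means dividing the accumulated local update by the number of local steps; the names `agn_worker_delta`, `quadratic_grad`, `lr`, and `num_local_steps` are hypothetical, and the loop at the end is a sequential single-worker toy rather than a real asynchronous parameter server.

```python
from typing import Callable, List

Vector = List[float]


def agn_worker_delta(
    theta: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float,
    num_local_steps: int,
) -> Vector:
    """Run local first-order steps and return the accumulated update
    divided by the number of steps (assumed normalization)."""
    local = list(theta)                 # local copy of the central parameters
    accumulated = [0.0] * len(theta)    # sum of the local updates
    for _ in range(num_local_steps):
        grad = grad_fn(local)
        step = [-lr * g for g in grad]
        local = [p + s for p, s in zip(local, step)]
        accumulated = [a + s for a, s in zip(accumulated, step)]
    # Normalizing keeps the pushed delta smaller than the raw accumulation.
    return [a / num_local_steps for a in accumulated]


def quadratic_grad(theta: Vector) -> Vector:
    """Gradient of the toy loss 0.5 * ||theta||^2 (illustrative only)."""
    return list(theta)


# Toy "parameter server": apply one worker's normalized delta per round.
center = [4.0, -2.0]
for _ in range(50):
    delta = agn_worker_delta(center, quadratic_grad, lr=0.1, num_local_steps=5)
    center = [c + d for c, d in zip(center, delta)]
```

With `num_local_steps=1` the delta reduces to an ordinary gradient step. With more local steps, the pushed delta stays on the scale of a single step while summarizing the direction explored over the whole local sequence.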

Topics

Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization

Keywords

distributed optimization, neural network optimization, parameter server, asynchronous parallel, gradient normalization
