EF21 with Bells & Whistles: Six Algorithmic Extensions of Modern Error Feedback

Ilyas Fatkhullin; Igor Sokolov; Eduard Gorbunov; Zhize Li; Peter Richtarik

JMLR 2025

Abstract

First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators. However, existing theory of EF relies on very strong assumptions (e.g., bounded gradients) and provides pessimistic convergence rates: the best known rate for EF in the smooth nonconvex regime, when full gradients are compressed, is $O(1/T^{2/3})$, whereas the rate of gradient descent in the same regime is $O(1/T)$. Recently, Richtárik et al. (2021) proposed a new error feedback mechanism, EF21, based on the construction of a Markov compressor induced by a contractive compressor. EF21 removes the aforementioned theoretical deficiencies of EF and at the same time works better in practice. In this work we propose six practical extensions of EF21, all supported by strong convergence theory: partial participation, stochastic approximation, variance reduction, proximal setting, momentum, and bidirectional compression. To the best of our knowledge, several of these techniques have not been previously analyzed in combination with EF, and in cases where prior analysis exists, such as for bidirectional compression, our theoretical convergence guarantees significantly improve upon existing results.
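To make the mechanism in the abstract concrete, the following is a minimal sketch of the basic EF21 update with a Top-k compressor, which is contractive in the sense that $\|C(v) - v\|^2 \le (1 - k/d)\|v\|^2$ for $v \in \mathbb{R}^d$. It does not implement any of the six extensions. The function names `top_k` and `ef21`, the quadratic test problem, and the step size are illustrative assumptions and do not come from the paper or its code release.

```python
import numpy as np

def top_k(v, k):
    """Top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21(grads, x0, step, k, iters):
    """Basic EF21 sketch (hypothetical helper, not the authors' code).

    grads: list of per-worker gradient functions, one per node.
    Each worker i keeps a gradient estimate g_i and transmits only the
    compressed difference top_k(grad_i(x) - g_i); the server averages
    the estimates and takes a gradient-type step.
    """
    x = x0.copy()
    # Initialise each local estimate with a compressed gradient.
    g_local = [top_k(grad(x), k) for grad in grads]
    for _ in range(iters):
        g = np.mean(g_local, axis=0)   # server aggregate of the estimates
        x = x - step * g               # model update
        # Markov compressor step: g_i <- g_i + C(grad_i(x) - g_i)
        g_local = [g_i + top_k(grad(x) - g_i, k)
                   for grad, g_i in zip(grads, g_local)]
    return x

if __name__ == "__main__":
    # Assumed toy problem: f_i(x) = 0.5 * ||x - b_i||^2, minimised at mean(b_i).
    rng = np.random.default_rng(0)
    targets = [rng.normal(size=10) for _ in range(4)]
    grads = [lambda x, b=b: x - b for b in targets]
    x = ef21(grads, np.zeros(10), step=0.05, k=2, iters=2000)
    print("distance to minimiser:", np.linalg.norm(x - np.mean(targets, axis=0)))
```

Because each worker compresses the difference between its fresh gradient and its running estimate rather than the gradient itself, the transmitted message shrinks as the estimates approach the true gradients. The step size above is a conservative choice for this toy problem, not a value prescribed by the theory.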

Authors

Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtarik

Topics

Machine Learning > Optimization & Theory > Distributed Learning
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Stochastic Methods
Mathematics & Optimization > Optimization > Distributed Learning

Keywords

stochastic gradient descent, distributed optimization, gradient descent, variance reduction, stochastic approximation, communication compression, error feedback

Related papers

On the Natural Gradient of the Evidence Lower Bound 2025

Four Axiomatic Characterizations of the Integrated Gradients Attribution Method 2025

Extending Temperature Scaling with Homogenizing Maps 2025

Ontolearn---A Framework for Large-scale OWL Class Expression Learning in Python 2025

An Axiomatic Definition of Hierarchical Clustering 2025