On the Convergence of SARAH and Beyond

Bingcong Li; Meng Ma; Georgios B. Giannakis

2020 AISTATS AISTATS 2020

On the Convergence of SARAH and Beyond

Abstract

The main theme of this work is a unifying algorithm, \textbf{L}oop\textbf{L}ess \textbf{S}ARAH (L2S) for problems formulated as summation of $n$ individual loss functions. L2S broadens a recently developed variance reduction method known as SARAH. To find an $\epsilon$-accurate solution, L2S enjoys a complexity of ${\cal O}\big( (n+\kappa) \ln (1/\epsilon)\big)$ for strongly convex problems. For convex problems, when adopting an $n$-dependent step size, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$; while for more frequently adopted $n$-independent step size, the complexity is ${\cal O}(n+ n/\epsilon)$. Distinct from SARAH, our theoretical findings support an $n$-independent step size in convex problems without extra assumptions. For nonconvex problems, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$. Our numerical tests on neural networks suggest that L2S can have better generalization properties than SARAH. Along with L2S, our side results include the linear convergence of the last iteration for SARAH in strongly convex problems.

🐣 Hot Topic Early Bird — stochastic gradient descent

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

Authors

Bingcong Li , Meng Ma , Georgios B. Giannakis

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Stochastic Methods Deep Learning > Optimization & Theory > Optimization

Keywords

stochastic optimization stochastic gradient descent neural network training nonconvex optimization convergence analysis neural network optimization variance reduction strongly convex strongly convex optimization

Download PDF

Related papers

Stretching the Effectiveness of MLE from Accuracy to Bias for Pairwise Comparisons 2020

Fast and Accurate Ranking Regression 2020

Nonparametric Sequential Prediction While Deep Learning the Kernel 2020

Nested-Wasserstein Self-Imitation Learning for Sequence Generation 2020

Unconditional Coresets for Regularized Loss Minimization 2020