Provable Accelerated Convergence of Nesterov’s Momentum for Deep ReLU Neural Networks

Fangshuo Liao; Anastasios Kyrillidis

ALT 2024

Abstract

Current state-of-the-art analyses of the convergence of gradient descent for training neural networks focus on characterizing properties of the loss landscape, such as the Polyak-Łojasiewicz (PL) condition and restricted strong convexity. While gradient descent converges linearly under such conditions, it remains an open question whether Nesterov’s momentum enjoys accelerated convergence under similar settings and assumptions. In this work, we consider a new class of objective functions, where only a subset of the parameters satisfies strong convexity, and show that Nesterov’s momentum achieves acceleration in theory for this objective class. We provide two realizations of the problem class, one of which is deep ReLU networks, making this work the first to prove an accelerated convergence rate for non-trivial neural network architectures.
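To make the notion of acceleration concrete, the following is a minimal sketch comparing plain gradient descent with the textbook constant-momentum form of Nesterov's method. It runs on a fully strongly convex quadratic, which is an illustrative assumption and not the partially strongly convex objective class or the ReLU networks studied in the paper. The names `run_gd`, `run_nesterov`, `A`, and `x0` are hypothetical and chosen only for this example.

```python
import numpy as np

def run_gd(A, x0, steps):
    """Gradient descent on f(x) = 0.5 * x^T A x with step size 1/L."""
    L = np.linalg.eigvalsh(A).max()  # smoothness constant (largest eigenvalue)
    x = x0.copy()
    for _ in range(steps):
        x = x - (A @ x) / L  # the gradient of f at x is A x
    return 0.5 * x @ A @ x

def run_nesterov(A, x0, steps):
    """Nesterov's momentum with the standard constant momentum for
    strongly convex problems: beta = (sqrt(kappa) - 1) / (sqrt(kappa) + 1)."""
    eigs = np.linalg.eigvalsh(A)
    mu, L = eigs.min(), eigs.max()  # strong convexity and smoothness constants
    kappa = L / mu                  # condition number
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    x = x0.copy()
    y = x0.copy()
    for _ in range(steps):
        x_next = y - (A @ y) / L          # gradient step at the look-ahead point
        y = x_next + beta * (x_next - x)  # momentum extrapolation
        x = x_next
    return 0.5 * x @ A @ x

# Toy problem (an assumption for illustration): diagonal quadratic, kappa = 100.
A = np.diag(np.linspace(1.0, 100.0, 20))
x0 = np.ones(20)

loss_gd = run_gd(A, x0, 100)
loss_nag = run_nesterov(A, x0, 100)
print(f"GD loss after 100 steps:       {loss_gd:.3e}")
print(f"Nesterov loss after 100 steps: {loss_nag:.3e}")
```

On this toy problem, gradient descent contracts the slowest direction by roughly a factor of 1 - 1/kappa per step, while the momentum variant contracts it by roughly 1 - 1/sqrt(kappa), so the second printed loss should be much smaller. Whether a comparable gap holds without full strong convexity is the question the abstract addresses.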

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Architectures > Neural Networks

Keywords

gradient descent, optimization theory, convergence rate, ReLU network, accelerated convergence, Nesterov momentum
