Adam with model exponential moving average is effective for nonconvex optimization

Kwangjun Ahn; Ashok Cutkosky

NeurIPS 2024

Abstract

In this work, we offer a theoretical analysis of two modern optimization techniques for training large and complex models: (i) adaptive optimization algorithms, such as Adam, and (ii) the model exponential moving average (EMA). Specifically, we demonstrate that a clipped version of Adam with model EMA achieves the optimal convergence rates in various nonconvex optimization settings, both smooth and nonsmooth. Moreover, when the scale varies significantly across different coordinates, we demonstrate that the coordinate-wise adaptivity of Adam is provably advantageous. Notably, unlike previous analyses of Adam, our analysis crucially relies on its core elements (momentum and discounting factors) as well as model EMA, which motivates their wide use in practice.
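To make the ingredients named in the abstract concrete, the following is a minimal NumPy sketch of an Adam-style update with a clipped step and an EMA of the model iterates. It is an illustration under stated assumptions, not the exact algorithm analyzed in the paper: the function name `clipped_adam_with_ema`, the coordinate-wise clipping rule, the bias correction, and all default hyperparameters are hypothetical choices made here for readability.

```python
import numpy as np


def clipped_adam_with_ema(grad_fn, x0, steps=500, lr=0.05, beta1=0.9,
                          beta2=0.999, eps=1e-8, clip=1.0, ema_decay=0.99):
    """Illustrative clipped Adam with a model EMA (assumed form, not the paper's)."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)   # momentum: discounted average of gradients
    v = np.zeros_like(x)   # discounted average of squared gradients
    x_ema = x.copy()       # exponential moving average of the iterates

    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g

        # Standard Adam bias correction (an assumption of this sketch).
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)

        # Coordinate-wise normalized direction, then clipped to [-clip, clip].
        # The clipping rule used in the paper may differ from this one.
        direction = np.clip(m_hat / (np.sqrt(v_hat) + eps), -clip, clip)

        x = x - lr * direction
        x_ema = ema_decay * x_ema + (1.0 - ema_decay) * x

    return x, x_ema
```

Because each coordinate is normalized by its own second-moment estimate, the step size is roughly independent of the gradient scale per coordinate, which is the kind of coordinate-wise adaptivity the abstract refers to. The returned `x_ema` plays the role of the averaged model that would be evaluated in place of the last iterate.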

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Optimization & Theory > Stochastic Processes
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Optimization
Mathematics & Optimization > Optimization > Convex Optimization

Keywords

nonconvex optimization, convergence rate, adaptive optimization, Adam optimizer, adaptive optimization algorithm, model exponential moving average, coordinate-wise adaptivity, momentum method, exponential moving average
