Scaled Conjugate Gradient Method for Nonconvex Optimization in Deep Neural Networks

Naoki Sato; Koshiro Izumi; Hideaki Iiduka

JMLR 2024

Abstract

A scaled conjugate gradient method that accelerates existing adaptive methods utilizing stochastic gradients is proposed for solving nonconvex optimization problems with deep neural networks. It is shown theoretically that, whether with a constant or diminishing learning rate, the proposed method can obtain a stationary point of the problem. Additionally, its rate of convergence with a diminishing learning rate is verified to be superior to that of the conjugate gradient method. The proposed method is shown to minimize training loss functions faster than the existing adaptive methods in practical applications of image and text classification. Furthermore, in the training of generative adversarial networks, one version of the proposed method achieved the lowest Fréchet inception distance score among those of the adaptive methods. © JMLR 2024.

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization

Machine Learning > Optimization & Theory > Optimization

Deep Learning > Architectures > Neural Networks

Deep Learning > Optimization & Theory > Neural Network Optimization

Keywords

nonconvex optimization, convergence rate, deep neural network, stationary point, generative adversarial network, conjugate gradient method, adaptive method, scaled conjugate gradient

Related papers

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks (2024)

Convergence for nonconvex ADMM, with applications to CT imaging (2024)

Functional Directed Acyclic Graphs (2024)

Sum-of-norms clustering does not separate nearby balls (2024)

Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning (2024)