Linear Regularizers Enforce the Strict Saddle Property

Matthew Ubl; Matthew Hale; Kasra Yazdani

2023 AAAI AAAI 2023

Linear Regularizers Enforce the Strict Saddle Property

Abstract

Abstract Satisfaction of the strict saddle property has become a standard assumption in non-convex optimization, and it ensures that many first-order optimization algorithms will almost always escape saddle points. However, functions exist in machine learning that do not satisfy this property, such as the loss function of a neural network with at least two hidden layers. First-order methods such as gradient descent may converge to non-strict saddle points of such functions, and there do not currently exist any first-order methods that reliably escape non-strict saddle points. To address this need, we demonstrate that regularizing a function with a linear term enforces the strict saddle property, and we provide justification for only regularizing locally, i.e., when the norm of the gradient falls below a certain threshold. We analyze bifurcations that may result from this form of regularization, and then we provide a selection rule for regularizers that depends only on the gradient of an objective function. This rule is shown to guarantee that gradient descent will escape the neighborhoods around a broad class of non-strict saddle points, and this behavior is demonstrated on numerical examples of non-strict saddle points common in the optimization literature.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning and Mathematics & Optimization

🧭 Keyword Pioneer — linear regularizer

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Matthew Ubl , Matthew Hale , Kasra Yazdani

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Optimization Machine Learning > Optimization & Theory > Theory Deep Learning > Optimization & Theory > Optimization Mathematics & Optimization > Optimization > Non-Convex Optimization

Keywords

non-convex optimization neural network optimization gradient descent saddle point saddle point escape strict saddle property linear regularizer linear regularization

Download PDF

Related papers

A Model-Agnostic Heuristics for Selective Classification 2023

Tackling Safe and Efficient Multi-Agent Reinforcement Learning via Dynamic Shielding (Student Abstract) 2023

Head-Free Lightweight Semantic Segmentation with Linear Transformer 2023

Hierarchical ConViT with Attention-Based Relational Reasoner for Visual Analogical Reasoning 2023

Deep Spiking Neural Networks with High Representation Similarity Model Visual Pathways of Macaque and Mouse 2023