Theory

1072 directly classified papers

Papers

Convergence of Message-Passing Graph Neural Networks with Generic Aggregation on Large Random Graphs JMLR 2024

Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK JMLR 2024

The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis JMLR 2024

High Probability Convergence Bounds for Non-convex Stochastic Gradient Descent with Sub-Weibull Noise JMLR 2024

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers NIPS 2024

Law of Large Numbers and Central Limit Theorem for Wide Two-layer Neural Networks: The Mini-Batch and Noisy Case JMLR 2024

A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks JMLR 2024

Generalization and Stability of Interpolating Neural Networks with Minimal Width JMLR 2024

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks JMLR 2024

The Effect of Generalisation on the Inadequacy of the Mode EACL 2024

Topological Generalization Bounds for Discrete-Time Stochastic Optimization Algorithms NIPS 2024

Improving Normalization With the James-Stein Estimator WACV 2024

The Expressive Capacity of State Space Models: A Formal Language Perspective NIPS 2024

Overparametrized Multi-layer Neural Networks: Uniform Concentration of Neural Tangent Kernel and Convergence of Stochastic Gradient Descent JMLR 2024

The good, the bad and the ugly sides of data augmentation: An implicit spectral regularization perspective JMLR 2024

From Activation to Initialization: Scaling Insights for Optimizing Neural Fields CVPR 2024

Multiple Descent in the Multiple Random Feature Model JMLR 2024

On the Effect of Initialization: The Scaling Path of 2-Layer Neural Networks JMLR 2024

In-Context Learning of a Linear Transformer Block: Benefits of the MLP Component and One-Step GD Initialization NIPS 2024

Globally Convergent Variational Inference NIPS 2024

Curvature Clues: Decoding Deep Learning Privacy with Input Loss Curvature NIPS 2024

Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling NIPS 2024

Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability AISTATS 2024

Separation and Bias of Deep Equilibrium Models on Expressivity and Learning Dynamics NIPS 2024

Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic CVPR 2024