Neural Network Optimization

902 directly classified papers

Papers

Scalable Optimization in the Modular Norm NIPS 2024

The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks NIPS 2024

Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions JMLR 2024

Memory-Efficient LLM Training with Online Subspace Descent NIPS 2024

ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate NIPS 2024

On the Inductive Bias of Stacking Towards Improving Reasoning NIPS 2024

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers NIPS 2024

Counter-Current Learning: A Biologically Plausible Dual Network Approach for Deep Learning NIPS 2024

Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization NIPS 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision NIPS 2024

Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent NIPS 2024

Zero-Shot Transfer of Neural ODEs NIPS 2024

PACE: Pacing Operator Learning to Accurate Optical Field Simulation for Complicated Photonic Devices NIPS 2024

The Impact of Geometric Complexity on Neural Collapse in Transfer Learning NIPS 2024

OneBit: Towards Extremely Low-bit Large Language Models NIPS 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization NIPS 2024

Local to Global: Learning Dynamics and Effect of Initialization for Transformers NIPS 2024

Great Minds Think Alike: The Universal Convergence Trend of Input Salience NIPS 2024

Learning Neural Contracting Dynamics: Extended Linearization and Global Guarantees NIPS 2024

Symmetries in Overparametrized Neural Networks: A Mean Field View NIPS 2024

Neural Collapse To Multiple Centers For Imbalanced Data NIPS 2024

Dynamic Neural Regeneration: Enhancing Deep Learning Generalization on Small Datasets NIPS 2024

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level NIPS 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations NIPS 2024

DiffLoc: Diffusion Model for Outdoor LiDAR Localization CVPR 2024