Neural Network Optimization

902 directly classified papers

Papers

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models CVPR 2024

Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning CVPR 2024

Expanding Sparse Tuning for Low Memory Usage NIPS 2024

Neural Redshift: Random Networks are not Random Functions CVPR 2024

Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers CVPR 2024

A Layer-Wise Natural Gradient Optimizer for Training Deep Neural Networks NIPS 2024

Deep linear networks for regression are implicitly regularized towards flat minima NIPS 2024

Understanding and Minimising Outlier Features in Transformer Training NIPS 2024

Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling NIPS 2024

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference NIPS 2024

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations NIPS 2024

Explicit Eigenvalue Regularization Improves Sharpness-Aware Minimization NIPS 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning NIPS 2024

Why is parameter averaging beneficial in SGD? An objective smoothing perspective AISTATS 2024

A Cubic-regularized Policy Newton Algorithm for Reinforcement Learning AISTATS 2024

Sharpened Lazy Incremental Quasi-Newton Method AISTATS 2024

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes NIPS 2024

ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate NIPS 2024

On the Use of Anchoring for Training Vision Models NIPS 2024

Symmetries in Overparametrized Neural Networks: A Mean Field View NIPS 2024

Great Minds Think Alike: The Universal Convergence Trend of Input Salience NIPS 2024

Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent NIPS 2024

Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference EACL 2024

Second-order forward-mode optimization of recurrent neural networks for neuroscience NIPS 2024

Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization NIPS 2024