Neural Network Optimization

902 directly classified papers

Papers

PACE: Pacing Operator Learning to Accurate Optical Field Simulation for Complicated Photonic Devices NIPS 2024

The Impact of Geometric Complexity on Neural Collapse in Transfer Learning NIPS 2024

An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness NIPS 2024

OneBit: Towards Extremely Low-bit Large Language Models NIPS 2024

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks AISTATS 2024

Learning Neural Contracting Dynamics: Extended Linearization and Global Guarantees NIPS 2024

Bayes-optimal learning of an extensive-width neural network from quadratically many samples NIPS 2024

Neural Collapse To Multiple Centers For Imbalanced Data NIPS 2024

QT-ViT: Improving Linear Attention in ViT with Quadratic Taylor Expansion NIPS 2024

Dynamic Neural Regeneration: Enhancing Deep Learning Generalization on Small Datasets NIPS 2024

Improving Equivariant Model Training via Constraint Relaxation NIPS 2024

Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level NIPS 2024

DiffLoc: Diffusion Model for Outdoor LiDAR Localization CVPR 2024

Analysing The Impact of Sequence Composition on Language Model Pre-Training ACL 2024

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees EMNLP 2024

Deep Learning for Computing Convergence Rates of Markov Chains NIPS 2024

Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes EMNLP 2024

Second-order forward-mode optimization of recurrent neural networks for neuroscience NIPS 2024

Unraveling the Gradient Descent Dynamics of Transformers NIPS 2024

nanoT5: Fast & Simple Pre-training and Fine-tuning of T5 Models with Limited Resources EMNLP 2023

Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks EMNLP 2023

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training EMNLP 2023

AdaNorm: Adaptive Gradient Norm Correction Based Optimizer for CNNs WACV 2023

TokenDrop + BucketSampler: Towards Efficient Padding-free Fine-tuning of Language Models EMNLP 2023

Addressing the Length Bias Challenge in Document-Level Neural Machine Translation EMNLP 2023