Theory

1072 directly classified papers

Papers

Benign Overfitting in Two-layer ReLU Convolutional Neural Networks ICML 2023

Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions ICML 2023

How Does Information Bottleneck Help Deep Learning? ICML 2023

On the Expressivity Role of LayerNorm in Transformers’ Attention ACL 2023

Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias ICML 2023

On Bridging the Gap between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization ICML 2023

Spread Flows for Manifold Modelling AISTATS 2023

Maximal Initial Learning Rates in Deep ReLU Networks ICML 2023

Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization ICML 2023

Width and Depth Limits Commute in Residual Networks ICML 2023

Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees ICML 2023

Overparameterized Random Feature Regression with Nearly Orthogonal Data AISTATS 2023

Emergent Inabilities? Inverse Scaling Over the Course of Pretraining EMNLP 2023

Scaling Laws for Multilingual Neural Machine Translation ICML 2023

A Modern Look at the Relationship between Sharpness and Generalization ICML 2023

Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think ICML 2023

Scaling Laws for Generative Mixed-Modal Language Models ICML 2023

On the Accelerated Noise-Tolerant Power Method AISTATS 2023

Manifold-Preserving Transformers are Effective for Short-Long Range Encoding EMNLP 2023

Global optimality of Elman-type RNNs in the mean-field regime ICML 2023

Second-order regression models exhibit progressive sharpening to the edge of stability ICML 2023

On double-descent in uncertainty quantification in overparametrized models AISTATS 2023

Scaling Law for Document Neural Machine Translation EMNLP 2023

Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime AISTATS 2023

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? EMNLP 2023