Research Explorer
Theory
1072 directly classified papers
Papers per year
2007: 1
2010: 4
2011: 1
2012: 3
2013: 4
2014: 5
2015: 2
2016: 11
2017: 31
2018: 47
2019: 67
2020: 97
2021: 128
2022: 225
2023: 155
2024: 209
2025: 81
2026: 1
Papers
Benign Overfitting in Two-layer ReLU Convolutional Neural Networks
ICML 2023
Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions
ICML 2023
How Does Information Bottleneck Help Deep Learning?
ICML 2023
On the Expressivity Role of LayerNorm in Transformers’ Attention
ACL 2023
Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias
ICML 2023
On Bridging the Gap between Mean Field and Finite Width Deep Random Multilayer Perceptron with Batch Normalization
ICML 2023
Spread Flows for Manifold Modelling
AISTATS 2023
Maximal Initial Learning Rates in Deep ReLU Networks
ICML 2023
Beyond Lipschitz Smoothness: A Tighter Analysis for Nonconvex Optimization
ICML 2023
Width and Depth Limits Commute in Residual Networks
ICML 2023
Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees
ICML 2023
Overparameterized Random Feature Regression with Nearly Orthogonal Data
AISTATS 2023
Emergent Inabilities? Inverse Scaling Over the Course of Pretraining
EMNLP 2023
Scaling Laws for Multilingual Neural Machine Translation
ICML 2023
A Modern Look at the Relationship between Sharpness and Generalization
ICML 2023
Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think
ICML 2023
Scaling Laws for Generative Mixed-Modal Language Models
ICML 2023
On the Accelerated Noise-Tolerant Power Method
AISTATS 2023
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding
EMNLP 2023
Global optimality of Elman-type RNNs in the mean-field regime
ICML 2023
Second-order regression models exhibit progressive sharpening to the edge of stability
ICML 2023
On double-descent in uncertainty quantification in overparametrized models
AISTATS 2023
Scaling Law for Document Neural Machine Translation
EMNLP 2023
Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime
AISTATS 2023
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
EMNLP 2023