Research Explorer
Theory
1072 directly classified papers
Papers per year
2007: 1
2010: 4
2011: 1
2012: 3
2013: 4
2014: 5
2015: 2
2016: 11
2017: 31
2018: 47
2019: 67
2020: 97
2021: 128
2022: 225
2023: 155
2024: 209
2025: 81
2026: 1
Papers
Scalable Transformer for PDE Surrogate Modeling
NIPS 2023
Window-Based Distribution Shift Detection for Deep Neural Networks
NIPS 2023
Complex-valued Neurons Can Learn More but Slower than Real-valued Neurons via Gradient Descent
NIPS 2023
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
NIPS 2023
Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs
NIPS 2023
Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs
NIPS 2023
What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models
NIPS 2023
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
EMNLP 2023
Emergent Inabilities? Inverse Scaling Over the Course of Pretraining
EMNLP 2023
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding
EMNLP 2023
Scaling Law for Document Neural Machine Translation
EMNLP 2023
Tokenization and the Noiseless Channel
ACL 2023
Learning Layer-wise Equivariances Automatically using Gradients
NIPS 2023
Norm-based Generalization Bounds for Sparse Neural Networks
NIPS 2023
The NLP Task Effectiveness of Long-Range Transformers
EACL 2023
Experimental Observations of the Topology of Convolutional Neural Network Activations
AAAI 2023
On the Dynamics Under the Unhinged Loss and Beyond
JMLR 2023
Fast Convergence in Learning Two-Layer Neural Networks with Separable Data
AAAI 2023
On the Expressive Flexibility of Self-Attention Matrices
AAAI 2023
Instance-Dependent Generalization Bounds via Optimal Transport
JMLR 2023
An Operator Theoretic Approach for Analyzing Sequence Neural Networks
AAAI 2023
The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models
AAAI 2023
On the Expressivity Role of LayerNorm in Transformers’ Attention
ACL 2023
Sequential Integrated Gradients: a simple but effective method for explaining language models
ACL 2023
Local Intrinsic Dimensional Entropy
AAAI 2023