Theory
1072 directly classified papers
Papers per year
2007: 1
2010: 4
2011: 1
2012: 3
2013: 4
2014: 5
2015: 2
2016: 11
2017: 31
2018: 47
2019: 67
2020: 97
2021: 128
2022: 225
2023: 155
2024: 209
2025: 81
2026: 1
Papers
The NLP Task Effectiveness of Long-Range Transformers
EACL 2023
Absolute Position Embedding Learns Sinusoid-like Waves for Attention Based on Relative Position
EMNLP 2023
Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation
EMNLP 2023
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models
NIPS 2023
Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs
NIPS 2023
On the Dynamics Under the Unhinged Loss and Beyond
JMLR 2023
Instance-Dependent Generalization Bounds via Optimal Transport
JMLR 2023
On the Expressivity Role of LayerNorm in Transformers’ Attention
ACL 2023
Sequential Integrated Gradients: a simple but effective method for explaining language models
ACL 2023
Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models
ACL 2023
Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale.
ACL 2023
Dropout Training is Distributionally Robust Optimal
JMLR 2023
Transformer Language Models Handle Word Frequency in Prediction Head
ACL 2023
Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions
ACL 2023
Benign overfitting in ridge regression
JMLR 2023
Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs
NIPS 2023
Maximum likelihood estimation in Gaussian process regression is ill-posed
JMLR 2023
Deep linear networks can benignly overfit when shallow ones do
JMLR 2023
Experimental Observations of the Topology of Convolutional Neural Network Activations
AAAI 2023
Fast Convergence in Learning Two-Layer Neural Networks with Separable Data
AAAI 2023
An Operator Theoretic Approach for Analyzing Sequence Neural Networks
AAAI 2023
The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models
AAAI 2023
Local Intrinsic Dimensional Entropy
AAAI 2023
The Analysis of Deep Neural Networks by Information Theory: From Explainability to Generalization
AAAI 2023
Neural Representations Reveal Distinct Modes of Class Fitting in Residual Convolutional Networks
AAAI 2023