Theory

1072 directly classified papers

Papers

The NLP Task Effectiveness of Long-Range Transformers EACL 2023

Absolute Position Embedding Learns Sinusoid-like Waves for Attention Based on Relative Position EMNLP 2023

Memorisation Cartography: Mapping out the Memorisation-Generalisation Continuum in Neural Machine Translation EMNLP 2023

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models NIPS 2023

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs NIPS 2023

On the Dynamics Under the Unhinged Loss and Beyond JMLR 2023

Instance-Dependent Generalization Bounds via Optimal Transport JMLR 2023

On the Expressivity Role of LayerNorm in Transformers’ Attention ACL 2023

Sequential Integrated Gradients: a simple but effective method for explaining language models ACL 2023

Beyond Positive Scaling: How Negation Impacts Scaling Trends of Language Models ACL 2023

Honey, I Shrunk the Language: Language Model Behavior at Reduced Scale ACL 2023

Dropout Training is Distributionally Robust Optimal JMLR 2023

Transformer Language Models Handle Word Frequency in Prediction Head ACL 2023

Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions ACL 2023

Benign overfitting in ridge regression JMLR 2023

Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs NIPS 2023

Maximum likelihood estimation in Gaussian process regression is ill-posed JMLR 2023

Deep linear networks can benignly overfit when shallow ones do JMLR 2023

Experimental Observations of the Topology of Convolutional Neural Network Activations AAAI 2023

Fast Convergence in Learning Two-Layer Neural Networks with Separable Data AAAI 2023

An Operator Theoretic Approach for Analyzing Sequence Neural Networks AAAI 2023

The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models AAAI 2023

Local Intrinsic Dimensional Entropy AAAI 2023

The Analysis of Deep Neural Networks by Information Theory: From Explainability to Generalization AAAI 2023

Neural Representations Reveal Distinct Modes of Class Fitting in Residual Convolutional Networks AAAI 2023