Theory

1072 directly classified papers

Papers

Scalable Transformer for PDE Surrogate Modeling NIPS 2023

Window-Based Distribution Shift Detection for Deep Neural Networks NIPS 2023

Complex-valued Neurons Can Learn More but Slower than Real-valued Neurons via Gradient Descent NIPS 2023

Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models NIPS 2023

Learning a Neuron by a Shallow ReLU Network: Dynamics and Implicit Bias for Correlated Inputs NIPS 2023

Provable Advantage of Curriculum Learning on Parity Targets with Mixed Inputs NIPS 2023

What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models NIPS 2023

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? EMNLP 2023

Emergent Inabilities? Inverse Scaling Over the Course of Pretraining EMNLP 2023

Manifold-Preserving Transformers are Effective for Short-Long Range Encoding EMNLP 2023

Scaling Law for Document Neural Machine Translation EMNLP 2023

Tokenization and the Noiseless Channel ACL 2023

Learning Layer-wise Equivariances Automatically using Gradients NIPS 2023

Norm-based Generalization Bounds for Sparse Neural Networks NIPS 2023

The NLP Task Effectiveness of Long-Range Transformers EACL 2023

Experimental Observations of the Topology of Convolutional Neural Network Activations AAAI 2023

On the Dynamics Under the Unhinged Loss and Beyond JMLR 2023

Fast Convergence in Learning Two-Layer Neural Networks with Separable Data AAAI 2023

On the Expressive Flexibility of Self-Attention Matrices AAAI 2023

Instance-Dependent Generalization Bounds via Optimal Transport JMLR 2023

An Operator Theoretic Approach for Analyzing Sequence Neural Networks AAAI 2023

The Implicit Regularization of Momentum Gradient Descent in Overparametrized Models AAAI 2023

On the Expressivity Role of LayerNorm in Transformers’ Attention ACL 2023

Sequential Integrated Gradients: a simple but effective method for explaining language models ACL 2023

Local Intrinsic Dimensional Entropy AAAI 2023