Theory

1072 directly classified papers

Papers

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis EMNLP 2024

Unveiling Linguistic Regions in Large Language Models ACL 2024

On the Empirical Complexity of Reasoning and Planning in LLMs EMNLP 2024

Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning NIPS 2024

The Fine-Grained Complexity of Gradient Computation for Training Large Language Models NIPS 2024

Double-Descent Curves in Neural Networks: A New Perspective Using Gaussian Processes AAAI 2024

AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters ACL 2024

Unraveling the Gradient Descent Dynamics of Transformers NIPS 2024

Epistemic Uncertainty Quantification For Pre-Trained Neural Networks CVPR 2024

Exponential Hardness of Optimization from the Locality in Quantum Neural Networks AAAI 2024

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models EMNLP 2024

Polyhedral Complex Derivation from Piecewise Trilinear Networks NIPS 2024

Nonlinear dynamics of localization in neural receptive fields NIPS 2024

A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention NIPS 2024

Understanding Surprising Generalization Phenomena in Deep Learning AAAI 2024

How does Gradient Descent Learn Features --- A Local Analysis for Regularized Two-Layer Neural Networks NIPS 2024

Towards Constituting Mathematical Structures for Learning to Optimize ICML 2023

Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features ICML 2023

How Powerful are Shallow Neural Networks with Bandlimited Random Weights? ICML 2023

Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation ICML 2023

On the Correctness of Automatic Differentiation for Neural Networks with Machine-Representable Parameters ICML 2023

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation ICML 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond ICML 2023

Benign Overfitting in Two-layer ReLU Convolutional Neural Networks ICML 2023

Emergent Asymmetry of Precision and Recall for Measuring Fidelity and Diversity of Generative Models in High Dimensions ICML 2023