Optimization
1638 directly classified papers
Papers per year
2006: 5
2007: 2
2008: 4
2009: 2
2010: 2
2011: 3
2012: 8
2013: 25
2014: 19
2015: 22
2016: 31
2017: 42
2018: 68
2019: 104
2020: 148
2021: 174
2022: 178
2023: 209
2024: 345
2025: 244
2026: 3
Papers
Answer Convergence as a Signal for Early Stopping in Reasoning
EMNLP 2025
Stabilizing Sharpness-Aware Minimization Through A Simple Renormalization Strategy
JMLR 2025
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
JMLR 2025
MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models
EMNLP 2025
GAP: a Global Adaptive Pruning Method for Large Language Models
EMNLP 2025
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
EMNLP 2025
Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning
EMNLP 2025
Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization
EMNLP 2025
Temporal Scaling Law for Large Language Models
EMNLP 2025
Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance
EMNLP 2025
AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models
EMNLP 2025
Training compute-optimal transformer encoder models
EMNLP 2025
ProCut: LLM Prompt Compression via Attribution Estimation
EMNLP 2025
Select-then-Route: Taxonomy guided Routing for LLMs
EMNLP 2025
Efficient Inference for Large Language Models – Algorithm, Model, and System
EMNLP 2025
FroM: Frobenius Norm-Based Data-Free Adaptive Model Merging
EMNLP 2025
DTW-Align: Bridging the Modality Gap in End-to-End Speech Translation with Dynamic Time Warping Alignment
EMNLP 2025
A statistical perspective on algorithm unrolling models for inverse problems
JMLR 2025
LIPIDS: Learning-Based Illumination Planning in Discretized (Light) Space for Photometric Stereo
WACV 2025
FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers
IJCAI 2025
TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
ICCV 2025
MATO: A Model-Agnostic Training Optimization for Aspect Sentiment Triplet Extraction
NAACL 2025
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
NAACL 2025
As easy as PIE: understanding when pruning causes language models to disagree
NAACL 2025
The Surprising Effectiveness of Infinite-Width NTKs for Characterizing and Improving Model Training
AAAI 2025