Optimization

1638 directly classified papers

Papers

Answer Convergence as a Signal for Early Stopping in Reasoning EMNLP 2025

Stabilizing Sharpness-Aware Minimization Through A Simple Renormalization Strategy JMLR 2025

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult JMLR 2025

MaZO: Masked Zeroth-Order Optimization for Multi-Task Fine-Tuning of Large Language Models EMNLP 2025

GAP: a Global Adaptive Pruning Method for Large Language Models EMNLP 2025

Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon EMNLP 2025

Mitigating Catastrophic Forgetting in Large Language Models with Forgetting-aware Pruning EMNLP 2025

Mixture of Weight-shared Heterogeneous Group Attention Experts for Dynamic Token-wise KV Optimization EMNLP 2025

Temporal Scaling Law for Large Language Models EMNLP 2025

Spec-VLA: Speculative Decoding for Vision-Language-Action Models with Relaxed Acceptance EMNLP 2025

AMQ: Enabling AutoML for Mixed-precision Weight-Only Quantization of Large Language Models EMNLP 2025

Training compute-optimal transformer encoder models EMNLP 2025

ProCut: LLM Prompt Compression via Attribution Estimation EMNLP 2025

Select-then-Route: Taxonomy guided Routing for LLMs EMNLP 2025

Efficient Inference for Large Language Models – Algorithm, Model, and System EMNLP 2025

FroM: Frobenius Norm-Based Data-Free Adaptive Model Merging EMNLP 2025

DTW-Align: Bridging the Modality Gap in End-to-End Speech Translation with Dynamic Time Warping Alignment EMNLP 2025

A statistical perspective on algorithm unrolling models for inverse problems JMLR 2025

LIPIDS: Learning-Based Illumination Planning in Discretized (Light) Space for Photometric Stereo WACV 2025

FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers IJCAI 2025

TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training ICCV 2025

MATO: A Model-Agnostic Training Optimization for Aspect Sentiment Triplet Extraction NAACL 2025

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts NAACL 2025

As easy as PIE: understanding when pruning causes language models to disagree NAACL 2025

The Surprising Effectiveness of Infinite-Width NTKs for Characterizing and Improving Model Training AAAI 2025