Neural Network Optimization

902 directly classified papers

Papers

Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space? EMNLP 2025

Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs’ Reasoning EMNLP 2025

A Proactive Reliability Metric for Detecting Failures in Language Model Training EMNLP 2025

Exploring smaller batch sizes for a high-performing BabyLM model architecture EMNLP 2025

Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability CVPR 2025

Lookahead Q-Cache: Achieving More Consistent KV Cache Eviction via Pseudo Query EMNLP 2025

Language Models Grow Less Humanlike beyond Phase Transition ACL 2025

Low-Rank Interconnected Adaptation across Layers ACL 2025

Segment-Based Attention Masking for GPTs ACL 2025

Value Residual Learning ACL 2025

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning ACL 2025

Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention ACL 2025

Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer ACL 2025

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation EMNLP 2025

FREE: Fast and Robust Vision Language Models with Early Exits ACL 2025

IG-Pruning: Input-Guided Block Pruning for Large Language Models EMNLP 2025

Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching EMNLP 2025

Variance Sensitivity Induces Attention Entropy Collapse and Instability in Transformers EMNLP 2025

Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning EMNLP 2025

GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression EMNLP 2025

The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models EMNLP 2025

LightThinker: Thinking Step-by-Step Compression EMNLP 2025

Revealing the Deceptiveness of Knowledge Editing: A Mechanistic Analysis of Superficial Editing ACL 2025

NeuroAda: Activating Each Neuron’s Potential for Parameter-Efficient Fine-Tuning EMNLP 2025

EcoTune: Token-Efficient Multi-Fidelity Hyperparameter Optimization for Large Language Model Inference EMNLP 2025