Model Compression

1503 directly classified papers

Papers

GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction EMNLP 2025

PPC-GPT: Federated Task-Specific Compression of Large Language Models via Pruning and Chain-of-Thought Distillation EMNLP 2025

Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval CVPR 2025

Interpreting the Effects of Quantization on LLMs IJCNLP 2025

Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models EMNLP 2025

DART: Distilling Autoregressive Reasoning to Silent Thought EMNLP 2025

Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales CVPR 2025

LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation EMNLP 2025

Language Models Can be Efficiently Steered via Minimal Embedding Layer Transformations EMNLP 2025

TASO: Task-Aligned Sparse Optimization for Parameter-Efficient Model Adaptation EMNLP 2025

The Role of Outgoing Connection Heterogeneity in Feedforward Layers of Large Language Models EMNLP 2025

Safety Alignment via Constrained Knowledge Unlearning ACL 2025

GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression EMNLP 2025

GQSA: Group Quantization and Sparsity for Accelerating Large Language Model Inference IJCNLP 2025

HELENE: Hessian Layer-wise Clipping and Gradient Annealing for Accelerating Fine-tuning LLM with Zeroth-order Optimization EMNLP 2025

Scheduling Weight Transitions for Quantization-Aware Training ICCV 2025

DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization ACL 2025

CLaSp: In-Context Layer Skip for Self-Speculative Decoding ACL 2025

CTCC: A Robust and Stealthy Fingerprinting Framework for Large Language Models via Cross-Turn Contextual Correlation Backdoor EMNLP 2025

Verifiable Format Control for Large Language Model Generations NAACL 2025

LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering ACL 2025

ImPart: Importance-Aware Delta-Sparsification for Improved Model Compression and Merging in LLMs ACL 2025

Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models ACL 2025

DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts ACL 2025

Studying the Role of Input-Neighbor Overlap in Retrieval-Augmented Language Models Training Efficiency EMNLP 2025