Model Compression

1674 directly classified papers

Papers

AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning ICCV 2025

FocusLLM: Precise Understanding of Long Context by Dynamic Condensing ACL 2025

Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models ICCV 2025

Towards compact and efficient Slovak summarization models ACL 2025

Emulating Self-attention with Convolution for Efficient Image Super-Resolution ICCV 2025

DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers ICCV 2025

Text Embedding Knows How to Quantize Text-Guided Diffusion Models ICCV 2025

LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching ICCV 2025

A Good Teacher Adapts Their Knowledge for Distillation ICCV 2025

Assigning Distinct Roles to Quantized and Low-Rank Matrices Toward Optimal Weight Decomposition ACL 2025

Improving Continual Pre-training Through Seamless Data Packing ACL 2025

MiniKV: Pushing the Limits of 2-Bit KV Cache via Compression and System Co-Design for Efficient Long Context Inference ACL 2025

Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models ICCV 2025

Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation ICCV 2025

FREE: Fast and Robust Vision Language Models with Early Exits ACL 2025

QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning ICCV 2025

VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping AAAI 2025

Accelerating Diffusion Transformer via Gradient-Optimized Cache ICCV 2025

TCFG: Truncated Classifier-Free Guidance for Efficient and Scalable Text-to-Image Acceleration ICCV 2025

BitNet: 1-bit Pre-training for Large Language Models JMLR 2025

Recall with Reasoning: Chain-of-Thought Distillation for Mamba’s Long-Context Memory and Extrapolation EMNLP 2025

Slender-Mamba: Fully Quantized Mamba in 1.58 Bits From Head to Toe COLING 2025

Improving Reasoning Capabilities in Small Models through Mixture-of-layers Distillation with Stepwise Attention on Key Information EMNLP 2025

Lightweight Safety Guardrails Using Fine-tuned BERT Embeddings COLING 2025

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework EMNLP 2025