FAEDKV: Infinite-Window Fourier Transform for Unbiased KV Cache Compression

Runchao Li; Yao Fu; Mu Sheng; Xianxuan Long; Haotian Yu; Pan Li

2025 EMNLP EMNLP 2025

FAEDKV: Infinite-Window Fourier Transform for Unbiased KV Cache Compression

Abstract

AbstractThe efficacy of Large Language Models (LLMs) in long-context tasks is often hampered by the substantial memory footprint and computational demands of the Key-Value (KV) cache. Current compression strategies, including token eviction and learned projections, frequently lead to biased representations—either by overemphasizing recent/high-attention tokens or by repeatedly degrading information from earlier context—and may require costly model retraining. We present FAEDKV (Frequency-Adaptive Infinite-Window for KV cache), a novel, training-free KV cache compression framework that ensures unbiased information retention. FAEDKV operates by transforming the KV cache into the frequency domain using a proposed Infinite-Window Fourier Transform (IWDFT). This approach allows for the equalized contribution of all tokens to the compressed representation, effectively preserving both early and recent contextual information. A preliminary frequency ablation study identifies critical spectral components for layer-wise, targeted compression. Experiments on LongBench benchmark demonstrate FAEDKV’s superiority over existing methods by up to 22%. In addition, our method shows superior, position-agnostic retrieval accuracy on the Needle-In-A-Haystack task compared to compression based approaches.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Runchao Li , Yao Fu , Mu Sheng , Xianxuan Long , Haotian Yu , Pan Li

Topics

Artificial Intelligence > Core AI > Model Compression Machine Learning > Application Areas > Efficient Computing Artificial Intelligence > Core AI > Efficient Computing Deep Learning > Models > Transformers Deep Learning > Optimization & Theory > Model Compression

Keywords

fourier transform context window kv cache compression long-context modeling large language model attention efficiency

Download PDF

Related papers

Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework 2025

VoiceCraft-X: Unifying Multilingual, Voice-Cloning Speech Synthesis and Speech Editing 2025

Model-based Large Language Model Customization as Service 2025

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration 2025

SlideCoder: Layout-aware RAG-enhanced Hierarchical Slide Generation from Design 2025