GraphKV: Breaking the Static Selection Paradigm with Graph-Based KV Cache Eviction

Xuelin Li; Xiangqi Jin; Linfeng Zhang

EMNLP 2025

Abstract

Efficient Key-Value (KV) cache management is essential for processing long text sequences in large language models (LLMs), where memory constraints often limit performance. Conventional KV eviction strategies, such as top-k selection based on attention scores, depend on static heuristics that fail to capture the evolving implicit dependencies among tokens during inference. To overcome this, we propose GraphKV, a graph-based framework that redefines token selection for KV cache compression. In GraphKV, tokens are modeled as nodes with importance scores, and edges represent the similarity relationships between them. Through a decay-signal-propagation mechanism, token importance is dynamically updated by propagating information across the graph, enabling adaptive retention of the most contextually significant tokens. GraphKV can be integrated into existing KV cache eviction methods such as SnapKV and PyramidKV in a plug-and-play manner. Code is available in the supplementary materials and will be released on GitHub.
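The abstract does not spell out the update rule, so the following is only an illustrative sketch of one possible reading of decay-signal propagation, not the authors' implementation. It assumes cosine similarity between key vectors as edge weights, non-negative importance scores (for example, pooled attention scores), and a greedy loop in which each retained token damps the scores of its similar neighbours. The function name `select_tokens` and the `decay` parameter are hypothetical.

```python
import numpy as np

def select_tokens(keys, scores, budget, decay=0.5):
    """Greedy token selection with similarity-weighted score decay.

    Hypothetical sketch: `keys` is an (n, d) array of key vectors,
    `scores` holds n non-negative importance scores, `budget` is the
    number of tokens to retain, and `decay` in [0, 1] controls how
    strongly a retained token suppresses its neighbours.
    """
    keys = np.asarray(keys, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()

    # Edge weights: cosine similarity between keys, clipped to [0, 1].
    norms = np.linalg.norm(keys, axis=1, keepdims=True)
    unit = keys / np.maximum(norms, 1e-12)
    sim = np.clip(unit @ unit.T, 0.0, 1.0)

    alive = np.ones(len(scores), dtype=bool)  # tokens not yet retained
    kept = []
    for _ in range(min(budget, len(scores))):
        # Retain the highest-scoring token among those still available.
        i = int(np.argmax(np.where(alive, scores, -np.inf)))
        kept.append(i)
        alive[i] = False
        # Propagate a decay signal: tokens similar to i lose importance.
        scores = scores * (1.0 - decay * sim[i])
    return sorted(kept)


keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
scores = [1.0, 0.9, 0.5]
print(select_tokens(keys, scores, budget=2, decay=0.0))  # plain top-k
print(select_tokens(keys, scores, budget=2, decay=0.9))  # with decay
```

In this toy input, tokens 0 and 1 have identical keys. With `decay=0.0` the loop reduces to static top-k and retains both of them; with `decay=0.9` the second slot goes to the dissimilar token 2 instead, which is the kind of adaptive behaviour the abstract contrasts with static selection.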

Topics

Machine Learning > Application Areas > Efficient Computing; Deep Learning > Techniques > Model Architecture; Natural Language Processing > Resources & Methods > Large Language Models; Machine Learning > Application Areas > Model Compression; Artificial Intelligence > Core AI > Efficient Computing; Deep Learning > Optimization & Theory > Efficient Computing

Keywords

token selection; efficient computing; inference optimization; kv cache; long context; cache eviction; graph-based method; kv cache eviction; cache compression; large language model; graph neural network
