ICML 2025
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
Authors
Xing Li, Zeyu Xing, Yiming Li, Linping Qu, Hui-Ling Zhen, Yiwu Yao, Wulong Liu, Sinno Jialin Pan, Mingxuan Yuan