ICML 2024

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference