ICML 2024

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads