ICML 2024

InferCept: Efficient Intercept Support for Augmented Large Language Model Inference