ICML 2025
MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
Authors
Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu