Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Lin Song; Yukang Chen; Shuai Yang; Xiaohan Ding; Yixiao Ge; Ying-Cong Chen; Ying Shan

CVPR 2024


Abstract

This paper addresses the high computational complexity of Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers for sparse attention approximation. It uses an order-mimic training methodology, which is crucial for efficiently approximating the self-attention mechanism in LLMs. We show empirically that sparse attention not only reduces computational demands but also improves model performance in both NLP and multi-modal tasks. This surprising result suggests that redundant attention in LLMs may not be beneficial. We validate LoRA-Sparse through extensive empirical studies on both NLP and multi-modal tasks, demonstrating its effectiveness and general applicability. Built on LLaMA and LLaVA models, our method removes more than half of the self-attention computation while performing even better than the full-attention baselines.
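The abstract does not spell out how the low-rank projections produce a sparse attention pattern, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that queries and keys are projected to a small rank `r`, that the cheap low-rank scores are used to pick the `top_k` most relevant keys per query, and that ordinary softmax attention is then computed over the selected keys only. The function name `sparse_attention_lowrank`, the projection matrices `wq_low` and `wk_low`, the `top_k` parameter, and the causal masking are all assumptions made for illustration. In this reading, order-mimic training would aim to make the ranking of the low-rank scores match the ranking of the full attention scores; the sketch uses random projections and does no training.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matmul(a, b):
    # Plain-Python matrix product: (n x d) times (d x r) gives (n x r).
    return [[dot(row, col) for col in zip(*b)] for row in a]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention_lowrank(q, k, v, wq_low, wk_low, top_k):
    """Single-head causal attention restricted to top_k keys per query.

    q, k, v: lists of n vectors of dimension d.
    wq_low, wk_low: hypothetical d x r projection matrices with r << d.
    Returns (outputs, selected), where selected[i] lists the kept key indices.
    """
    d = len(q[0])
    q_low = matmul(q, wq_low)  # cheap n x r queries
    k_low = matmul(k, wk_low)  # cheap n x r keys
    outputs, selected = [], []
    for i, qi in enumerate(q):
        # Assumed causal mask: query i may only attend to keys 0..i.
        approx = [dot(q_low[i], k_low[j]) for j in range(i + 1)]
        # Keep the keys ranked highest by the low-rank approximation.
        keep = sorted(range(i + 1), key=lambda j: approx[j], reverse=True)[:top_k]
        keep.sort()
        # Full-dimension attention over the kept keys only.
        weights = softmax([dot(qi, k[j]) / math.sqrt(d) for j in keep])
        outputs.append(
            [sum(w * v[j][c] for w, j in zip(weights, keep)) for c in range(len(v[0]))]
        )
        selected.append(keep)
    return outputs, selected


def random_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy usage: 8 tokens, model dimension 16, rank 4, keep at most 3 keys per query.
random.seed(0)
n_tokens, dim, rank = 8, 16, 4
q_demo = random_matrix(n_tokens, dim)
k_demo = random_matrix(n_tokens, dim)
v_demo = random_matrix(n_tokens, dim)
out_demo, kept_demo = sparse_attention_lowrank(
    q_demo, k_demo, v_demo,
    random_matrix(dim, rank), random_matrix(dim, rank), top_k=3,
)
print(kept_demo[-1])  # indices of the keys the last query attends to
```

Under these assumptions the selection step costs on the order of `r` multiplications per query-key pair instead of `d`, and the full-dimension score and value aggregation run over only `top_k` keys per query. Setting `top_k` to the sequence length recovers ordinary causal attention.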



Topics

Artificial Intelligence > Core AI > Model Compression
Artificial Intelligence > Core AI > Multimodal Learning
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Optimization & Theory > Optimization
Deep Learning > Optimization & Theory > Efficient Computing
Deep Learning > Learning Types > Attention

Keywords

model compression, multi-modal learning, efficient computing, low-rank approximation, sparse attention, large language model

