SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

Xinhao Huang; You-Liang Huang; Zeyi Wen

2025 AAAI AAAI 2025

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

Abstract

Abstract Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but the billion-scale parameters pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To facilitate efficient and affordable model slimming, we propose a novel training-free compression method for LLMs, named “SoLA”, which leverages Soft activation sparsity and Low-rAnk decomposition. SoLA can identify and retain a minority of components significantly contributing to inference, while compressing the majority through low-rank decomposition, based on our analysis of the activation pattern in the feed-forward network (FFN) of modern LLMs. To alleviate the decomposition loss, SoLA is equipped with an adaptive component-wise low-rank allocation strategy to assign appropriate truncation positions for different weight matrices. We conduct extensive experiments on LLaMA-2-7B/13B/70B and Mistral-7B models across a variety of benchmarks. SoLA exhibits remarkable improvement in both language modeling and downstream task accuracy without post-training. For example, with a 30% compression rate on the LLaMA-2-70B model, SoLA surpasses the state-of-the-art method by reducing perplexity from 6.95 to 4.44 and enhancing downstream task accuracy by 10%.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning

🧭 Keyword Pioneer — soft activation sparsity

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Xinhao Huang , You-Liang Huang , Zeyi Wen

Topics

Artificial Intelligence > Core AI > Model Compression Machine Learning > Optimization & Theory > Optimization Machine Learning > Application Areas > Efficient Computing Deep Learning > Architectures > Transformers Artificial Intelligence > Core AI > Large Language Models Deep Learning > Models > Large Language Models Deep Learning > Optimization & Theory > Model Compression

Keywords

model compression low-rank decomposition language model sparse activation feed-forward network parameter efficiency large language model soft activation sparsity

Download PDF

Related papers

BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving 2025

APIRL: Deep Reinforcement Learning for REST API Fuzzing 2025

Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation 2025

3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection 2025

Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics 2025