SparseGrad: A Selective Method for Efficient Fine-tuning of MLP Layers

Viktoriia A. Chekalina; Anna Rudenko; Gleb Mezentsev; Aleksandr Mikhalev; Alexander Panchenko; Ivan Oseledets

EMNLP 2024

Abstract

The performance of Transformer models has been enhanced by increasing the number of parameters and the length of the processed text. Consequently, fine-tuning the entire model becomes a memory-intensive process. High-performance methods for parameter-efficient fine-tuning (PEFT) typically work with Attention blocks and often overlook MLP blocks, which contain about half of the model parameters. We propose a new selective PEFT method, SparseGrad, that performs well on MLP blocks. We transfer layer gradients to a space where only about 1% of the layer's elements remain significant. By converting gradients into a sparse structure, we reduce the number of updated parameters. We apply SparseGrad to fine-tune BERT and RoBERTa for the NLU task and LLaMa-2 for the Question-Answering task. In these experiments, with identical memory requirements, our method outperforms LoRA and MeProp, which are robust, popular state-of-the-art PEFT approaches.
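
The idea of moving a gradient into a space where few entries matter, and then updating only those entries, can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the abstract does not say how the new space is constructed, so the orthogonal matrix `u` below is a hand-picked assumption, and the helper names `change_basis` and `keep_top_fraction` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def change_basis(grad, u, v):
    """Express the gradient in a new basis as u^T @ grad @ v.

    u and v are assumed to be orthogonal; how SparseGrad obtains its
    transform is not described in the abstract.
    """
    return matmul(matmul(transpose(u), grad), v)


def keep_top_fraction(grad, fraction):
    """Zero every entry except the largest-magnitude ones.

    Keeps ceil(fraction * number_of_entries) entries, and at least one.
    """
    flat = [(abs(val), i, j) for i, row in enumerate(grad) for j, val in enumerate(row)]
    k = max(1, math.ceil(fraction * len(flat)))
    keep = {(i, j) for _, i, j in sorted(flat, reverse=True)[:k]}
    return [
        [val if (i, j) in keep else 0.0 for j, val in enumerate(row)]
        for i, row in enumerate(grad)
    ]


# Toy data: a dense 2x2 gradient whose entries are all equal.
grad = [[1.0, 1.0], [1.0, 1.0]]

# Assumed orthogonal basis (a normalized 2x2 Hadamard matrix).
s = 1.0 / math.sqrt(2.0)
u = [[s, s], [s, -s]]

rotated = change_basis(grad, u, u)        # one dominant entry, the rest are zero
sparse = keep_top_fraction(rotated, 0.25) # keep 1 of the 4 entries
```

In this toy case every entry of `grad` is nonzero, but after the change of basis a single entry carries all of the magnitude, so keeping one quarter of the entries loses nothing. The 1% figure in the abstract refers to real MLP layers, not to this 2x2 illustration.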

Topics

Artificial Intelligence > Core AI > Model Compression
Machine Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Model Compression
Deep Learning > Techniques > Fine-Tuning

Keywords

parameter-efficient fine-tuning, sparse gradient, gradient sparsity, MLP layer
