Fast Forwarding Low-Rank Training

Adir Rahamim; Naomi Saphra; Sara Kangaslahti; Yonatan Belinkov

EMNLP 2024

Abstract

Parameter-efficient finetuning methods like low-rank adaptation (LoRA) aim to reduce the computational costs of finetuning pretrained Language Models (LMs). Enabled by these low-rank settings, we propose an even more efficient optimization strategy: Fast Forward, a simple and effective approach to accelerate large segments of SGD training. In a Fast Forward stage, we repeat the most recent optimizer step until the loss stops improving on a tiny validation set. By alternating between regular optimization steps and Fast Forward stages, Fast Forward provides up to an 87% reduction in FLOPs over standard SGD with Adam. We validate Fast Forward by finetuning various models on different tasks and demonstrate that it speeds up training without compromising model performance. Additionally, we analyze when and how to apply Fast Forward.
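The alternation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain gradient descent on a toy quadratic instead of Adam on LoRA parameters, and the names `fast_forward`, `train`, `interval`, and `max_repeats` are hypothetical choices made for illustration only.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def fast_forward(params: Vector, prev_params: Vector,
                 val_loss: Callable[[Vector], float],
                 max_repeats: int = 1000) -> Tuple[Vector, int]:
    """Re-apply the most recent update while the validation loss improves.

    The "most recent optimizer step" is taken to be params - prev_params.
    `max_repeats` is a hypothetical safety cap, not something from the paper.
    """
    delta = [p - q for p, q in zip(params, prev_params)]
    best = val_loss(params)
    repeats = 0
    while repeats < max_repeats:
        candidate = [p + d for p, d in zip(params, delta)]
        loss = val_loss(candidate)
        if loss >= best:  # loss stopped improving: end the stage
            break
        params, best = candidate, loss
        repeats += 1
    return params, repeats


def train(grad: Callable[[Vector], Vector],
          val_loss: Callable[[Vector], float],
          params: Vector, lr: float = 0.01,
          sgd_steps: int = 20, interval: int = 5) -> Tuple[Vector, int]:
    """Alternate regular gradient steps with Fast Forward stages.

    Assumption: a Fast Forward stage runs after every `interval` regular
    steps; the schedule in the paper may differ.
    """
    ff_steps = 0
    for step in range(1, sgd_steps + 1):
        prev = params
        # Regular step (plain gradient descent stands in for Adam here).
        params = [p - lr * g for p, g in zip(params, grad(params))]
        if step % interval == 0:
            params, n = fast_forward(params, prev, val_loss)
            ff_steps += n
    return params, ff_steps


if __name__ == "__main__":
    target = [1.0, -2.0]

    def loss(w: Vector) -> float:
        return sum((a - t) ** 2 for a, t in zip(w, target))

    def grad(w: Vector) -> Vector:
        return [2.0 * (a - t) for a, t in zip(w, target)]

    w, skipped = train(grad, loss, [0.0, 0.0])
    print(f"final loss {loss(w):.6f}, fast-forwarded steps: {skipped}")
```

Each repeated step in this sketch costs only a forward pass on the validation set, with no gradient computation, which is consistent with the FLOPs savings the abstract reports.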

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Efficient Computing
Deep Learning > Techniques > Pretraining
Machine Learning > Application Areas > Model Compression
Machine Learning > Learning Paradigms > Transfer Learning
Machine Learning > Learning Types > Transfer Learning
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Optimization
Deep Learning > Optimization & Theory > Model Compression

Keywords

stochastic gradient descent, neural network training, language model, low-rank adaptation, training acceleration, FLOPs reduction, parameter efficient finetuning, optimizer step, loss improvement
