MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic

Yuyan Zhou; Liang Song; Bingning Wang; Weipeng Chen

2024 EMNLP EMNLP 2024

MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic

Abstract

AbstractThe advent of large language models (LLMs) like GPT-4 has catalyzed the exploration of multi-task learning (MTL), in which a single model demonstrates proficiency across diverse tasks. Task arithmetic has emerged as a cost-effective approach for MTL. It enables performance enhancement across multiple tasks by adding their corresponding task vectors to a pre-trained model. However, the current lack of a method that can simultaneously achieve optimal performance, computational efficiency, and data privacy limits their application to LLMs. In this paper, we propose Model Exclusive Task Arithmetic for merging GPT-scale models (MetaGPT) which formalizes the objective of model merging into a multi-task learning framework, aiming to minimize the average loss difference between the merged model and each individual task model. Since data privacy limits the use of multi-task training data, we leverage LLMs’ local linearity and task vectors’ orthogonality to separate the data term and scaling coefficients term and derive a model-exclusive task arithmetic method. Our proposed MetaGPT is data-agnostic and bypasses the heavy search process, making it cost-effective and easy to implement for LLMs. Extensive experiments demonstrate that MetaGPT leads to improvement of task arithmetic and achieves state-of-the-art performance on multiple tasks.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yuyan Zhou , Liang Song , Bingning Wang , Weipeng Chen

Topics

Artificial Intelligence > Core AI > Model Compression Machine Learning > Application Areas > Model Merging Machine Learning > Learning Types > Multi-Task Learning Artificial Intelligence > Core AI > Large Language Models Artificial Intelligence > Learning Paradigms > Multi-Task Learning

Keywords

model compression multi-task learning model merging task arithmetic large language model

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024