DEM: Distribution Edited Model for Training with Mixed Data Distributions

Dhananjay Ram; Aditya Rawal; Momchil Hardalov; Nikolaos Pappas; Sheng Zha

EMNLP 2024

Abstract

Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging. Data mixing methods partially address this problem, but they yield sub-optimal performance across data sources and require multiple expensive training runs. In this paper, we propose a simple and efficient alternative for better optimization of the data sources: combining models individually trained on each data source with the base model using basic element-wise vector operations. The resulting model, the Distribution Edited Model (DEM), is cheaper than standard data mixing and outperforms strong baselines on a variety of benchmarks, yielding up to 6.2% improvement on MMLU, 11.5% on BBH, 16.1% on DROP, 6% on MathQA, and 9.3% on HELM with models of size 3B to 13B. Notably, DEM does not require full re-training when a single data source is modified, which makes it flexible and scalable for training with diverse data sources. The code is available at https://github.com/amazon-science/dem-distribution-edited-model.
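
The abstract only states that per-source models are combined with the base model through "basic element-wise vector operations"; it does not spell out the formula. The sketch below is one plausible reading, not the authors' implementation: it assumes each per-source model contributes its element-wise difference from the base model, and that a weighted sum of those differences is added back to the base. The names `Params`, `distribution_vector`, and `combine`, the toy parameter dictionaries, and the mixing weights are all hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Toy stand-in for model weights: parameter name -> flattened values.
# (Assumption: real models would use tensors; plain lists keep this runnable.)
Params = Dict[str, List[float]]


def distribution_vector(finetuned: Params, base: Params) -> Params:
    """Element-wise difference between a per-source model and the base model."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def combine(base: Params, finetuned_models: List[Params], weights: List[float]) -> Params:
    """Add a weighted sum of per-source difference vectors onto the base model.

    Assumed combination rule (not taken from the abstract):
        merged = base + sum_i weights[i] * (finetuned_models[i] - base)
    """
    if len(finetuned_models) != len(weights):
        raise ValueError("need exactly one weight per per-source model")
    merged = {name: list(values) for name, values in base.items()}
    for model, weight in zip(finetuned_models, weights):
        delta = distribution_vector(model, base)
        for name in merged:
            merged[name] = [m + weight * d for m, d in zip(merged[name], delta[name])]
    return merged


# Hypothetical base model and two models trained on different data sources.
base = {"w": [1.0, 2.0], "b": [0.0]}
source_a = {"w": [2.0, 2.0], "b": [1.0]}
source_b = {"w": [1.0, 4.0], "b": [-1.0]}

merged = combine(base, [source_a, source_b], weights=[0.5, 0.5])
print(merged)
```

Under this reading, changing one data source means retraining only that source's model and recomputing the sum, which is consistent with the abstract's claim that no full re-training is needed.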

Topics

Artificial Intelligence > Core AI > Foundation Models; Machine Learning > Application Areas > Model Merging; Machine Learning > Learning Paradigms > Multi-Task Learning; Deep Learning > Optimization & Theory > Model Compression; Deep Learning > Techniques > Knowledge Distillation; Deep Learning > Learning Types > Transfer Learning; Artificial Intelligence > Core AI > Multi-Task Learning

Keywords

multi-task learning; instruction following; model merging; distribution shift; foundation model; data distribution; distribution alignment; parameter composition; element-wise vector operation; element-wise operation
