2024 EMNLP EMNLP 2024

Breaking ReLU Barrier: Generalized MoEfication for Dense Pretrained Models

Abstract

AbstractAs the scale of language models (LMs) continues to grow, there is a heightened interest in reducing the inference cost associated with these models. Mixture-of-Experts (MoEs) present an efficient alternative to dense models, while the existing methods to convert pretrained dense models to MoEs is limited to ReLU-based models with natural sparsity. This paper introduces G-MoEfication, applicable to arbitrary dense models, where ReLU-based activation sparsity assumptions no longer hold. For generalizations, we encounter the dilemma of needing to zero-out deactivated experts, while also avoiding excessive zeroing-out to retain dense activation information. We publicly release our code and report results conducted with mBERT, SantaCoder-1.1B, Phi-2-2.7B, and Falcon-7B demonstrating the efficacy of our approach in general scenarios: from multitask to multilingual, from fine-tuning to zero-shot evaluation.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio