ACL 2022
Scaling Language Model Size in Cross-Device Federated Learning
Abstract
Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a 21M parameter Transformer that matches the perplexity of a similarly sized LSTM with ∼10× smaller client-to-server communication cost, and that achieves 11% lower perplexity than the smaller LSTMs commonly studied in the literature.
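The abstract lists quantization as one way to cut client-to-server communication. The Python below is a minimal sketch that assumes a generic stochastic uniform quantizer. It is not necessarily the scheme used in the paper, and it does not reproduce the paper's reported numbers. It shows how a float32 model update can be sent as `num_bits` integer codes per value plus a small header. The function names `quantize`, `dequantize`, and `payload_bits` are hypothetical, as is the assumption that the header holds two float32 values.

```python
import random
from typing import List, Tuple


def quantize(update: List[float], num_bits: int,
             rng: random.Random) -> Tuple[List[int], float, float]:
    """Stochastically round each value onto 2**num_bits evenly spaced levels in [lo, hi]."""
    lo, hi = min(update), max(update)
    levels = (1 << num_bits) - 1            # highest code; number of intervals
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for x in update:
        pos = (x - lo) / scale              # real-valued position in [0, levels]
        floor = int(pos)                    # pos >= 0, so truncation equals floor
        # Round up with probability equal to the fractional part,
        # which makes the dequantized value unbiased in expectation.
        code = floor + (1 if rng.random() < pos - floor else 0)
        codes.append(min(code, levels))     # guard against float overshoot
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Map integer codes back to approximate float values on the server."""
    return [lo + c * scale for c in codes]


def payload_bits(n: int, num_bits: int) -> int:
    """Bits sent per update: n codes plus an assumed header of two float32 values (lo, scale)."""
    return n * num_bits + 2 * 32


rng = random.Random(0)
update = [rng.gauss(0.0, 0.01) for _ in range(10_000)]  # stand-in for a client delta
codes, lo, scale = quantize(update, num_bits=8, rng=rng)
restored = dequantize(codes, lo, scale)
ratio = (32 * len(update)) / payload_bits(len(update), 8)
print(f"compression vs float32: {ratio:.1f}x")
```

With 8-bit codes, this quantizer alone gives roughly a 4× reduction over float32, and each restored value is within one quantization step (`scale`) of the original. Larger savings like those in the abstract come from combining several techniques, which this sketch does not attempt.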
Topics
Artificial Intelligence > Core AI > Model Compression
Artificial Intelligence > Learning Paradigms > Federated Learning
Natural Language Processing > Generation > Language Modeling
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Learning Types > Federated Learning
Deep Learning > Learning Types > Language Modeling