Hyperparameter Power Impact in Transformer Language Model Training

Lucas Høyberg Puvis de Chavannes; Mads Guldborg Kjeldgaard Kongsbak; Timmie Rantzau; Leon Derczynski

EMNLP 2021


Abstract

Training large language models can consume a large amount of energy. We hypothesize that the language model’s configuration impacts its energy consumption, and that there is room for power consumption optimisation in modern large language models. To investigate these claims, we introduce a power consumption factor to the objective function, and explore the range of models and hyperparameter configurations that affect power. We identify multiple configuration factors that can reduce power consumption during language model training while retaining model quality.
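The abstract does not state exactly how the power factor enters the objective function. As a minimal sketch, assuming a simple weighted sum of a quality metric (perplexity, lower is better) and measured training energy, a power-aware choice among hyperparameter trials could look like the following. The names `energy_kwh`, `Trial`, `power_aware_objective`, `select_config` and the `power_weight` parameter are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import Sequence


def energy_kwh(power_watts: Sequence[float], interval_s: float) -> float:
    """Approximate energy (kWh) from power readings sampled every `interval_s` seconds.

    Rectangle rule: each reading is assumed constant over its sampling interval.
    """
    joules = sum(p * interval_s for p in power_watts)
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


@dataclass
class Trial:
    """One hypothetical hyperparameter configuration and its measurements."""
    config: dict
    perplexity: float   # model quality term (lower is better)
    energy_kwh: float   # measured training energy for this configuration


def power_aware_objective(trial: Trial, power_weight: float) -> float:
    """Hypothetical scalarised objective: quality plus a weighted power term.

    power_weight = 0 recovers a quality-only hyperparameter search.
    """
    return trial.perplexity + power_weight * trial.energy_kwh


def select_config(trials: Sequence[Trial], power_weight: float) -> Trial:
    """Return the trial that minimises the power-aware objective."""
    return min(trials, key=lambda t: power_aware_objective(t, power_weight))
```

A weighted sum is only one way to fold power into the objective; because perplexity and kWh live on different scales, the weight (or a normalisation of each term) determines how much model quality a search is willing to trade for lower energy.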




Keywords

hyperparameter optimization, language model training, energy efficiency, power consumption, training efficiency, transformer language model, energy consumption, model configuration, large language model

