How to Train BERT with an Academic Budget

Peter Izsak; Moshe Berchansky; Omer Levy

EMNLP 2021

Abstract

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-base on GLUE tasks at a fraction of the original pretraining cost.

Topics

Machine Learning > Application Areas > Efficient Computing; Natural Language Processing > Applications > Machine Translation; Natural Language Processing > Resources & Methods > Large Language Models; Machine Learning > Learning Types > Transfer Learning; Deep Learning > Models > Large Language Models; Deep Learning > Optimization & Theory > Model Compression; Deep Learning > Learning Types > Transfer Learning; Deep Learning > Optimization & Theory > Efficient Computing

Keywords

transfer learning; knowledge distillation; masked language model; computational efficiency; hyperparameter tuning; model efficiency; language model pretraining; BERT pretraining
