Speeding Up Transformer Training By Using Dataset Subsampling - An Exploratory Analysis

Lovre Torbarina; Velimir Mihelčić; Bruno Šarlija; Lukasz Roguski; Zeljko Kraljevic

EMNLP 2021

Abstract

Transformer-based models have greatly advanced the field of natural language processing, and while they achieve state-of-the-art results on a wide range of tasks, they are cumbersome in parameter size. Consequently, even when a pre-trained transformer model is fine-tuned on a given task, a large dataset may still make it infeasible to fine-tune the model within a reasonable time. For this reason, we empirically test eight subsampling methods for reducing the dataset size on a text classification task and report the trade-off between metric score and training time. Seven of the eight are simple methods, while the last one is CRAIG, a method for coreset construction for data-efficient model training. We obtain the best result with the CRAIG method, which gives an average decrease of 0.03 points in F-score on the test set while reducing the training time by 63.93% on average, relative to the score and time obtained with the full dataset. Lastly, we show the trade-off between speed and performance for all sampling methods on three different datasets.
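
The kind of trade-off described above can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not name the seven simple methods, so class-stratified random sampling is used here purely as an assumed example of a simple subsampling baseline, and the helper names `stratified_subsample` and `relative_change` are hypothetical. CRAIG itself is not implemented here.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices of a class-stratified random subset.

    Assumed illustrative baseline, not a method confirmed by the paper.
    Each class keeps roughly `fraction` of its examples (at least one).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    keep = []
    for idxs in by_class.values():
        k = max(1, round(len(idxs) * fraction))
        keep.extend(rng.sample(idxs, k))
    return sorted(keep)


def relative_change(full, subset):
    """Percentage change of `subset` relative to `full` (negative = reduction)."""
    return 100.0 * (subset - full) / full


# Toy labels: 80 examples of class 0 and 20 of class 1.
labels = [0] * 80 + [1] * 20
subset_idx = stratified_subsample(labels, fraction=0.25)

# Hypothetical timings chosen only to show the arithmetic: a run that takes
# 36.07 time units instead of 100 is a 63.93% reduction in training time.
time_change = relative_change(full=100.0, subset=36.07)
```

Fine-tuning would then run on the examples at `subset_idx` only, and both the score and the wall-clock time would be compared against the full-dataset run using `relative_change`.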

Topics

Deep Learning > Architectures > Transformers
Natural Language Processing > Generation > Language Modeling
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Core Methods > Feature Selection
Machine Learning > Optimization & Theory > Stochastic Methods
Machine Learning > Learning Types > Deep Learning
Deep Learning > Optimization & Theory > Optimization
Deep Learning > Learning Types > Transfer Learning
Deep Learning > Optimization & Theory > Efficient Computing

Keywords

text classification, efficient computing, coreset construction, sampling method, transformer training, training time reduction, transformer model, dataset subsampling
