EMNLP 2024
Stable Language Model Pre-training by Reducing Embedding Variability
Abstract
Stable pre-training is essential for achieving better-performing language models. However, tracking pre-training stability is impractical due to high computational costs. We study Token Embedding Variability as a simple proxy to estimate pre-training stability. We theoretically and empirically demonstrate that Multi-head Low-Rank Attention acts as a fundamental approach to reducing instability. This is supported by empirical findings on variants of GPT-2, demonstrating improved stability and lower perplexities, even at deeper layer counts.
Topics
Artificial Intelligence > Core AI > Foundation Models
Machine Learning > Optimization & Theory > Neural Network Optimization
Machine Learning > Optimization & Theory > Optimization
Deep Learning > Architectures > Transformers
Natural Language Processing > Generation > Language Modeling
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models
Deep Learning > Optimization & Theory > Neural Network Optimization
Deep Learning > Optimization & Theory > Optimization
Deep Learning > Models > Language Models