Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

Yubin Shi; Yixuan Chen; MINGZHI DONG; Xiaochen Yang; Dongsheng Li; Yujiang Wang; Robert Dick; Qin Lv; Yingying Zhao; Fan Yang; Tun Lu; Ning Gu; Li Shang

2023 NIPS NeurIPS 2023

Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models

Abstract

Despite their prevalence in deep-learning communities, over-parameterized models convey high demands of computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to attain a more efficient and fruitful training strategy. Empirical evidence reveals that when scaling down into network modules, such as heads in self-attention models, we can observe varying learning patterns implicitly associated with each module's trainability. To describe such modular-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue $\lambda_{\max}$. A large $\lambda_{\max}$ indicates that the module learns features with better convergence, while those miniature ones may impact generalization negatively. Inspired by the discovery, we propose a novel training strategy termed Modular Adaptive Training (MAT) to update those modules with their $\lambda_{\max}$ exceeding a dynamic threshold selectively, concentrating the model on learning common features and ignoring those inconsistent ones. Unlike most existing training schemes with a complete BP cycle across all network modules, MAT can significantly save computations by its partially-updating strategy and can further improve performance. Experiments show that MAT nearly halves the computational cost of model training and outperforms the accuracy of baselines.

🌉 Interdisciplinary Bridge — Deep Learning and Machine Learning

🧭 Keyword Pioneer — modular adaptive training

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yubin Shi , Yixuan Chen , MINGZHI DONG , Xiaochen Yang , Dongsheng Li , Yujiang Wang , Robert Dick , Qin Lv , Yingying Zhao , Fan Yang , Tun Lu , Ning Gu , Li Shang

Topics

Machine Learning > Optimization & Theory > Neural Network Optimization Machine Learning > Optimization & Theory > Optimization Machine Learning > Application Areas > Efficient Computing Deep Learning > Techniques > Model Architecture Deep Learning > Optimization & Theory > Efficient Computing

Keywords

neural tangent kernel model architecture neural network optimization efficient computing computational efficiency over-parameterized model selective update modular training modular adaptive training partial backpropagation

Download PDF

Related papers

Risk-Averse Model Uncertainty for Distributionally Robust Safe Reinforcement Learning 2023

Generative Modeling through the Semi-dual Formulation of Unbalanced Optimal Transport 2023

Self-Supervised Motion Magnification by Backpropagating Through Optical Flow 2023

Diffused Task-Agnostic Milestone Planner 2023

Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond 2023