AAAI 2026

BitDP: Ultra-low-bit Communication for Data Parallelism in LLM Training

Abstract

Training large language models (LLMs) with billions of parameters on trillion-token datasets requires distributed data parallelism at increasingly large scales, where gradient synchronization becomes a communication bottleneck, especially in bandwidth-constrained environments. Gradient quantization is a promising remedy, but it faces two key challenges: maintaining training stability and accuracy for transformer architectures, and adapting to modern distributed communication systems. In this paper, we propose BitDP, an ultra-low-bit gradient quantization system that reduces communication costs by up to 32× while keeping performance degradation below 1%. Our approach achieves numerical stability for large transformer models and integrates seamlessly with existing infrastructure. We evaluate BitDP across various LLM sizes, architectures, and optimizers. The results show significant training-efficiency improvements with convergence quality maintained, establishing BitDP as a scalable and reliable solution for real-world LLM training at industrial scale.
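To make the "up to 32×" figure concrete, the following is a minimal sketch of generic 1-bit sign quantization with a single scale per tensor. It is not BitDP's quantizer, which the abstract does not describe. It assumes an FP32 baseline (4 bytes per gradient element), and the function names `quantize_1bit`, `dequantize_1bit`, and `compression_ratio` are hypothetical. Under those assumptions, packing one sign bit per element yields a 32× smaller payload, ignoring the small overhead of the scale.

```python
from typing import List, Tuple

def quantize_1bit(grad: List[float]) -> Tuple[bytes, float, int]:
    """Generic sign quantization (illustrative, not BitDP's method).

    Keeps one bit per element (1 if the value is >= 0) plus a single
    scale equal to the mean absolute value of the gradient.
    """
    n = len(grad)
    scale = sum(abs(g) for g in grad) / n if n else 0.0
    packed = bytearray((n + 7) // 8)          # 8 sign bits per byte
    for i, g in enumerate(grad):
        if g >= 0:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed), scale, n

def dequantize_1bit(packed: bytes, scale: float, n: int) -> List[float]:
    """Rebuild an approximation: +scale for set bits, -scale otherwise."""
    return [scale if (packed[i // 8] >> (i % 8)) & 1 else -scale
            for i in range(n)]

def compression_ratio(n: int, packed: bytes, float_bytes: int = 4) -> float:
    """Payload ratio versus an assumed FP32 baseline (scale overhead ignored)."""
    return (n * float_bytes) / len(packed)

# Toy gradient with 32 elements.
grad = [0.5, -1.5, 2.0, -1.0] * 8
packed, scale, n = quantize_1bit(grad)
restored = dequantize_1bit(packed, scale, n)
ratio = compression_ratio(n, packed)
```

The reconstruction keeps only the sign pattern and the average magnitude, which is why naive schemes of this kind can hurt stability and accuracy, the first of the two challenges the abstract names.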
