2009 NIPS NeurIPS 2009

Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models

Abstract

Training conditional maximum entropy models on massive data requires significant time and computational resources. In this paper, we investigate three common distributed training strategies: distributed gradient, majority voting ensembles, and parameter mixtures. We analyze the worst-case runtime and resource costs of each and present a theoretical foundation for the convergence of parameters under parameter mixtures, the most efficient strategy. We present large-scale experiments comparing the different strategies and demonstrate that parameter mixtures over independent models use fewer resources and achieve comparable loss as compared to standard approaches.

📈 Trend Setter — Distributed Learning
🧭 Keyword Pioneer — ensemble training
🐣 Hot Topic Early Bird — gradient descent
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning
🌉 Interdisciplinary Bridge — Machine Learning and Mathematics & Optimization