NSDI 2024

Crescent: Emulating Heterogeneous Production Network at Scale

Abstract

This paper presents the design, implementation, evaluation, and deployment of Crescent, ByteDance’s network emulation platform for preventing change-induced network incidents. Inspired by prior art such as CrystalNet, Crescent achieves high fidelity by running switch vendor images inside containers. However, we take a different route to scaling up the emulator, one that raises its own challenges. First, we analyze our past network incidents to reveal the difficulty of identifying a safe emulation boundary. Instead of emulating the entire network, we exploit the inherent symmetry and modularity of data center network architectures to strike a balance between coverage and resource cost. Second, we study node-to-host assignment by formulating it as a graph partitioning problem. Evaluation results show that our partitioning algorithm reduces testbed bootup time by up to 20× compared with random partitioning. Third, we develop an incremental approach to modifying the emulated network on the fly. This approach can be 30× faster than creating a new testbed of the same scale. Crescent has been in active use for three and a half years and has led to a significant reduction in change-induced network incidents. We also share Crescent’s success in many other use cases and the critical lessons learned from its deployment.
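To make the graph partitioning formulation concrete, the following is a minimal sketch and not Crescent's algorithm, which the abstract does not describe. It assumes, purely for illustration, that emulated switches are graph nodes, links are edges, each host has a fixed container capacity, and the goal is to minimize the number of links whose endpoints land on different hosts. The function names, the toy two-pod topology, and the round-robin baseline (a deterministic stand-in for random placement) are all hypothetical.

```python
from collections import defaultdict


def cut_size(edges, assignment):
    """Count links whose two endpoints are placed on different hosts."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def greedy_partition(nodes, edges, num_hosts, capacity):
    """Place each node on the non-full host holding most of its neighbors.

    Ties go to the lowest-numbered host. The result depends on node order.
    """
    if len(nodes) > num_hosts * capacity:
        raise ValueError("not enough host capacity for all nodes")
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    assignment = {}
    load = [0] * num_hosts
    for node in nodes:
        best_host, best_count = None, -1
        for host in range(num_hosts):
            if load[host] >= capacity:
                continue  # host is full
            colocated = sum(
                1 for n in neighbors[node] if assignment.get(n) == host
            )
            if colocated > best_count:
                best_host, best_count = host, colocated
        assignment[node] = best_host
        load[best_host] += 1
    return assignment


# Hypothetical toy topology: two pods, each a 2x2 leaf/ToR mesh,
# joined by a single inter-pod link.
nodes = ["a-leaf1", "a-leaf2", "a-tor1", "a-tor2",
         "b-leaf1", "b-leaf2", "b-tor1", "b-tor2"]
edges = [(f"{p}-leaf{i}", f"{p}-tor{j}")
         for p in "ab" for i in (1, 2) for j in (1, 2)]
edges.append(("a-leaf1", "b-leaf1"))

greedy = greedy_partition(nodes, edges, num_hosts=2, capacity=4)
round_robin = {node: i % 2 for i, node in enumerate(nodes)}

print("greedy cut:", cut_size(edges, greedy))
print("round-robin cut:", cut_size(edges, round_robin))
```

On this toy input the greedy placement keeps each pod on a single host, so only the inter-pod link crosses hosts, while the round-robin placement splits both pods. A production partitioner would also have to weigh per-node resource demands and balance across hosts, which this sketch ignores.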
