Flor: An Open High Performance RDMA Framework Over Heterogeneous RNICs
Abstract
Datacenter applications increasingly adopt RDMA for its ultra-low latency and low CPU overhead. However, RDMA-capable NICs (RNICs) from different vendors, and different generations from the same vendor, do not cooperate well, which causes bandwidth imbalance in production networks. Our key observation is that although the data-path functions of these heterogeneous RNICs follow the same RoCEv2 specification, their control-path functions are vendor- and version-specific. To this end, we propose Flor, an open framework that provides a flexible control plane in software and a unified hardware plane built on heterogeneous RNICs. The hardware plane requires no changes to current specifications. The software plane can run on the NPUs of RNICs, on DPUs, or on host CPUs, and on top of it we build a strengthened reliable transport for large-scale lossy Ethernet. We implemented and evaluated Flor in both testbed and production clusters over Intel E810, Mellanox CX-4 and CX-5, and Broadcom RNICs. Experiments show that Flor achieves performance comparable to vanilla RDMA in many scenarios, including 1/4096 packet loss, 6000:1 incast, and large-scale cross-pod communication. When CX-4 and CX-5 RNICs are deployed together without depending on PFC, Flor narrows the performance gap between them from 24.3% to 1.3%.