Papers
From Address Blocks to Authorized Prefixes: Redesigning RPKI ROV with a Hierarchical Hashing Scheme for Fast and Memory-Efficient Validation
Zedong Ni, Yinbo Xu, Hui Zou et al.
GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale
Lingyun Yang, Yongchen Wang, Yinghao Yu et al.
GRANNY: Granular Management of Compute-Intensive Applications in the Cloud
Carlos Segarra, Simon Shillaker, Guo Li et al.
GREEN: Carbon-efficient Resource Scheduling for Machine Learning Clusters
Kaiqiang Xu, Decang Sun, Han Tian et al.
HA/TCP: A Reliable and Scalable Framework for TCP Network Functions
Haoyu Gu, Ali José Mashtizadeh, Bernard Wong
High-level Programming for Application Networks
Xiangfeng Zhu, Yuyao Wang, Banruo Liu et al.
Holmes: Localizing Irregularities in LLM Training with Mega-scale GPU Clusters
Zhiyi Yao, Pengbo Hu, Congcong Miao et al.
Ladder: A Convergence-based Structured DAG Blockchain for High Throughput and Low Latency
Dengcheng Hu, Jianrong Wang, Xiulong Liu et al.
Large Network UWB Localization: Algorithms and Implementation
Nakul Garg, Irtaza Shahid, Ramanujan K Sheshadri et al.
Learning Production-Optimized Congestion Control Selection for Alibaba Cloud CDN
Xuan Zeng, Haoran Xu, Chen Chen et al.
Learnings from Deploying Network QoS Alignment to Application Priorities for Storage Services
Matthew Buckley, Parsa Pazhooheshy, Z. Morley Mao et al.
Making Serverless Pay-For-Use a Reality with Leopard
Tingjia Cao, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau et al.
MeshTest: End-to-End Testing for Service Mesh Traffic Management
Naiqian Zheng, Tianshuo Qiao, Xuanzhe Liu et al.
Minder: Faulty Machine Detection for Large-scale Distributed Model Training
Yangtao Deng, Xiang Shi, Zhuo Jiang et al.
Mitigating Scalability Walls of RDMA-based Container Networks
Wei Liu, Kun Qian, Zhenhua Li et al.
Mowgli: Passively Learned Rate Control for Real-Time Video
Neil Agarwal, Rui Pan, Francis Y. Yan et al.
MTP: Transport for In-Network Computing
Tao Ji, Rohan Vardekar, Balajee Vamanan et al.
Mutant: Learning Congestion Control from Existing Protocols via Online Reinforcement Learning
Lorenzo Pappone, Alessio Sacco, Flavio Esposito
NDD: A Decision Diagram for Network Verification
Zechun Li, Peng Zhang, Yichi Zhang et al.
ODRP: On-Demand Remote Paging with Programmable RDMA
Zixuan Wang, Xingda Wei, Jinyu Gu et al.
ONCache: A Cache-Based Low-Overhead Container Overlay Network
Shengkai Lin, Shizhen Zhao, Peirui Cao et al.
One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems
Ruiming Lu, Yunchi Lu, Yuxuan Jiang et al.
On Temporal Verification of Stateful P4 Programs
Delong Zhang, Chong Ye, Fei He
Optimizing RLHF Training for Large Language Models with Stage Fusion
Yinmin Zhong, Zili Zhang, Bingyang Wu et al.
OptiReduce: Resilient and Tail-Optimal AllReduce for Distributed Deep Learning in the Cloud
Ertza Warraich, Omer Shabtai, Khalid Manaa et al.