Papers
Inductive Invariants That Spark Joy: Using Invariant Taxonomies to Streamline Distributed Protocol Proofs
Tony Nuda Zhang, Travis Hance, Manos Kapritsos et al.
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management
Wonbeom Lee, Jungi Lee, Junghwan Seo et al.
IntOS: Persistent Embedded Operating System and Language Support for Multi-threaded Intermittent Computing
Yilun Wu, Byounguk Min, Mohannad Ismail et al.
IronSpec: Increasing the Reliability of Formal Specifications
Eli Goldweber, Weixin Yu, Seyed Armin Vakil Ghahani et al.
Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
Lei Wang, Lingxiao Ma, Shijie Cao et al.
Llumnix: Dynamic Scheduling for Large Language Model Serving
Biao Sun, Ziming Huang, Hanyu Zhao et al.
Managing Memory Tiers with CXL in Virtualized Environments
Yuhong Zhong, Daniel S. Berger, Carl Waldspurger et al.
Massively Parallel Multi-Versioned Transaction Processing
Shujian Qian, Ashvin Goel
MAST: Global Scheduling of ML Training across Geo-Distributed Datacenters at Hyperscale
Arnab Choudhury, Yang Wang, Tuomas Pelkonen et al.
Microkernel Goes General: Performance and Compatibility in the HongMeng Production Microkernel
Haibo Chen, Xie Miao, Ning Jia et al.
MonoNN: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures
Donglin Zhuang, Zhen Zheng, Haojun Xia et al.
Motor: Enabling Multi-Versioning for Distributed Transactions on Disaggregated Memory
Ming Zhang, Yu Hua, Zhijun Yang
nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training
Zhiqi Lin, Youshan Miao, Quanlu Zhang et al.
Nomad: Non-Exclusive Memory Tiering via Transactional Page Migration
Lingfeng Xiang, Zhen Lin, Weishu Deng et al.
Optimizing Resource Allocation in Hyperscale Datacenters: Scalability, Usability, and Experiences
Neeraj Kumar, Pol Mauri Ruiz, Vijay Menon et al.
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable
Chaofan Lin, Zhenhua Han, Chengruidong Zhang et al.
Performance Interfaces for Hardware Accelerators
Jiacheng Ma, Rishabh Iyer, Sahand Kashani et al.
Ransom Access Memories: Achieving Practical Ransomware Protection in Cloud with DeftPunk
Zhongyu Wang, Yaheng Song, Erci Xu et al.
Sabre: Hardware-Accelerated Snapshot Compression for Serverless MicroVMs
Nikita Lazarev, Varun Gohil, James Tsai et al.
Secret Key Recovery in a Global-Scale End-to-End Encryption System
Graeme Connell, Vivian Fang, Rolfe Schmidt et al.
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
Yao Fu, Leyang Xue, Yeqi Huang et al.
ServiceLab: Preventing Tiny Performance Regressions at Hyperscale through Pre-Production Testing
Mike Chow, Yang Wang, William Wang et al.
SquirrelFS: using the Rust compiler to check file-system crash consistency
Hayley LeBlanc, Nathan Taylor, James Bornholt et al.
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar et al.
USHER: Holistic Interference Avoidance for Resource Optimized ML Inference
Sudipta Saha Shubha, Haiying Shen, Anand Iyer