Papers
Thunderbolt: Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale
Shaohong Li, Xi Wang, Xiao Zhang et al.
Tolerating Slowdowns in Replicated State Machines using Copilots
Khiem Ngo, Siddhartha Sen, Wyatt Lloyd
Toward a Generic Fault Tolerance Technique for Partial Network Partitioning
Mohammed Alfatafta, Basil Alkhatib, Ahmed Alquraan et al.
Twine: A Unified Cluster Management System for Shared Infrastructure
Chunqiang Tang, Kenny Yu, Kaushik Veeraraghavan et al.
Unearthing inter-job dependencies for better cluster scheduling
Andrew Chung, Subru Krishnan, Konstantinos Karanasos et al.
Virtual Consensus in Delos
Mahesh Balakrishnan, Jason Flinn, Chen Shen et al.
Write Dependency Disentanglement with HORAE
Xiaojian Liao, Youyou Lu, Erci Xu et al.
Adaptive Dynamic Checkpointing for Safe Efficient Intermittent Computing
Kiwan Maeng, Brandon Lucia
An Analysis of Network-Partitioning Failures in Cloud Systems
Ahmed Alquraan, Hatem Takruri, Mohammed Alfatafta et al.
Arachne: Core-Aware Thread Management
Henry Qin, Qian Li, Jacqueline Speiser et al.
ASAP: Fast, Approximate Graph Pattern Mining at Scale
Anand Padmanabha Iyer, Zaoxing Liu, Xin Jin et al.
Capturing and Enhancing In Situ System Observability for Failure Detection
Peng Huang, Chuanxiong Guo, Jacob R. Lorch et al.
Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better!
Xingda Wei, Zhiyuan Dong, Rong Chen et al.
Differential Energy Profiling: Energy Optimization via Diffing Similar Apps
Abhilash Jindal, Y. Charlie Hu
Dynamic Query Re-Planning using QOOP
Kshiteej Mahajan, Mosharaf Chowdhury, Aditya Akella et al.
Fault-Tolerance, Fast and Slow: Exploiting Failure Asynchrony in Distributed Systems
Ramnatthan Alagappan, Aishwarya Ganesan, Jing Liu et al.
Finding Crash-Consistency Bugs with Bounded Black-Box Crash Testing
Jayashree Mohan, Ashlie Martinez, Soujanya Ponnapalli et al.
Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data
Gregory Essertel, Ruby Tahboub, James Decker et al.
FlashShare: Punching Through Server Storage Stack from Kernel to Firmware for Ultra-Low Latency SSDs
Jie Zhang, Miryeong Kwon, Donghyun Gouk et al.
Floem: A Programming System for NIC-Accelerated Network Applications
Phitchaya Mangpo Phothilimthana, Ming Liu, Antoine Kaufmann et al.
Focus: Querying Large Video Datasets with Low Latency and Low Cost
Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik et al.
Gandiva: Introspective Cluster Scheduling for Deep Learning
Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee et al.
Graviton: Trusted Execution Environments on GPUs
Stavros Volos, Kapil Vaswani, Rodrigo Bruno
Karaoke: Distributed Private Messaging Immune to Passive Traffic Analysis
David Lazar, Yossi Gilad, Nickolai Zeldovich
LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation
Yizhou Shan, Yutong Huang, Yilun Chen et al.