2021 NSDI NSDI 2021

Whiz: Data-Driven Analytics Execution

Abstract

Today's data analytics frameworks are compute-centric, with analytics execution almost entirely dependent on the predetermined physical structure of the high-level computation. Relegating intermediate data to a second class entity in this manner hurts flexibility, performance, and efficiency. We present Whiz, a new analytics execution framework that cleanly separates computation from intermediate data. This enables runtime visibility into intermediate data via programmable monitoring, and data-driven computation where data properties drive when/what computation runs. Experiments with a Whiz prototype on a 50-node cluster using batch, streaming, and graph analytics workloads show that it improves analytics completion times 1.3-2x and cluster efficiency 1.4x compared to state-of-the-art.

🧭 Keyword Pioneer — data-driven analytics
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics