2016 OSDI OSDI 2016

Exploring the Hidden Dimension in Graph Processing

Abstract

Task partitioning of a graph-parallel system is traditionally considered equivalent to the graph partition problem. Such equivalence exists because the properties associated with each vertex/edge are normally considered indivisible. However, this assumption is not true for many Machine Learning and Data Mining (MLDM) problems: instead of a single value, a vector of data elements is defined as the property for each vertex/edge. This feature opens a new dimension for task partitioning because a vertex could be divided and assigned to different nodes. To explore this new opportunity, this paper presents 3D partitioning, a novel category of task partition algorithms that significantly reduces network traffic for certain MLDM applications. Based on 3D partitioning, we build a distributed graph engine CUBE. Our evaluation results show that CUBE outperforms state-of-the-art graph-parallel system PowerLyra by up to 4:7x (up to 7:3x speedup against PowerGraph).

🌉 Interdisciplinary Bridge — Computer Science and Machine Learning and Mathematics & Optimization
📈 Trend Setter — Distributed Systems
🧭 Keyword Pioneer — task partitioning
🐝 Cross-Pollinator — Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Machine Learning, Mathematics & Optimization