Aurojit Panda (NYU), Shivaram Venkataraman (University of Wisconsin), Mosharaf Chowdhury (University of Michigan), Aditya Akella (University of Wisconsin), Scott Shenker (UC Berkeley), Ion Stoica (UC Berkeley). June 10, 2018
ID  City      Buff Ratio
 1  NYC       0.78
 2  NYC       0.13
 3  Berkeley  0.25
 4  NYC       0.19
 5  NYC       0.11
 6  Berkeley  0.09
 7  NYC       0.18
 8  NYC       0.15
 9  Berkeley  0.13
10  Berkeley  0.49
11  NYC       0.19
12  Berkeley  0.10

What is the average buffering ratio in the table? 0.2325
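As a quick sanity check, the table's answer can be recomputed directly (the values below are transcribed from the table above):

```python
# Buffering ratios for the 12 sessions in the table (NYC and Berkeley).
ratios = [0.78, 0.13, 0.25, 0.19, 0.11, 0.09,
          0.18, 0.15, 0.13, 0.49, 0.19, 0.10]

avg = round(sum(ratios) / len(ratios), 4)
print(avg)  # -> 0.2325
```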
Sampling for Graph Approximation
§ Sparsification extensively studied in graph theory
§ Idea: approximate the graph using a sparse, much smaller graph
§ Many techniques computationally intensive
§ Not amenable to distributed implementation
§ Build on Spielman & Teng's work*
§ Keep edges with probability

While several proposals for the type of sparsifier exist, many of them are either computationally intensive or not amenable to distributed implementation (which is the focus of our work). As an initial solution, we developed a simple sparsifier, adapted from the work of Spielman and Teng [31], that is based on vertex degree. The sparsifier uses the following probability to decide whether to keep an edge between vertices a and b:

    p = (d_AVG × s) / min(d_a^o, d_b^i)    (1)

Exploring other sparsification techniques
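The degree-based sparsifier described on this slide can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the function name `sparsify`, the clamp of the probability to 1, and the seeded RNG are assumptions made here so the sketch is runnable and deterministic.

```python
import random
from collections import defaultdict

def sparsify(edges, s, seed=0):
    """Keep edge (a, b) with probability (s * d_avg) / min(d_out[a], d_in[b]),
    clamped to 1. A sketch of the degree-based sparsifier on the slide;
    s is the sparsification parameter controlling how many edges survive."""
    d_out = defaultdict(int)  # out-degree of each source vertex
    d_in = defaultdict(int)   # in-degree of each destination vertex
    for a, b in edges:
        d_out[a] += 1
        d_in[b] += 1
    vertices = set(d_out) | set(d_in)
    d_avg = len(edges) / len(vertices)  # average degree of the graph
    rng = random.Random(seed)
    kept = []
    for a, b in edges:
        p = min(1.0, s * d_avg / min(d_out[a], d_in[b]))
        if rng.random() < p:
            kept.append((a, b))
    return kept

# Tiny directed graph; with a large s every edge is kept, with s = 0 none are.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
kept = sparsify(edges, s=0.5)
```

Because the per-edge decision is independent and needs only local degree information, this rule fits a distributed implementation, which is the property the slide emphasizes.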
How do we predict the speedup due to sparsification? Many approaches in the approximate processing literature:
• Exhaustively run every possible point
• Theoretical closed-bound solutions
• Experiment design / Bayesian techniques

None applicable for graph approximation
Use machine learning to build a model: learn the relation between s and error / latency.

"The most important determinant of graph workload characteristics is typically the input graph and surprisingly not the implementation or even the graph kernel." Beamer et al.

In distributed graph processing, communication (shuffles) dominates execution time.
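The modeling step can be sketched with a toy least-squares fit. The sample (s, error) points below are invented for illustration; the real system would profile actual runs and could use a richer model than a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y ~ a*x + b.
    A minimal stand-in for the learned model relating the sparsification
    parameter s to observed error (or latency)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

# Hypothetical profiling runs: smaller s -> sparser graph -> higher error.
s_vals = [0.1, 0.25, 0.5, 0.75, 1.0]
errors = [0.30, 0.21, 0.12, 0.06, 0.00]
a, b = fit_line(s_vals, errors)  # slope a is negative: error falls as s grows
```

Given such a model, one can invert it to pick the smallest s that keeps predicted error under a target bound.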
Better sparsifiers
§ Can we cherry-pick sparsifiers?
Programming Language techniques
§ Can we synthesize approximate versions of an exact graph-parallel program?
In graph processing, there is no direct relation between graph size and latency/error.
§ Our proposal GAP:
§ Uses sparsification theory to reduce the input to graph algorithms, and ML to learn the relation between input and latency/error.
§ Initial results are encouraging.

http://www.cs.berkeley.edu/~api
api@cs.berkeley.edu