⋆, Zaoxing Liu⬩, Xin Jin⬩, Shivaram Venkataraman✢, Vladimir Braverman⬩, Ion Stoica⋆
⋆UC Berkeley ⬩Johns Hopkins University ✢University of Wisconsin & Microsoft
OSDI, October 10, 2018
Also popular in traditional enterprises*
Transactions and involved entities
(classes of) products are frequently bought together?
Small deposits followed by large withdrawal
[Figure: example graphs with nodes labeled P, D, and W]
*“The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing”, Sahu et al., VLDB 2018 (best paper)
the input data
Approximate Analytics
[Figure: graph on vertices 0, 1, 2, 3, 4; edge sampling (p = 0.5) gives a sampled graph; triangle counting on the sampled graph gives e = 1; result: e · 2 = 2]
Applying an exact algorithm on sampled graph(s) is not the right approach for pattern mining
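A minimal sketch of why this naive approach is fragile, not taken from the talk: keep each edge with probability p, count triangles exactly on what is left, and scale up. The 1/p^3 scaling (a triangle survives only if all three of its edges survive) and the use of the complete graph on 5 vertices as input are assumptions made for illustration; the function names are hypothetical.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count for a simple undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is discovered once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def naive_sampled_estimate(edges, p, rng):
    """Keep each edge with probability p, count exactly, scale by 1/p^3 (assumed scaling)."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


# Complete graph on 5 vertices: 10 edges, 10 triangles.
K5 = list(combinations(range(5), 2))

if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:", count_triangles(K5))
    print("naive estimates:", [naive_sampled_estimate(K5, 0.5, rng) for _ in range(8)])
```

With p = 0.5 every estimate is a multiple of 8, so a single run on a small graph can land far from the true count even though the scaling is unbiased on average.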
Table 1: ASAP’s Approximate Pattern Mining API.
API / Description
sampleVertex: () → (v, p)
    Uniformly sample one vertex from the graph.
SampleEdge: () → (e, p)
    Uniformly sample one edge from the graph.
ConditionalSampleVertex: (subgraph) → (v, p)
    Uniformly sample a vertex that appears after a sampled subgraph.
ConditionalSampleEdge: (subgraph) → (e, p)
    Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order.
ConditionalClose: (subgraph, subgraph) → boolean
    Given a sampled subgraph, check if another subgraph that appears later in the order can be formed.
Developers write a single estimator using ASAP’s API
Example estimators: SampleThreeNodeChain, SampleTriangle
Edges in order: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

SampleTriangle
    (e1, p1) = sampleEdge()
    (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    if (!e2) return 0
    subgraph1 = Subgraph(e1, e2)
    subgraph2 = Triangle(e1, e2) - subgraph1
    if conditionalClose(subgraph1, subgraph2)
        return 1/(p1.p2)
    else return 0

[Figure: partially visible estimators SampleThreeNodeChain, SampleFourCliqueType1, SampleFourCliqueType2; sampled triangle on vertices 0, 3, 4; estimator result = 40]

Sampling phase fixes the vertices for a particular instance of a pattern, and the closing phase waits for the remaining edges
ASAP computes the right expectations, runs many instances of the estimator, and aggregates the results
See paper for more examples & proof
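The SampleTriangle estimator can be sketched in plain Python over the ordered edge list from the slide. This is an illustration under assumptions, not ASAP's implementation: the helper names (`later_adjacent`, `closes_after`, `value_for`) are hypothetical, p1 is taken as 1/m for m edges, and p2 as 1/c where c is the number of later edges adjacent to e1. One draw consistent with the slide's triangle on vertices 0, 3, 4 and its value of 40 is e1 = (0,3), e2 = (0,4), though which draw the slide depicts is a guess.

```python
import random
from itertools import combinations


def later_adjacent(stream, i):
    """Indices j > i of edges sharing an endpoint with stream[i] (pool for e2)."""
    e1 = set(stream[i])
    return [j for j in range(i + 1, len(stream)) if e1 & set(stream[j])]


def closes_after(stream, i, j):
    """True if the edge closing the wedge (stream[i], stream[j]) appears after index j."""
    missing = set(stream[i]) ^ set(stream[j])  # the two endpoints not shared
    return any(set(e) == missing for e in stream[j + 1:])


def value_for(stream, i, j):
    """Estimator output when e1 = stream[i] and e2 = stream[j] were sampled."""
    c = len(later_adjacent(stream, i))
    # 1 / (p1 * p2) with p1 = 1/m and p2 = 1/c, kept in integers.
    return len(stream) * c if closes_after(stream, i, j) else 0


def sample_triangle(stream, rng):
    """One randomized run of the estimator."""
    i = rng.randrange(len(stream))       # sampleEdge
    pool = later_adjacent(stream, i)     # conditionalSampleEdge
    if not pool:
        return 0
    return value_for(stream, i, rng.choice(pool))


def estimate_triangles(stream, n_estimators, seed=0):
    """Average many independent estimator runs."""
    rng = random.Random(seed)
    return sum(sample_triangle(stream, rng) for _ in range(n_estimators)) / n_estimators


def exact_expectation(stream):
    """Expected estimator value: each (e1, e2) pair that closes contributes exactly 1."""
    return sum(closes_after(stream, i, j)
               for i in range(len(stream)) for j in later_adjacent(stream, i))


K5 = list(combinations(range(5), 2))  # same edge order as on the slide

if __name__ == "__main__":
    print(value_for(K5, 2, 3))            # e1 = (0,3), e2 = (0,4)
    print(exact_expectation(K5))
    print(estimate_triangles(K5, 20000))
```

Each triangle is reachable through exactly one (e1, e2) pair, namely its first two edges in the order, which is why the expectation equals the triangle count.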
[Figure: Runtime (min) vs. No. of Estimators, Twitter Graph]
Time complexity is linear in the number of estimators
ASAP sets a profiling cost and picks the maximum number of points within the budget
[Figure: Runtime (min) vs. No. of Estimators (0 to 2.1M), Twitter Graph, with profiling points]
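Because runtime grows linearly with the number of estimators, a few profiled points are enough to extrapolate the rest of the curve. The sketch below fits a line by ordinary least squares and inverts it to find how many estimators fit in a time budget. It is an illustration only: the profile numbers are made up, the function names are hypothetical, and ASAP's actual profiling procedure may differ.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def max_estimators_within(budget_min, a, b):
    """Largest estimator count whose predicted runtime a + b * n fits the budget."""
    if b <= 0:
        raise ValueError("runtime must increase with the number of estimators")
    return max(0, int((budget_min - a) / b))


if __name__ == "__main__":
    # Hypothetical profile: (number of estimators, runtime in minutes).
    counts = [500_000, 1_000_000, 1_500_000]
    runtimes = [0.8, 1.5, 2.2]
    a, b = fit_line(counts, runtimes)
    print("predicted runtime at 2.1M estimators:", a + b * 2_100_000)
    print("estimators that fit in 2 minutes:", max_estimators_within(2.0, a, b))
```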
[Figure: Error Rate (%) vs. No. of Estimators, Twitter Graph]
Error complexity is non-linear in the number of estimators
Key idea: Use a very small sample of the graph to build the ELP (error-latency profile)
§ Chernoff analysis provides a loose upper bound on the number of estimators.
§ In small graphs, a large number of estimators can get us very close to ground truth.
of type “electronics”
• ASAP allows simple edge and vertex predicates
Motif Mining
• Some patterns are building blocks for other patterns
• ASAP caches the state of the estimators and reuses it
Accuracy Refinement
• Users may require a more accurate answer later
• ASAP can checkpoint and reuse estimators
More details in the paper
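The accuracy-refinement idea can be illustrated with a deliberately simplified checkpoint: keep only the running count and sum of estimator outputs, so that more runs can be folded in later without redoing earlier work. This assumes results are aggregated by plain averaging; the class and its methods are hypothetical, and the state ASAP actually checkpoints is richer than this.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EstimatorCheckpoint:
    """Running aggregate of estimator outputs (simplified, assumed state)."""
    count: int = 0
    total: float = 0.0

    def add(self, results):
        """Fold additional estimator outputs into the aggregate."""
        for r in results:
            self.count += 1
            self.total += r

    def estimate(self):
        """Current estimate: mean of all outputs seen so far."""
        if self.count == 0:
            raise ValueError("no estimator results yet")
        return self.total / self.count

    def to_json(self):
        """Serialize the aggregate so it can be stored and resumed later."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        """Restore an aggregate saved with to_json."""
        return cls(**json.loads(text))


if __name__ == "__main__":
    first = EstimatorCheckpoint()
    first.add([40, 0, 0, 0])          # initial, coarse answer
    saved = first.to_json()

    resumed = EstimatorCheckpoint.from_json(saved)
    resumed.add([20, 0])              # later: refine with more estimator runs
    print(resumed.count, resumed.estimate())
```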
in many domains
§ ASAP uses approximation for fast pattern mining
§ Leverages graph mining theory & makes it practical
§ Simple API for developers
§ ASAP outperforms existing solutions
§ Can handle much larger graphs with fewer resources
http://www.cs.berkeley.edu/~api
api@cs.berkeley.edu