
ASAP: Fast, Approximate Graph Pattern Mining at Scale

Anand Iyer
October 10, 2018


Transcript

  1. ASAP: Fast, Approximate Graph Pattern Mining at Scale

    Anand Iyer⋆, Zaoxing Liu⬩, Xin Jin⬩, Shivaram Venkataraman✢, Vladimir Braverman⬩, Ion Stoica⋆
    ⋆UC Berkeley, ⬩Johns Hopkins University, ✢University of Wisconsin & Microsoft
    OSDI, October 10, 2018
  2. Graphs popular in big data analytics

    Metabolic network of a single-cell organism
    Social networks
    Tuberculosis
  3. Graphs popular in big data analytics

    Also popular in traditional enterprises*
    *“The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing”, Sahu et al., VLDB 2018 (best paper)
  4. Graphs popular in big data analytics

    Also popular in traditional enterprises*
    Products and customers [Figure: nodes labeled P]
    *“The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing”, Sahu et al., VLDB 2018 (best paper)
  5. Graphs popular in big data analytics

    Also popular in traditional enterprises*
    Products and customers [Figure: nodes labeled P]
    Transactions and involved entities [Figure: nodes labeled D and W]
    *“The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing”, Sahu et al., VLDB 2018 (best paper)
  6. Graphs popular in big data analytics

    Also popular in traditional enterprises*
    Products and customers [Figure: nodes labeled P]
    Which (classes of) products are frequently bought together?
    Transactions and involved entities [Figure: nodes labeled D and W]
    *“The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing”, Sahu et al., VLDB 2018 (best paper)
  7. Graphs popular in big data analytics

    Also popular in traditional enterprises*
    Products and customers [Figure: nodes labeled P]
    Which (classes of) products are frequently bought together?
    Transactions and involved entities [Figure: nodes labeled D and W]
    Small deposits followed by a large withdrawal
    *“The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing”, Sahu et al., VLDB 2018 (best paper)
  8. Graphs popular in big data analytics

    Also popular in traditional enterprises*
    Products and customers [Figure: nodes labeled P, with two more P nodes than on the previous slide]
    Which (classes of) products are frequently bought together?
    Transactions and involved entities [Figure: nodes labeled D and W]
    Small deposits followed by a large withdrawal
    *“The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing”, Sahu et al., VLDB 2018 (best paper)
  10. Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
  11. Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
    [Figure: example graph with vertices 0, 1, 2, 3]
  12. Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
    [Figure: example graph with vertices 0, 1, 2, 3; expansion starts from the single vertices 0, 1, 2, 3]
  13. Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
    [Figure: example graph with vertices 0, 1, 2, 3; single vertices are expanded into vertex pairs]
  14. Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
    [Figure: example graph with vertices 0, 1, 2, 3; single vertices are expanded into vertex pairs]
    Huge intermediate data
    Quickly intractable in large graphs
  15. Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
    [Figure: example graph with vertices 0, 1, 2, 3; single vertices are expanded into vertex pairs]
    Huge intermediate data
    Quickly intractable in large graphs
    Challenging to mine patterns in large graphs
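The iterative expansion described on the slides above can be sketched in a few lines. This is a minimal illustration, not the implementation used by any system named in the talk: the function names (`expand`, `connected_vertex_sets`) and the 4-vertex complete graph `K4` are assumptions made for the example. It grows candidate vertex sets one vertex at a time and records how many candidates exist at each level, which is the intermediate data the slides warn about.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets, undirected


def expand(adj: Graph, embeddings: Set[FrozenSet[int]]) -> Set[FrozenSet[int]]:
    """Grow every candidate by one adjacent vertex, deduplicating vertex sets."""
    grown: Set[FrozenSet[int]] = set()
    for emb in embeddings:
        for v in emb:
            for u in adj[v]:
                if u not in emb:
                    grown.add(emb | {u})
    return grown


def connected_vertex_sets(adj: Graph, k: int) -> Tuple[Set[FrozenSet[int]], List[int]]:
    """Return all connected vertex sets of size k and the candidate count per level."""
    level = {frozenset([v]) for v in adj}
    sizes = [len(level)]
    for _ in range(k - 1):
        level = expand(adj, level)
        sizes.append(len(level))
    return level, sizes


# Hypothetical example graph: the complete graph on vertices 0..3.
K4: Graph = {v: {u for u in range(4) if u != v} for v in range(4)}
```

Calling `connected_vertex_sets(K4, 3)` returns the size-3 candidates together with the per-level counts. On larger and denser graphs the per-level counts grow combinatorially, which is why the exact approach becomes intractable.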
  16. Graph Pattern Mining

    Motifs with size = 3
    [Plot, log scale: computation time vs. # edges]
    Arabesque (SOSP ’15): ~1 billion edges, 11 hours
  17. Graph Pattern Mining

    Motifs with size = 3
    [Plot, log scale: computation time vs. # edges]
    Arabesque (SOSP ’15): ~1 billion edges, 11 hours
    This work: 1.5 billion edges, 150 s
  18. Graph Pattern Mining

    Motifs with size = 3
    [Plot, log scale: computation time vs. # edges]
    Arabesque (SOSP ’15): ~1 billion edges, 11 hours
    This work: 1.5 billion edges, 150 s
    258x faster
  19. Graph Pattern Mining

    Motifs with size = 3
    [Plot, log scale: computation time vs. # edges]
    Arabesque (SOSP ’15): ~1 billion edges, 11 hours
    This work: 1.5 billion edges, 150 s
    258x faster
    5x less CPU & memory
  20. Graph Pattern Mining

    Motifs with size = 3
    [Plot, log scale: computation time vs. # edges]
    Arabesque (SOSP ’15): ~1 billion edges, 11 hours
    This work: 1.5 billion edges, 150 s
    258x faster
    5x less CPU & memory
    <5% error
  21. Approximate Analytics

    General approach: apply the algorithm on subset(s) (a sample) of the input data
    [Figure: graph (vertices 0 to 4) → edge sampling (p=0.5) → sampled graph]
  22. Approximate Analytics

    General approach: apply the algorithm on subset(s) (a sample) of the input data
    [Figure: graph (vertices 0 to 4) → edge sampling (p=0.5) → sampled graph → triangle counting (e = 1)]
  23. Approximate Analytics

    General approach: apply the algorithm on subset(s) (a sample) of the input data
    [Figure: graph (vertices 0 to 4) → edge sampling (p=0.5) → sampled graph → triangle counting (e = 1) → result: [expression not recoverable from the extraction] = 2]
  24. Approximate Analytics

    General approach: apply the algorithm on subset(s) (a sample) of the input data
    [Figure: graph (vertices 0 to 4) → edge sampling (p=0.5) → sampled graph → triangle counting (e = 1) → result: [expression not recoverable from the extraction] = 2]
    Answer: 10
  25. Approximate Analytics

    General approach: apply the algorithm on subset(s) (a sample) of the input data
    [Figure: graph (vertices 0 to 4) → edge sampling (p=0.5) → sampled graph → triangle counting (e = 1) → result: [expression not recoverable from the extraction] = 2]
    Answer: 10
    [Plot: error (%) and speedup vs. edges dropped (%)]
  26. Approximate Analytics

    General approach: apply the algorithm on subset(s) (a sample) of the input data
    [Figure: graph (vertices 0 to 4) → edge sampling (p=0.5) → sampled graph → triangle counting (e = 1) → result: [expression not recoverable from the extraction] = 2]
    Answer: 10
    Applying an exact algorithm on sampled graph(s) is not the right approach for pattern mining
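To make the "exact algorithm on a sampled graph" baseline concrete, here is a small sketch under stated assumptions. The slide's own scaling expression did not survive extraction, so this example uses the textbook 1/p³ scaling for triangles (each triangle survives independent edge sampling with probability p³); that choice, the function names, and the 5-vertex complete graph `K5` are assumptions for illustration only.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]

# The 5-vertex complete graph from the slides; it contains exactly 10 triangles.
K5: List[Edge] = [(u, v) for u in range(5) for v in range(u + 1, 5)]


def count_triangles(edges: List[Edge]) -> int:
    """Exact triangle count: common neighbours per edge, each triangle seen 3 times."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def sampled_estimate(edges: List[Edge], p: float, rng: random.Random) -> float:
    """Keep each edge with probability p, count exactly, scale by 1/p^3 (assumed scaling)."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3
```

Running `sampled_estimate(K5, 0.5, random.Random(seed))` for a handful of seeds gives estimates that swing widely around the true count of 10, because dropping a single edge removes every triangle that edge participates in. This is one way to see why the error grows quickly as more edges are dropped.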
  27. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream
    [Figure: graph with vertices 0 to 4]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
  28. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream
    [Figure: graph with vertices 0 to 4; estimator E0]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
  39. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream
    [Figure: graph with vertices 0 to 4; estimator E0]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    $p = \frac{1}{10} \cdot \frac{1}{4}$
  40. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream
    [Figure: graph with vertices 0 to 4; estimator E0]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    $p = \frac{1}{10} \cdot \frac{1}{4}$
    $e_0 = 40$
  41. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream
    [Figure: graph with vertices 0 to 4; estimators E0, E1, E2, E3]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    $p = \frac{1}{10} \cdot \frac{1}{4}$
    $e_0 = 40$
  42. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream
    [Figure: graph with vertices 0 to 4; estimators E0, E1, E2, E3]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    $p = \frac{1}{10} \cdot \frac{1}{4}$
    $e_0 = 40$, $e_1 = 0$, $e_2 = 0$, $e_3 = 0$
    result: $\frac{1}{r} \sum_{i=0}^{r-1} e_i = 10$
  43. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream
    [Figure: graph with vertices 0 to 4 → neighborhood sampling → estimators E0, E1, E2, E3 (r=4) → result]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    $p = \frac{1}{10} \cdot \frac{1}{4}$
    $e_0 = 40$, $e_1 = 0$, $e_2 = 0$, $e_3 = 0$
    result: $\frac{1}{r} \sum_{i=0}^{r-1} e_i = 10$
    Pavan et al., “Counting and sampling triangles from a graph stream”, VLDB 2013
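The estimator walked through above can be sketched as follows. This is an illustrative reading of neighborhood sampling for triangles over an ordered edge stream, not the authors' code: the function names and the seeded `random.Random` are assumptions. Each estimator picks a first edge uniformly (probability 1/m), picks a second edge uniformly among the c later edges adjacent to the first (probability 1/c), and returns m·c if the closing edge arrives after the second edge, otherwise 0. The final answer is the average over r estimators.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]

# Edge stream of the 5-vertex complete graph, in the order shown on the slides.
K5_STREAM: List[Edge] = [(u, v) for u in range(5) for v in range(u + 1, 5)]


def triangle_estimate(edges: List[Edge], rng: random.Random) -> int:
    """One neighborhood-sampling estimator; returns 1/p = m*c on success, else 0."""
    m = len(edges)
    i = rng.randrange(m)                      # first edge, probability 1/m
    e1 = edges[i]
    later_adjacent = [
        (j, e) for j, e in enumerate(edges)
        if j > i and set(e) & set(e1)         # shares a vertex, arrives later
    ]
    if not later_adjacent:
        return 0
    j, e2 = rng.choice(later_adjacent)        # second edge, probability 1/c
    closing = set(e1) ^ set(e2)               # the two vertices not shared
    closed = any(set(e) == closing for e in edges[j + 1:])
    return m * len(later_adjacent) if closed else 0


def estimate_triangles(edges: List[Edge], r: int, seed: int = 0) -> float:
    """Average r independent estimators."""
    rng = random.Random(seed)
    return sum(triangle_estimate(edges, rng) for _ in range(r)) / r
```

For the stream above, an estimator that draws (0,3) as its first edge has four later adjacent edges, so a successful run returns 10 × 4 = 40, matching the p = 1/10 · 1/4 shown on the slide. With only r = 4 estimators the average is noisy; accuracy comes from running many more.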
  44. A Swift Approximate Pattern miner

    … Graphs stored on disk or main memory
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
  45. A Swift Approximate Pattern miner

    … Graphs stored on disk or main memory
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
  46. A Swift Approximate Pattern miner

    … Graphs stored on disk or main memory
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
    (4) [Plots: Twitter graph profiling; runtime (min) vs. no. of estimators, and error rate (%) vs. no. of estimators]
  47. A Swift Approximate Pattern miner

    … Graphs stored on disk or main memory
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
    (4) [Plots: Twitter graph profiling; runtime (min) vs. no. of estimators, and error rate (%) vs. no. of estimators]
    (5) Error-Latency Profile (ELP)
  48. A Swift Approximate Pattern miner

    … Graphs stored on disk or main memory
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
    (4) [Plots: Twitter graph profiling; runtime (min) vs. no. of estimators, and error rate (%) vs. no. of estimators]
    (5) Error-Latency Profile (ELP)
    (6) Estimates: {error: <5%, time: 95s}; Estimates: {error: <5%, time: 60s}
  49. A Swift Approximate Pattern miner

    … Graphs stored on disk or main memory
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
    (4) [Plots: Twitter graph profiling; runtime (min) vs. no. of estimators, and error rate (%) vs. no. of estimators]
    (5) Error-Latency Profile (ELP)
    (6) Estimates: {error: <5%, time: 95s}; Estimates: {error: <5%, time: 60s}
    (7) count: 21453 +/- 14, confidence: 95%, time: 92s; embeddings (optional)
  50. A Swift Approximate Pattern miner

    … Graphs stored on disk or main memory
    … Graph updates
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
    (4) [Plots: Twitter graph profiling; runtime (min) vs. no. of estimators, and error rate (%) vs. no. of estimators]
    (5) Error-Latency Profile (ELP)
    (6) Estimates: {error: <5%, time: 95s}; Estimates: {error: <5%, time: 60s}
    (7) count: 21453 +/- 14, confidence: 95%, time: 92s; embeddings (optional)
  51. [Same architecture diagram, without the “A Swift Approximate Pattern miner” caption]

    … Graphs stored on disk or main memory
    … Graph updates
    (1) graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
    (4) [Plots: Twitter graph profiling; runtime (min) vs. no. of estimators, and error rate (%) vs. no. of estimators]
    (5) Error-Latency Profile (ELP)
    (6) Estimates: {error: <5%, time: 95s}; Estimates: {error: <5%, time: 60s}
    (7) count: 21453 +/- 14, confidence: 95%, time: 92s; embeddings (optional)
  52. Contributions

    [Same architecture diagram as on the previous slide]
    Contributions:
    • Extends neighborhood sampling to general patterns
    • Provides a unified API
    • Applies approximate pattern mining in distributed settings
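One way to picture how an error-latency profile could drive estimator count selection is sketched below. Everything here is an assumption made for illustration: the `ProfilePoint` type, the two selection functions, and the profile numbers are hypothetical and are not taken from ASAP or from the plots on the slides. The idea shown is only that a query carrying a time budget picks the most accurate profiled configuration that fits, while a query carrying an error bound picks the fastest configuration that satisfies it.

```python
from typing import NamedTuple, Optional, Sequence


class ProfilePoint(NamedTuple):
    """One profiled configuration (hypothetical structure)."""
    estimators: int
    runtime_s: float
    error_pct: float


def pick_for_error(profile: Sequence[ProfilePoint],
                   max_error_pct: float) -> Optional[ProfilePoint]:
    """Fastest configuration whose profiled error meets the bound, if any."""
    ok = [p for p in profile if p.error_pct <= max_error_pct]
    return min(ok, key=lambda p: p.runtime_s) if ok else None


def pick_for_time(profile: Sequence[ProfilePoint],
                  budget_s: float) -> Optional[ProfilePoint]:
    """Most accurate configuration whose profiled runtime fits the budget, if any."""
    ok = [p for p in profile if p.runtime_s <= budget_s]
    return min(ok, key=lambda p: p.error_pct) if ok else None


# Made-up profile, for illustration only.
EXAMPLE_PROFILE = [
    ProfilePoint(estimators=500_000, runtime_s=40.0, error_pct=12.0),
    ProfilePoint(estimators=1_000_000, runtime_s=70.0, error_pct=6.0),
    ProfilePoint(estimators=1_500_000, runtime_s=95.0, error_pct=4.0),
    ProfilePoint(estimators=2_100_000, runtime_s=130.0, error_pct=3.0),
]
```

With this made-up profile, `pick_for_error(EXAMPLE_PROFILE, 5.0)` selects the 1.5M-estimator point and `pick_for_time(EXAMPLE_PROFILE, 100.0)` selects the same point, while an unreachable bound returns `None`.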
  53. Generalized Approximate Pattern Mining

    Table 1: ASAP’s Approximate Pattern Mining API.
    API: Description
    sampleVertex: () → (v, p). Uniformly sample one vertex from the graph.
    SampleEdge: () → (e, p). Uniformly sample one edge from the graph.
    ConditionalSampleVertex: (subgraph) → (v, p). Uniformly sample a vertex that appears after a sampled subgraph.
    ConditionalSampleEdge: (subgraph) → (e, p). Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order.
    ConditionalClose: (subgraph, subgraph) → boolean. Given a sampled subgraph, check if another subgraph that appears later in the order can be formed.
    Example estimators: SampleThreeNodeChain, SampleTriangle
    Developers write a single estimator using ASAP’s API
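The edge-based primitives in Table 1 can be mimicked in plain Python to show how an estimator is written against them. This is a rough single-machine sketch, not ASAP's implementation: the `EdgeStream` class, its method names, the use of edge indices to stand for subgraphs, and the seeded random generator are all assumptions. The two estimators follow the shape of SampleThreeNodeChain and SampleTriangle: each returns 1/(p1·p2) when the sampled instance is valid and 0 otherwise.

```python
import random
from typing import FrozenSet, List, Optional, Sequence, Tuple

Edge = Tuple[int, int]


class EdgeStream:
    """Ordered edge list with SampleEdge-style primitives (hypothetical sketch)."""

    def __init__(self, edges: Sequence[Edge], seed: int = 0) -> None:
        self.edges: List[Edge] = list(edges)
        self.rng = random.Random(seed)

    def sample_edge(self) -> Tuple[int, float]:
        """SampleEdge: () -> (e, p). Uniform over all edges; e is an edge index."""
        return self.rng.randrange(len(self.edges)), 1.0 / len(self.edges)

    def conditional_sample_edge(self, sub: Sequence[int]) -> Tuple[Optional[int], float]:
        """ConditionalSampleEdge: uniform over later edges adjacent to the subgraph."""
        verts = {v for i in sub for v in self.edges[i]}
        cands = [j for j in range(max(sub) + 1, len(self.edges))
                 if verts & set(self.edges[j])]
        if not cands:
            return None, 0.0
        return self.rng.choice(cands), 1.0 / len(cands)

    def conditional_close(self, sub: Sequence[int],
                          closing: Sequence[FrozenSet[int]]) -> bool:
        """ConditionalClose: do all closing edges appear after the subgraph?"""
        later = {frozenset(e) for e in self.edges[max(sub) + 1:]}
        return all(c in later for c in closing)


def sample_three_node_chain(s: EdgeStream) -> float:
    e1, p1 = s.sample_edge()
    e2, p2 = s.conditional_sample_edge([e1])
    if e2 is None:
        return 0.0
    return 1.0 / (p1 * p2)


def sample_triangle(s: EdgeStream) -> float:
    e1, p1 = s.sample_edge()
    e2, p2 = s.conditional_sample_edge([e1])
    if e2 is None:
        return 0.0
    # The closing edge joins the two vertices that e1 and e2 do not share.
    closing = frozenset(set(s.edges[e1]) ^ set(s.edges[e2]))
    if s.conditional_close([e1, e2], [closing]):
        return 1.0 / (p1 * p2)
    return 0.0
```

Averaging many calls to `sample_triangle` on the 5-vertex complete graph from the slides approaches its 10 triangles, and averaging `sample_three_node_chain` approaches its 30 three-node chains. Returning the probability alongside each sample is what lets an estimator be written once and scaled without knowing the graph.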
  56. Using ASAP’s API

    [Figure: graph with vertices 0 to 4]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    Table 1: ASAP’s Approximate Pattern Mining API (excerpt; the slide crops the left edge of the page)
    ConditionalSampleEdge: (subgraph) → (e, p). Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order.
    ConditionalClose: (subgraph, subgraph) → boolean. Given a sampled subgraph, check if another subgraph that appears later in the order can be formed.
    SampleThreeNodeChain
      (e1, p1) = SampleEdge()
      (e2, p2) = ConditionalSampleEdge(Subgraph(e1))
      if (!e2) return 0
      return 1/(p1.p2)
    SampleTriangle
      (e1, p1) = sampleEdge()
      (e2, p2) = conditionalSampleEdge(Subgraph(e1))
      if (!e2) return 0
      subgraph1 = Subgraph(e1, e2)
      subgraph2 = Triangle(e1, e2)-subgraph1
      if conditionalClose(subgraph1, subgraph2)
        return 1/(p1.p2)
      else return 0
    SampleFourCliqueType1 (listing cut off on the slide)
      (e1, p1) = SampleEdge()
      (e2, p2) = ConditionalSampleEdge(Subgraph(e1))
      if (!e2) return 0
      (e3, p3) = ConditionalSampleEdge(Subgraph(e1, e2))
      if (!e3) return 0
    SampleFourCliqueType2 (listing cut off on the slide)
      (e1, p1) = SampleEdge()
      (e2, p2) = SampleEdge()
      if (isAdjacent(e1, e2) == true)
        return 0
      subgraph1 = Subgraph(e1, e2)
  60. Using ASAP’s API

    [Same graph, edge stream, API excerpt, and estimator listings as on the previous slide; the figure additionally shows vertices 0 and 3]
  63. Using ASAP’s API

    [Same graph, edge stream, API excerpt, and estimator listings as on the previous slide; the figure additionally shows vertices 0, 3, and 4]
  64. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  65. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  66. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  67. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  79. Using ASAP’s API

    Sampling phase fixes the vertices for a particular instance of a pattern and closing phase waits for remaining edges.
    ASAP computes the right expectations, runs many instances of the estimator and aggregates results.
    Estimate for the highlighted nodes 0, 3, 4: 40.
    See paper for more examples & proof.
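
    The SampleTriangle listing can be tried out with a minimal Python sketch. It is an illustration and not ASAP's implementation: the helper names (`later_adjacent`, `sample_triangle`, `K5_STREAM`) are made up here, and the stream is assumed to be a simple graph held in memory in the order shown on the slide. One estimator samples e1 uniformly (p1 = 1/m), samples e2 uniformly among the later edges adjacent to e1 (p2 = 1/c), and returns 1/(p1·p2) only if the closing edge arrives after e2.

```python
import random

# Edge stream of the 5-vertex complete graph, in the order used on the slide.
K5_STREAM = [(u, v) for u in range(5) for v in range(u + 1, 5)]


def later_adjacent(stream, i):
    """Indices j > i whose edge shares a vertex with stream[i]."""
    u, v = stream[i]
    return [j for j in range(i + 1, len(stream)) if {u, v} & set(stream[j])]


def sample_triangle(stream, rng):
    """One estimator: returns 1/(p1*p2) if the sampled wedge closes, else 0."""
    m = len(stream)
    i = rng.randrange(m)                  # e1, sampled with p1 = 1/m
    candidates = later_adjacent(stream, i)
    if not candidates:                    # no e2 available
        return 0.0
    j = rng.choice(candidates)            # e2, sampled with p2 = 1/len(candidates)
    # The closing edge joins the two endpoints that e1 and e2 do not share.
    closing = set(stream[i]) ^ set(stream[j])
    closes = any(set(e) == closing for e in stream[j + 1:])
    return float(m * len(candidates)) if closes else 0.0


def estimate_triangles(stream, runs, seed=0):
    """Average many independent estimators."""
    rng = random.Random(seed)
    return sum(sample_triangle(stream, rng) for _ in range(runs)) / runs
```

    With e1 = (0,3) there are four later adjacent edges, so a closed sample returns 10 · 4 = 40, which matches the value on the slide. Averaging many runs approaches the true count of 10 triangles in this graph.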
  87. Applying to Distributed Settings

    map: the graph is split across w (=3) workers using Random Vertex-cut Partitioning
      subgraph 0: partial count c0 (using r estimators)
      subgraph 1: partial count c1 (using r estimators)
      subgraph 2: partial count c2 (using r estimators)
    reduce: $\sum_{i=0}^{w-1} c_i \cdot f(w)$
    Upper bounds on f(w) can be proved using the Hajnal-Szemerédi theorem.
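
    The map/reduce shape above can be sketched in a few lines of Python. This is a simplified stand-in under loudly stated assumptions, not ASAP's partitioning scheme: vertices are given a uniformly random partition id, a partition keeps only edges whose endpoints both landed in it, exact counting replaces the r estimators per worker, and the scaling factor is taken to be f(w) = w² for triangles (a triangle survives with probability 1/w² under this assumed scheme). The slide itself does not state f(w).

```python
import random


def count_triangles(edges):
    """Exact triangle count; stands in for a worker's r estimators."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def distributed_estimate(edges, w, rng):
    """Map: partial counts on w random subgraphs. Reduce: scaled sum."""
    vertices = sorted({x for e in edges for x in e})
    part = {x: rng.randrange(w) for x in vertices}   # assumed partitioning
    subgraphs = [[] for _ in range(w)]
    for u, v in edges:
        if part[u] == part[v]:                       # edge stays local
            subgraphs[part[u]].append((u, v))
    partial = [count_triangles(s) for s in subgraphs]  # c_0 .. c_{w-1}
    return (w ** 2) * sum(partial)                   # assumed f(w) = w^2
```

    With w = 1 the estimate equals the exact count, and for larger w it is correct in expectation, at the cost of higher variance.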
  89. Error-Latency Profile (ELP)

    [Architecture figure: Generalized Approximate Pattern Mining on Apache Spark. Components: graphs stored on disk or main memory, graph updates, Estimator Count Selection, and Twitter graph profiling plots (runtime in minutes and error rate in % vs. number of estimators, 0 to 2.1M).]
    Example queries: graphA.patterns(“a->b->c”, “100s”) and graphB.fourClique(“5.0%”, “95.0%”)
    Estimates:{error: <5%, time: 95s}
    Estimates:{error: <5%, time: 60s}
    Result: count: 21453 +/- 14, confidence: 95%, time: 92s, embeddings (optional)

    Contribution:
    • Novel way to build ELP very fast without the need to know the ground truth or running mining on the full graph.
  91. Building Error-Latency Profile

    Given a time / error bound, how many estimators should ASAP use?
    [Plots: Time vs Estimators (time against number of estimators); Error vs Estimators (error against number of estimators).]
  93. Building Estimators vs Time Profile

    Time complexity is linear in the number of estimators.
    ASAP sets a profiling cost and picks the maximum number of points within the budget.
    [Plots: Twitter graph, runtime (min) vs. number of estimators, 0.5M to 2M; Twitter graph profiling, runtime (min) vs. number of estimators, 0 to 2.1M.]
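
    Because runtime grows linearly with the number of estimators, a handful of profiled points is enough to extrapolate. The sketch below is a hypothetical illustration of that idea using an ordinary least-squares line; the function names and the sample points are invented for the example and do not come from the talk.

```python
def fit_line(points):
    """Least-squares fit of runtime = slope * estimators + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimators_for_budget(points, budget_s):
    """Largest estimator count whose predicted runtime fits the time budget."""
    slope, intercept = fit_line(points)
    if slope <= 0 or budget_s <= intercept:
        return 0
    return int((budget_s - intercept) / slope)


# Hypothetical profiling runs: (number of estimators, runtime in seconds).
profile = [(100_000, 20.0), (200_000, 30.0), (400_000, 50.0)]
budget_count = estimators_for_budget(profile, 100.0)
```

    A time-bounded query can then be answered by running `budget_count` estimators, while an error-bounded query needs the error profile instead.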
  96. Building Estimators vs Error Profile

    Error complexity is non-linear in the number of estimators.
    Key idea: Use a very small sample of the graph to build the ELP
    § Chernoff analysis provides a loose upper bound on the number of estimators.
    § In small graphs, a large number of estimators can get us very close to ground truth.
    [Plots: Twitter graph, error rate (%) vs. number of estimators; Twitter graph profiling, error rate (%) vs. number of estimators, 0 to 2.1M.]
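
    To get a feel for why a concentration bound is loose, the following sketch uses Hoeffding's inequality as a stand-in for the Chernoff analysis mentioned above (it is a different but related bound, chosen here because it needs only the range of a single estimate). If every estimate lies in [0, B], averaging r ≥ B² ln(2/δ) / (2t²) estimates is within ±t of the expectation with probability at least 1 − δ. The function name and the example numbers are assumptions made for illustration.

```python
import math


def hoeffding_estimators(upper_bound, abs_error, delta):
    """Estimators needed so the mean is within abs_error w.p. >= 1 - delta."""
    return math.ceil(
        upper_bound ** 2 * math.log(2 / delta) / (2 * abs_error ** 2)
    )


# Toy numbers: estimates bounded by 60, target +/-0.5 (5% of a count of 10),
# failure probability 5%.
needed = hoeffding_estimators(60, 0.5, 0.05)
```

    The bound scales with 1/t², so halving the error target roughly quadruples the estimator count, and expressing t as a relative error requires the true count, which is unknown in advance. That is consistent with the slide's choice of profiling on a small sample instead.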
  98. Advanced Mining

    Predicate Matching
    • Find patterns where vertices are of type “electronics”
    • ASAP allows simple edge and vertex predicates
    Motif Mining
    • Some patterns are building blocks for other patterns
    • ASAP caches state of the estimators and reuses them
    Accuracy Refinement
    • Users may require more accurate answer later
    • ASAP can checkpoint and reuse estimators
    More details in the paper
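
    One plausible way to picture predicate matching is as a filter applied to the edge stream before any sampling happens. The sketch below is hypothetical: `filter_stream`, the `vertex_type` map, and the sample data are invented for illustration, and the talk does not say how ASAP evaluates predicates internally.

```python
def filter_stream(edges, vertex_type, vertex_pred, edge_pred=lambda u, v: True):
    """Keep only edges whose endpoints and the edge itself pass the predicates."""
    return [
        (u, v)
        for u, v in edges
        if vertex_pred(vertex_type[u])
        and vertex_pred(vertex_type[v])
        and edge_pred(u, v)
    ]


# Hypothetical product graph: vertex id -> product category.
types = {0: "electronics", 1: "electronics", 2: "books", 3: "electronics"}
stream = [(0, 1), (0, 2), (1, 3), (2, 3)]
electronics_only = filter_stream(stream, types, lambda t: t == "electronics")
```

    Estimators would then run over `electronics_only` in place of the full stream.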
  99. Implementation & Evaluation

    § Implemented on Apache Spark
    § Not limited to it, only relies on simple dataflow operators
    § Evaluated in a 16 node cluster
    § Twitter: 1.47B edges
    § Friendster: 1.8B edges
    § UK: 3.73B edges
    § Comparison using representative patterns:
    § 3 (2 patterns), 4 (6 patterns) and 5 motifs (21 patterns)
  102. Performance on Small Graphs

    4-Motifs (6 patterns), time (s, log scale):

    Graph        Arabesque   ASAP
    CiteSeer     12.1        7.3
    Mico         162         14.9
    Youtube      291.4       18.1
    LiveJournal  3161        41.6

    77x at <5% error
  106. Large Graphs & Simple Patterns

    3-Motifs (2 patterns), Arabesque vs. ASAP; time (min, log scale) against number of edges (billions: 0.9, 1.5, 1.8, 3.7).
    Graphs: Twitter, Friendster, UK, and a proprietary graph (20 machines, 256GB each).
    Bar labels: 645, 2.5, 5, 5.9.
    258x at <5% error
  108. Large Graphs & Complex Patterns

    4-Motifs, time (min): Twitter 22, UK 47.
    5-House, time (min) on Twitter and UK; legend: 5%, 10%; bar labels: 12.3, 22.1, 5.6, 14.2.
  109. Summary

    § Pattern mining important & challenging problem
    § Applications in many domains
    § ASAP uses approximation for fast pattern mining
    § Leverages graph mining theory & makes it practical
    § Simple API for developers
    § ASAP outperforms existing solutions
    § Can handle much larger graphs with fewer resources

    http://www.cs.berkeley.edu/~api
    [email protected]