ASAP: Fast, Approximate Graph Pattern Mining at Scale

Anand Iyer
October 10, 2018

Transcript

  1. 1.

    ASAP: Fast, Approximate Graph Pattern Mining at Scale

    Anand Iyer⋆, Zaoxing Liu⬩, Xin Jin⬩, Shivaram Venkataraman✢, Vladimir Braverman⬩, Ion Stoica⋆
    ⋆UC Berkeley ⬩Johns Hopkins University ✢University of Wisconsin & Microsoft
    OSDI, October 10, 2018
  2. 3.
  3. 4.

    Graphs popular in big data analytics

    Examples shown: metabolic network of a single-cell organism, social networks, tuberculosis
  9. 10.

    Graphs popular in big data analytics

    Also popular in traditional enterprises*

    Products and customers: Which (classes of) products are frequently bought together?
    Transactions and involved entities: Small deposits followed by large withdrawal
    [Figure: two example graphs, one with nodes labeled P, one with nodes labeled D and W]

    *"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)
  16. 19.

    Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
    [Figure: embeddings grown step by step from the vertices 0, 1, 2, 3]
    Huge intermediate data; quickly intractable in large graphs

    Challenging to mine patterns in large graphs
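The growth of intermediate data under iterative expansion can be illustrated with a small sketch. It assumes a complete graph on four vertices (the slide's exact edges are not recoverable from the transcript) and hypothetical helper names; it only counts ordered candidate embeddings per step and is not Arabesque's or ASAP's implementation.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each vertex to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def expand(embeddings, adj):
    """Extend every embedding by one adjacent vertex (ordered, duplicates kept)."""
    out = []
    for emb in embeddings:
        frontier = set()
        for v in emb:
            frontier |= adj[v]
        for w in sorted(frontier - set(emb)):
            out.append(emb + (w,))
    return out


def expansion_sizes(edges, steps):
    """Number of candidate embeddings held after each expansion step."""
    adj = build_adjacency(edges)
    level = [(v,) for v in sorted(adj)]
    sizes = [len(level)]
    for _ in range(steps):
        level = expand(level, adj)
        sizes.append(len(level))
    return sizes


k4 = list(combinations(range(4), 2))  # assumed example graph: complete graph on 4 vertices
print(expansion_sizes(k4, 2))  # [4, 12, 24]
```

Even on four vertices the candidate list grows from 4 to 24 entries in two steps, which is the intermediate-data problem the slide points at.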
  21. 26.

    Graph Pattern Mining

    Motifs with size = 3
    [Plot: Computation Time vs. # Edges, log scale]
    Arabesque (SOSP '15): ~1 billion edges, 11 hours
    This work: 1.5 billion edges, 150 s
    258x faster, 5x less CPU & Memory, <5% error
  22. 30.
  27. 35.

    Approximate Analytics

    General approach: Apply algorithm on subset(s) (sample) of the input data
    Example: graph (vertices 0-4) -> edge sampling (p=0.5) -> triangle counting (e = 1) -> result: e · 2 = 2
    Answer: 10
    [Plot: Error (%) and Speedup vs. Edges Dropped (%)]
  28. 36.

    Approximate Analytics

    General approach: Apply algorithm on subset(s) (sample) of the input data
    Example: graph (vertices 0-4) -> edge sampling (p=0.5) -> triangle counting (e = 1) -> result: e · 2 = 2
    Answer: 10

    Applying exact algorithm on sampled graph(s) not the right approach for pattern mining
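The sample-then-run-exact approach from this slide can be tried on the same five-vertex example. This is a minimal sketch under stated assumptions: the graph is taken to be the complete graph on vertices 0-4 (consistent with the answer of 10 triangles), the function names are hypothetical, and the estimate is scaled by 1/p³, the textbook scaling for independent edge sampling, which may differ from the scaling drawn on the slide.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count by checking every vertex triple."""
    edge_set = {tuple(sorted(e)) for e in edges}
    vertices = sorted({v for e in edge_set for v in e})
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set
    )


def sample_edges(edges, p, rng):
    """Keep each edge independently with probability p."""
    return [e for e in edges if rng.random() < p]


def sampled_estimate(edges, p, rng):
    """Run the exact algorithm on the sample, then scale the count.

    A triangle survives only if all three of its edges survive
    (probability p**3), hence the 1 / p**3 scaling (an assumption here).
    """
    return count_triangles(sample_edges(edges, p, rng)) / p ** 3


k5 = list(combinations(range(5), 2))  # assumed example graph
rng = random.Random(0)
estimates = [sampled_estimate(k5, 0.5, rng) for _ in range(5)]
print("exact:", count_triangles(k5))
print("estimates from five independent samples:", estimates)
```

Individual estimates can only take multiples of 8 at p = 0.5, so a single sample of a small graph can land far from the exact answer of 10.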
  45. 55.

    Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream (neighborhood sampling)
    graph: vertices 0-4
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    Estimator E0: sampling probability p = 1/10 * 1/4, so e0 = 40
    Estimators E1, E2, E3: e1 = 0, e2 = 0, e3 = 0
    result, estimator (r=4): (1/r) * sum of e_i for i = 0 .. r-1 = 10

    Pavan et al., "Counting and sampling triangles from a graph stream", VLDB 2013
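The arithmetic on this slide can be checked exactly with a short sketch. It assumes the usual reading of neighborhood sampling for triangles (first edge uniform over the stream, second edge uniform over later edges adjacent to the first, then a check that the closing edge arrives later) and a simple graph with no duplicate edges. The function names are hypothetical, and the choice of (0,3) then (0,4) for E0 is an assumption that matches p = 1/10 * 1/4.

```python
from fractions import Fraction
from itertools import combinations


def later_adjacent(stream, i):
    """Indices of edges after position i that share a vertex with stream[i]."""
    first = set(stream[i])
    return [j for j in range(i + 1, len(stream)) if first & set(stream[j])]


def estimate(stream, i, j):
    """Estimator value when edge i is sampled first and edge j second."""
    m = len(stream)
    c = len(later_adjacent(stream, i))
    # The two non-shared endpoints form the edge that would close the triangle.
    closing = tuple(sorted(set(stream[i]) ^ set(stream[j])))
    return m * c if closing in stream[j + 1:] else 0


def expected_estimate(stream):
    """Sum of probability * value over every possible (first, second) choice."""
    m = len(stream)
    total = Fraction(0)
    for i in range(m):
        candidates = later_adjacent(stream, i)
        for j in candidates:
            total += Fraction(1, m * len(candidates)) * estimate(stream, i, j)
    return total


stream = list(combinations(range(5), 2))  # same order as the slide's edge stream
i, j = stream.index((0, 3)), stream.index((0, 4))
print(estimate(stream, i, j))        # 40, i.e. 1 / (1/10 * 1/4)
print(expected_estimate(stream))     # 10
print(sum([40, 0, 0, 0]) / 4)        # 10.0, the slide's average over r = 4
```

The expected value of a single estimator equals the true triangle count, which is why averaging r estimators converges to the answer as r grows.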
  52. 64.

    ASAP: A Swift Approximate Pattern miner

    [Architecture figure; numbered labels as they appear on the slide]
    (1) Queries: graphA.patterns("a->b->c", "100s"), graphB.fourClique("5.0%", "95.0%")
    (2) Generalized Approximate Pattern Mining, on Apache Spark
    (3) Estimator Count Selection
    (4) Twitter Graph Profiling [Plots: Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
    (5) Error-Latency Profile (ELP)
    (6) Estimates: {error: <5%, time: 95s}, Estimates: {error: <5%, time: 60s}
    (7) count: 21453 +/- 14, confidence: 95%, time: 92s; Embeddings (optional)
    Inputs: Graphs stored on disk or main memory; Graph updates
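How an error-latency profile could drive estimator count selection is sketched below. Everything in it is an assumption for illustration: the profile values are made up (they are not read off the slide's Twitter plots), and the selection rule is a plain table lookup rather than ASAP's actual method.

```python
from typing import List, NamedTuple, Optional


class ProfilePoint(NamedTuple):
    estimators: int
    runtime_s: float
    error_pct: float


# Hypothetical profile: more estimators cost more time and give lower error.
PROFILE = [
    ProfilePoint(500_000, 60.0, 12.0),
    ProfilePoint(1_000_000, 95.0, 6.0),
    ProfilePoint(1_500_000, 130.0, 4.5),
    ProfilePoint(2_100_000, 180.0, 3.0),
]


def for_error_budget(profile: List[ProfilePoint], max_error_pct: float) -> Optional[ProfilePoint]:
    """Fewest estimators whose profiled error fits the budget, if any."""
    ok = [p for p in profile if p.error_pct <= max_error_pct]
    return min(ok, key=lambda p: p.estimators) if ok else None


def for_time_budget(profile: List[ProfilePoint], max_runtime_s: float) -> Optional[ProfilePoint]:
    """Most estimators whose profiled runtime fits the budget, if any."""
    ok = [p for p in profile if p.runtime_s <= max_runtime_s]
    return max(ok, key=lambda p: p.estimators) if ok else None


print(for_error_budget(PROFILE, 5.0))    # an error-budget query such as "5.0%"
print(for_time_budget(PROFILE, 100.0))   # a time-budget query such as "100s"
```

The two lookups mirror the two query styles on the slide: one call gives a time budget, the other an error bound with a confidence level.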
  54. 66.

    [Architecture figure: queries with time or error budgets, Estimator Count Selection, Error-Latency Profile (ELP), Generalized Approximate Pattern Mining on Apache Spark, graphs stored on disk or main memory, graph updates, estimates and results]

    Contributions:
    • Extends neighborhood sampling to general patterns
    • Provides a unified API
    • Applies approximate pattern mining in distributed settings
  55. 67.

    Generalized Approximate Pattern Mining

    Developers write a single estimator using ASAP’s API

    Table 1: ASAP’s Approximate Pattern Mining API.

    | API | Description |
    | --- | --- |
    | sampleVertex: () -> (v, p) | Uniformly sample one vertex from the graph. |
    | SampleEdge: () -> (e, p) | Uniformly sample one edge from the graph. |
    | ConditionalSampleVertex: (subgraph) -> (v, p) | Uniformly sample a vertex that appears after a sampled subgraph. |
    | ConditionalSampleEdge: (subgraph) -> (e, p) | Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order. |
    | ConditionalClose: (subgraph, subgraph) -> boolean | Given a sampled subgraph, check if another subgraph that appears later in the order can be formed. |

    Example estimators shown: SampleThreeNodeChain, SampleTriangle
  65. 77.

    Using ASAP’s API

    graph: vertices 0-4 [Figure highlights vertices 0, 3, 4]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

    SampleTriangle
      (e1, p1) = sampleEdge()
      (e2, p2) = conditionalSampleEdge(Subgraph(e1))
      if (!e2) return 0
      subgraph1 = Subgraph(e1, e2)
      subgraph2 = Triangle(e1, e2)-subgraph1
      if conditionalClose(subgraph1, subgraph2)
        return 1/(p1.p2)
      else return 0

    [Cropped on the slide and only partly legible: SampleThreeNodeChain, SampleFourCliqueType1, SampleFourCliqueType2, and the ConditionalSampleEdge and ConditionalClose rows of Table 1]
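The SampleTriangle pseudocode above can be turned into a runnable sketch. The class and method names below are hypothetical Python stand-ins for the API in Table 1, subgraphs are reduced to edge indices for brevity, and the graph is assumed to be simple (no duplicate edges or self loops). It is a single-machine illustration, not ASAP's Spark implementation.

```python
import random
from itertools import combinations


class EdgeStream:
    """Edge list in arrival order with sampling primitives (hypothetical names)."""

    def __init__(self, edges, seed=None):
        self.edges = [tuple(sorted(e)) for e in edges]
        self.rng = random.Random(seed)

    def sample_edge(self):
        """Uniformly sample one edge; returns (index, probability)."""
        i = self.rng.randrange(len(self.edges))
        return i, 1.0 / len(self.edges)

    def conditional_sample_edge(self, i):
        """Uniformly sample an edge adjacent to edge i that comes after it."""
        first = set(self.edges[i])
        later = [j for j in range(i + 1, len(self.edges)) if first & set(self.edges[j])]
        if not later:
            return None, 0.0
        return self.rng.choice(later), 1.0 / len(later)

    def conditional_close(self, i, j):
        """True if the edge closing the triangle appears after edge j."""
        closing = tuple(sorted(set(self.edges[i]) ^ set(self.edges[j])))
        return closing in self.edges[j + 1:]


def sample_triangle(stream):
    """One estimator: 1 / (p1 * p2) if a triangle was sampled, else 0."""
    e1, p1 = stream.sample_edge()
    e2, p2 = stream.conditional_sample_edge(e1)
    if e2 is None:
        return 0.0
    if stream.conditional_close(e1, e2):
        return 1.0 / (p1 * p2)
    return 0.0


def estimate_triangles(stream, r):
    """Average of r independent estimators."""
    return sum(sample_triangle(stream) for _ in range(r)) / r


stream = EdgeStream(combinations(range(5), 2), seed=7)
print(estimate_triangles(stream, 20_000))  # should be close to the exact count of 10
```

With e1 = (0,3) and e2 = (0,4), the closing edge (3,4) arrives later, so that estimator returns 1 / (1/10 * 1/4) = 40, which is the case the slide highlights.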
  66. 78.

    Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  67. 79.

    Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  68. 80.

    Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  81. 93.

    Using ASAP’s API

    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

    [Figure: 5-node example graph (vertices 0 to 4) with vertices 0, 3, 4 marked; estimate = 40. Shown next to Table 1 (ASAP’s Approximate Pattern Mining API) and the SampleThreeNodeChain, SampleTriangle, SampleFourCliqueType1 and SampleFourCliqueType2 listings.]

    • Sampling phase fixes the vertices for a particular instance of a pattern, and closing phase waits for the remaining edges.
    • ASAP computes the right expectations, runs many instances of the estimator, and aggregates the results.
    • See paper for more examples & proof.
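    The SampleTriangle listing can be traced on this edge stream with a short Python sketch. It is an illustration under stated assumptions, not ASAP's implementation: the function names are hypothetical, edges are assumed to be unique (u, v) tuples with u < v, and 1/(p1.p2) is computed as the integer product m * k, where p1 = 1/m and p2 = 1/k.

```python
import random
from fractions import Fraction

# Edge stream from the slide (complete graph on vertices 0..4).
STREAM = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
          (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

def later_adjacent(stream, i):
    """Indices of edges that come after stream[i] and share a vertex with it."""
    e1 = set(stream[i])
    return [j for j in range(i + 1, len(stream)) if e1 & set(stream[j])]

def triangle_estimate(stream, i, j):
    """Value SampleTriangle returns when e1 = stream[i] and e2 = stream[j]."""
    m, k = len(stream), len(later_adjacent(stream, i))
    # The closing edge joins the two endpoints that e1 and e2 do not share.
    closing = tuple(sorted(set(stream[i]) ^ set(stream[j])))
    # p1 = 1/m and p2 = 1/k, so 1/(p1.p2) = m * k (kept as an integer).
    return m * k if closing in stream[j + 1:] else 0

def sample_triangle(stream, rng):
    """One estimator: sample e1, then e2, then wait for the closing edge."""
    i = rng.randrange(len(stream))            # sampleEdge
    candidates = later_adjacent(stream, i)    # conditionalSampleEdge
    if not candidates:
        return 0
    j = rng.choice(candidates)
    return triangle_estimate(stream, i, j)    # conditionalClose + weighting

def approx_triangle_count(stream, r, seed=0):
    """Average of r independent estimators."""
    rng = random.Random(seed)
    return sum(sample_triangle(stream, rng) for _ in range(r)) / r

def exact_expectation(stream):
    """Expected value of one estimator, enumerating every (e1, e2) choice."""
    total = Fraction(0)
    for i in range(len(stream)):
        candidates = later_adjacent(stream, i)
        for j in candidates:
            prob = Fraction(1, len(stream) * len(candidates))
            total += prob * triangle_estimate(stream, i, j)
    return total
```

    With e1 = (0,3) and e2 = (0,4), four later edges are adjacent to e1, so p1 = 1/10, p2 = 1/4 and the estimator returns 40 once (3,4) arrives, which is consistent with the value shown on the slide. Each triangle is detected by exactly one (e1, e2) pair, so the expectation equals the true count of 10 triangles.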
  89. 102.

    Applying to Distributed Settings

    • map: the graph is split across w (= 3) workers using Random Vertex-cut Partitioning
      • subgraph 0: partial count c0 (using r estimators)
      • subgraph 1: partial count c1 (using r estimators)
      • subgraph 2: partial count c2 (using r estimators)
    • reduce: $\sum_{i=0}^{w-1} c_i$, combined with a factor $f(w)$
    • Upper bounds on f(w) can be proved using the Hajnal-Szemerédi theorem
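    The map and reduce steps can be sketched in Python as follows. The slides do not define the partitioning details or f(w), so everything specific here is an assumption: each vertex is assigned to one of w workers, a worker only sees edges whose two endpoints it owns, and for triangles the correction factor is taken to be f(w) = w^2 (a triangle then survives partitioning with probability 1/w^2). Exact per-partition counting stands in for the r estimators, and all function names are hypothetical.

```python
from itertools import combinations, product

def count_triangles(edges):
    """Exact triangle count of a small edge list (stand-in for r estimators)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def partition(edges, color, w):
    """Map step: worker i keeps the edges whose endpoints both have color i."""
    parts = [[] for _ in range(w)]
    for u, v in edges:
        if color[u] == color[v]:
            parts[color[u]].append((u, v))
    return parts

def distributed_estimate(edges, color, w):
    """Reduce step: sum the partial counts c_0..c_{w-1}, then rescale."""
    partial = [count_triangles(p) for p in partition(edges, color, w)]
    return w ** 2 * sum(partial)  # assumed f(w) = w^2 for triangles

def average_over_all_colorings(edges, vertices, w):
    """Mean estimate over every possible assignment of vertices to workers."""
    estimates = [distributed_estimate(edges, dict(zip(vertices, colors)), w)
                 for colors in product(range(w), repeat=len(vertices))]
    return sum(estimates) / len(estimates)
```

    Averaging over all assignments of a 4-vertex complete graph recovers its 4 triangles, which illustrates why a correction factor is needed after summing the partial counts.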
  91. 104.

    Error-Latency Profile (ELP)

    [Architecture figure. Components: Apache Spark; Generalized Approximate Pattern Mining; Estimator Count Selection; Error-Latency Profile (ELP); graphs stored on disk or main memory; graph updates; embeddings (optional).]

    Example queries: graphA.patterns(“a->b->c”, “100s”) and graphB.fourClique(“5.0%”, “95.0%”)
    Example estimates: {error: <5%, time: 95s}, {error: <5%, time: 60s}
    Example result: count: 21453 +/- 14, confidence: 95%, time: 92s

    [Plots, Twitter Graph Profiling: Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]

    Contribution:
    • Novel way to build the ELP very quickly, without needing to know the ground truth or run mining on the full graph.
  93. 107.

    Building Error-Latency Profile

    Given a time / error bound, how many estimators should ASAP use?

    [Plots: Time vs. Estimators (Time against Number of estimators); Error vs. Estimators (Error against Number of estimators)]
  95. 110.

    Building Estimators vs Time Profile

    [Plot, Twitter Graph: Runtime (min), 1 to 3, vs. No. of Estimators, 0.5M to 2M]

    • Time complexity is linear in the number of estimators.
    • ASAP sets a profiling cost and picks the maximum number of points within that budget.
  96. 111.

    Building Estimators vs Time Profile

    [Plot, Twitter Graph: Runtime (min), 1 to 3, vs. No. of Estimators, 0.5M to 2M]

    • Time complexity is linear in the number of estimators.

    [Plot, Twitter Graph Profiling: Runtime (min), 1 to 3, vs. No. of Estimators, 0 to 2.1M]
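    Because runtime grows linearly with the number of estimators, a few profiled points are enough to answer "how many estimators fit in a time budget". The Python sketch below fits a least-squares line and inverts it. The profile points are hypothetical values chosen for illustration (they are not read from the Twitter plot), and the function names are assumptions rather than part of ASAP.

```python
# Hypothetical profile points: (number of estimators, runtime in minutes).
PROFILE = [(500_000, 1.0), (1_000_000, 1.5), (1_500_000, 2.0)]

def fit_line(points):
    """Ordinary least-squares fit of runtime = slope * estimators + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def max_estimators_within(points, budget_minutes):
    """Largest estimator count whose predicted runtime stays within the budget."""
    slope, intercept = fit_line(points)
    if slope <= 0 or budget_minutes <= intercept:
        return 0  # the budget does not even cover the fixed cost
    return int((budget_minutes - intercept) / slope)
```

    For these hypothetical points the fitted line has a slope of about one minute per million estimators and a fixed cost of about half a minute, so a 2.5-minute budget corresponds to roughly 2 million estimators.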
  98. 113.

    Building Estimators vs Error Profile

    [Plot, Twitter Graph: Error Rate (%), 0 to 40, vs. No. of Estimators, 50k to 2.1M]

    • Error complexity is non-linear in the number of estimators.

    Key idea: Use a very small sample of the graph to build the ELP
    • Chernoff analysis provides a loose upper bound on the number of estimators.
    • In small graphs, a large number of estimators can get us very close to ground truth.
  99. 114.

    Building Estimators vs Error Profile

    [Plot, Twitter Graph: Error Rate (%), 0 to 40, vs. No. of Estimators, 50k to 2.1M]

    • Error complexity is non-linear in the number of estimators.

    [Plot, Twitter Graph Profiling: Error Rate (%), 0 to 40, vs. No. of Estimators, 0 to 2.1M]
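    The idea of profiling error on a small graph can be illustrated with a Python sketch that measures relative error against an exactly computed count as the number of estimators grows. It only covers the measurement step on a small sample; how ASAP extrapolates from the sample to the full graph is not shown on the slides and is not modeled here. The sample graph (a 6-vertex complete graph) and all function names are hypothetical, and edges are assumed to be unique (u, v) tuples with u < v.

```python
import random
from itertools import combinations

# Hypothetical small sample: complete graph on 6 vertices, edges in stream order.
SAMPLE = list(combinations(range(6), 2))

def exact_triangles(edges):
    """Ground truth on the small sample, by checking every vertex triple."""
    edge_set = set(edges)
    vertices = sorted({v for e in edges for v in e})
    return sum(1 for a, b, c in combinations(vertices, 3)
               if (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set)

def sample_triangle(stream, rng):
    """One triangle estimator over the ordered edge stream."""
    m = len(stream)
    i = rng.randrange(m)
    e1 = set(stream[i])
    later = [j for j in range(i + 1, m) if e1 & set(stream[j])]
    if not later:
        return 0
    j = rng.choice(later)
    closing = tuple(sorted(e1 ^ set(stream[j])))
    # 1/(p1.p2) with p1 = 1/m and p2 = 1/len(later).
    return m * len(later) if closing in stream[j + 1:] else 0

def error_profile(stream, estimator_counts, seed=0):
    """Relative error of the averaged estimate for each estimator count."""
    truth = exact_triangles(stream)
    rng = random.Random(seed)
    profile = []
    for r in estimator_counts:
        estimate = sum(sample_triangle(stream, rng) for _ in range(r)) / r
        profile.append((r, abs(estimate - truth) / truth))
    return profile
```

    Calling error_profile(SAMPLE, [10, 100, 20000]) returns one (estimators, relative error) pair per setting; on a graph this small the error at the largest setting is expected to be within a few percent.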
  100. 115.

    Advanced Mining

    Predicate Matching
    • Find patterns where vertices are of type “electronics”
    • ASAP allows simple edge and vertex predicates

    Motif Mining
    • Some patterns are building blocks for other patterns
    • ASAP caches the state of the estimators and reuses it

    Accuracy Refinement
    • Users may require a more accurate answer later
    • ASAP can checkpoint and reuse estimators

    More details in the paper
  101. 116.

    Implementation & Evaluation

    • Implemented on Apache Spark
      • Not limited to it; only relies on simple dataflow operators
    • Evaluated on a 16-node cluster
      • Twitter: 1.47B edges
      • Friendster: 1.8B edges
      • UK: 3.73B edges
    • Comparison using representative patterns:
      • 3-motifs (2 patterns), 4-motifs (6 patterns) and 5-motifs (21 patterns)
  104. 119.

    Performance on Small Graphs

    4-Motifs (6 patterns), Time (s), log scale:

    Graph         Arabesque   ASAP
    CiteSeer      12.1        7.3
    Mico          162         14.9
    Youtube       291.4       18.1
    LiveJournal   3161        41.6

    77x, <5% error
  107. 122.

    Large Graphs & Simple Patterns

    [Bar chart, 3-Motifs (2 patterns), Arabesque vs. ASAP: Time (min), log scale, against # Edges (Billions) at 0.9, 1.5 (Twitter), 1.8 (Friendster) and 3.7 (UK). Bar labels: 645, 2.5, 5, 5.9.]

    Proprietary graph, 20 machines (256GB each)

    258x, <5% error
  110. 125.

    Large Graphs & Complex Patterns

    4-Motifs, Time (min):

    Graph     Time
    Twitter   22
    UK        47

    5-House, Time (min):

    Graph     5%     10%
    Twitter   12.3   5.6
    UK        22.1   14.2
  111. 126.

    Summary

    • Pattern mining important & challenging problem
    • Applications in many domains
    • ASAP uses approximation for fast pattern mining
    • Leverages graph mining theory & makes it practical
    • Simple API for developers
    • ASAP outperforms existing solutions
    • Can handle much larger graphs with fewer resources

    http://www.cs.berkeley.edu/~api
    api@cs.berkeley.edu