ASAP: Fast, Approximate Graph Pattern Mining at Scale

Anand Iyer
October 10, 2018

Transcript

  1. ASAP: Fast, Approximate Graph Pattern Mining at Scale

    Anand Iyer⋆, Zaoxing Liu⬩, Xin Jin⬩, Shivaram Venkataraman✢, Vladimir Braverman⬩, Ion Stoica⋆
    ⋆UC Berkeley, ⬩Johns Hopkins University, ✢University of Wisconsin & Microsoft
    OSDI, October 10, 2018
  4. Graphs popular in big data analytics

    Examples (figure labels): Social networks; Metabolic network of a single cell organism; Tuberculosis.
  10. Graphs popular in big data analytics

    Also popular in traditional enterprises*

    Products and customers: Which (classes of) products are frequently bought together? [Figure: graph with nodes labeled P.]

    Transactions and involved entities: Small deposits followed by large withdrawal. [Figure: graph with nodes labeled D and W.]

    *"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)
  19. Graph Pattern Mining

    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs.

    Standard approach: iterative expansion. [Figure: expansion on a 4-vertex graph (vertices 0 to 3), from single vertices to progressively larger candidate subgraphs.]

    Huge intermediate data; quickly intractable in large graphs. Challenging to mine patterns in large graphs.
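To make the intermediate-data problem concrete, here is a minimal Python sketch of iterative expansion. It assumes embeddings are connected vertex sets grown one adjacent vertex at a time; the names `expand_embeddings` and `complete_graph` are illustrative and not taken from the talk or from any particular system.

```python
def expand_embeddings(adj, max_size):
    """Return the number of connected vertex sets of each size 1..max_size.

    adj maps each vertex to the set of its neighbors. Every level is built
    by growing each embedding of the previous level by one adjacent vertex,
    so all embeddings of a level are held in memory at once.
    """
    level = {frozenset([v]) for v in adj}
    counts = [len(level)]
    for _ in range(max_size - 1):
        next_level = set()
        for emb in level:
            # Candidate vertices: neighbors of the embedding not already in it.
            frontier = set().union(*(adj[v] for v in emb)) - emb
            for v in frontier:
                next_level.add(emb | {v})
        level = next_level
        counts.append(len(level))
    return counts


def complete_graph(n):
    """Adjacency sets of the complete graph on vertices 0..n-1."""
    return {v: set(range(n)) - {v} for v in range(n)}


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, expand_embeddings(complete_graph(n), 4))
```

On a complete graph with 20 vertices the four levels already hold 20, 190, 1140 and 4845 embeddings, which hints at why the intermediate state grows so quickly on graphs with billions of edges.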
  26. Graph Pattern Mining

    [Plot: computation time vs. # edges, log scale. Motifs with size = 3.]

    Arabesque (SOSP '15): ~1 billion edges, 11 hours.
    This work: 1.5 billion edges, 150 s.

    258x faster, 5x less CPU & memory, <5% error.
  28. Many mining tasks do not need exact answers

    Leverage approximation for pattern mining.
  35. Approximate Analytics

    General approach: apply the algorithm on subset(s) (a sample) of the input data.

    Example: graph (vertices 0 to 4) → edge sampling (p = 0.5) → triangle counting on the sampled graph (e = 1) → result = 2. Answer: 10.

    [Plot: error (%) and speedup vs. edges dropped (%).]
  36. Approximate Analytics

    Same example as the previous slide (result = 2, answer: 10). Applying an exact algorithm on sampled graph(s) is not the right approach for pattern mining.
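The edge-sampling example can be reproduced with a short Python sketch. It keeps each edge with probability p, counts triangles exactly on what remains, and rescales. The 1/p**3 rescaling is an assumption of this sketch (a triangle survives only if all three of its edges are kept) and may differ from the scaling drawn on the slide; the function names are illustrative.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count of an undirected simple graph given as vertex pairs."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edges for v in e})
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )


def sampled_estimate(edges, p, rng):
    """Keep each edge with probability p, count exactly, rescale by 1/p**3.

    The 1/p**3 factor is an assumption of this sketch, not taken from the talk.
    """
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p**3


# The 5-vertex, 10-edge graph used on the slides (every pair of vertices 0..4).
K5 = list(combinations(range(5), 2))

if __name__ == "__main__":
    rng = random.Random(0)
    print("exact:", count_triangles(K5))
    print("estimates:", [sampled_estimate(K5, 0.5, rng) for _ in range(8)])
```

With p = 0.5 every estimate is a multiple of 8, so individual runs on a graph this small land far from the exact count of 10, which is the behaviour the slide warns about.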
  37. ASAP leverages existing work in graph approximation theory and makes it practical
  55. Graph Pattern Mining Theory

    Sample instances of the pattern from the graph stream (neighborhood sampling; Pavan et al., Counting and sampling triangles from a graph stream, VLDB 2013).

    Graph: vertices 0 to 4. Edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

    Estimator E0: $p = \frac{1}{10} \cdot \frac{1}{4}$, $c_0 = 40$. Estimators E1, E2, E3: $c_1 = 0$, $c_2 = 0$, $c_3 = 0$.

    Result with r = 4 estimators: $\frac{1}{r} \sum_{i=0}^{r-1} c_i = 10$
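The estimator on this slide can be sketched in Python as follows. This is a simplified, offline rendering of neighborhood sampling over a stored edge list, not the one-pass streaming algorithm of Pavan et al.; all function names are illustrative. `expected_estimate` enumerates every sampling outcome to show that a single estimator is unbiased on the slide's graph.

```python
import random
from fractions import Fraction
from itertools import combinations


def later_neighbors(edges, i):
    """Positions of edges that arrive after edges[i] and share one endpoint with it."""
    return [
        j for j in range(i + 1, len(edges))
        if len(set(edges[i]) & set(edges[j])) == 1
    ]


def closes_later(edges, i, j):
    """True if the edge completing the triangle over edges[i], edges[j] arrives after j."""
    closing = set(edges[i]) ^ set(edges[j])
    return any(set(edges[k]) == closing for k in range(j + 1, len(edges)))


def one_estimate(edges, rng):
    """One estimator: returns 1/p for a sampled triangle, else 0."""
    i = rng.randrange(len(edges))        # first edge, probability 1/m
    nbrs = later_neighbors(edges, i)
    if not nbrs:
        return 0
    j = rng.choice(nbrs)                 # second edge, probability 1/len(nbrs)
    return len(edges) * len(nbrs) if closes_later(edges, i, j) else 0


def averaged_estimate(edges, r, rng):
    """Average of r independent estimators."""
    return sum(one_estimate(edges, rng) for _ in range(r)) / r


def expected_estimate(edges):
    """Exact expectation of one estimator, by enumerating every sampling outcome."""
    m = len(edges)
    total = Fraction(0)
    for i in range(m):
        nbrs = later_neighbors(edges, i)
        for j in nbrs:
            if closes_later(edges, i, j):
                # probability of this outcome times the value it returns
                total += Fraction(1, m * len(nbrs)) * (m * len(nbrs))
    return total


STREAM = list(combinations(range(5), 2))  # the edge stream from the slide

if __name__ == "__main__":
    print(expected_estimate(STREAM))
    print(averaged_estimate(STREAM, 4, random.Random(0)))
```

Picking (0,3) as the first edge leaves four later adjacent edges, which matches the slide's p = 1/10 · 1/4 and the estimate of 40 when the triangle closes.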
  56. A Swift Approximate Pattern miner

  66. A Swift Approximate Pattern miner

    (1) Queries: graphA.patterns("a->b->c", "100s") and graphB.fourClique("5.0%", "95.0%"), over graphs stored on disk or main memory. Also shown: graph updates.
    (2) Generalized Approximate Pattern Mining, on Apache Spark.
    (3) Estimator Count Selection.
    (4) Twitter Graph Profiling. [Plots: runtime (min) and error rate (%) vs. no. of estimators, 0 to 2.1M.]
    (5) Error-Latency Profile (ELP).
    (6) Estimates: {error: <5%, time: 95s}, {error: <5%, time: 60s}.
    (7) Result: count: 21453 +/- 14, confidence: 95%, time: 92s. Embeddings (optional).

    Contributions:
    • Extends neighborhood sampling to general patterns
    • Provides a unified API
    • Applies approximate pattern mining in distributed settings
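One way to picture how an error-latency profile could drive estimator count selection is a simple lookup over profiled configurations. The sketch below is only an illustration under stated assumptions: the profile numbers are invented, and `ProfilePoint`, `for_error_budget` and `for_time_budget` are hypothetical names. The transcript does not describe how ASAP actually builds or queries its ELP.

```python
from typing import NamedTuple, Optional


class ProfilePoint(NamedTuple):
    estimators: int      # number of estimators used in a profiling run
    error_pct: float     # observed error rate (%)
    runtime_s: float     # observed runtime (seconds)


# Hypothetical profile: these numbers are invented for illustration only.
ELP = [
    ProfilePoint(100_000, 30.0, 20.0),
    ProfilePoint(500_000, 12.0, 45.0),
    ProfilePoint(1_000_000, 6.0, 70.0),
    ProfilePoint(1_500_000, 4.5, 95.0),
    ProfilePoint(2_100_000, 3.0, 130.0),
]


def for_error_budget(profile, max_error_pct) -> Optional[ProfilePoint]:
    """Cheapest profiled configuration whose error is within the budget."""
    ok = [p for p in profile if p.error_pct <= max_error_pct]
    return min(ok, key=lambda p: p.runtime_s) if ok else None


def for_time_budget(profile, max_runtime_s) -> Optional[ProfilePoint]:
    """Most accurate profiled configuration that fits in the time budget."""
    ok = [p for p in profile if p.runtime_s <= max_runtime_s]
    return min(ok, key=lambda p: p.error_pct) if ok else None


if __name__ == "__main__":
    print(for_error_budget(ELP, 5.0))    # query with an error bound
    print(for_time_budget(ELP, 100.0))   # query with a time bound
```

The two lookups mirror the two query styles on the slide: one query states a time budget ("100s"), the other an error bound and a confidence level.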
  67. Generalized Approximate Pattern Mining

    | API | Description |
    | --- | --- |
    | sampleVertex: () → (v, p) | Uniformly sample one vertex from the graph. |
    | SampleEdge: () → (e, p) | Uniformly sample one edge from the graph. |
    | ConditionalSampleVertex: (subgraph) → (v, p) | Uniformly sample a vertex that appears after a sampled subgraph. |
    | ConditionalSampleEdge: (subgraph) → (e, p) | Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order. |
    | ConditionalClose: (subgraph, subgraph) → boolean | Given a sampled subgraph, check if another subgraph that appears later in the order can be formed. |

    Table 1: ASAP's Approximate Pattern Mining API.

    Listings shown: SampleThreeNodeChain, SampleTriangle.

    Developers write a single estimator using ASAP's API.
  83. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  84. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  85. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  86. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  87. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  88. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  89. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4
  90. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

    [Table 1 and the sample estimator listings, as on the preceding slide]

    [Figure: vertices 0, 3, 4] estimate = 40
  91. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

    [Table 1 and the sample estimator listings, as on the preceding slide]

    [Figure: vertices 0, 3, 4] estimate = 40

    Sampling phase fixes the vertices for a particular instance of a pattern and closing phase waits for remaining edges
  92. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

    [Table 1 and the sample estimator listings, as on the preceding slide]

    [Figure: vertices 0, 3, 4] estimate = 40

    Sampling phase fixes the vertices for a particular instance of a pattern and closing phase waits for remaining edges
    ASAP computes the right expectations, runs many instances of the estimator and aggregates results
  93. Using ASAP’s API 0 1 4 2 3 edge stream:

    (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

    [Table 1 and the sample estimator listings, as on the preceding slide]

    [Figure: vertices 0, 3, 4] estimate = 40

    Sampling phase fixes the vertices for a particular instance of a pattern and closing phase waits for remaining edges
    ASAP computes the right expectations, runs many instances of the estimator and aggregates results
    See paper for more examples & proof
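The SampleTriangle listing can be tried out on the slide's 10-edge stream with a small Python sketch. This is an offline simplification that assumes the whole stream is held in a list and that the graph is simple (no duplicate edges or self-loops). The function names (`later_adjacent`, `estimate_for`, `sample_triangle`, `approx_triangle_count`) are hypothetical and are not ASAP's API. Sampling e1 = (0,3) and then e2 = (0,4) gives p1 = 1/10 and p2 = 1/4, which appears consistent with the value 40 shown on the slide.

```python
import random


def later_adjacent(edges, i):
    """Indices of edges that come after edges[i] and share a vertex with it."""
    u, v = edges[i]
    return [j for j in range(i + 1, len(edges))
            if u in edges[j] or v in edges[j]]


def estimate_for(edges, i, j):
    """Estimator value when edge i is sampled first and edge j second.

    Returns 1/(p1*p2) if the edge that closes the triangle appears after
    edge j in the stream, else 0. Here p1 = 1/m and
    p2 = 1/len(later_adjacent(edges, i)).
    """
    closing = frozenset(set(edges[i]) ^ set(edges[j]))  # missing third edge
    closes = any(frozenset(e) == closing for e in edges[j + 1:])
    if not closes:
        return 0.0
    return float(len(edges) * len(later_adjacent(edges, i)))


def sample_triangle(edges, rng):
    """One run of the estimator over a fixed edge stream."""
    i = rng.randrange(len(edges))          # plays the role of sampleEdge
    candidates = later_adjacent(edges, i)  # conditionalSampleEdge candidates
    if not candidates:
        return 0.0
    j = rng.choice(candidates)
    return estimate_for(edges, i, j)       # includes the conditionalClose check


def approx_triangle_count(edges, r, seed=0):
    """Average r independent estimator runs."""
    rng = random.Random(seed)
    return sum(sample_triangle(edges, rng) for _ in range(r)) / r


# Edge stream from the slide: the complete graph on 5 vertices (10 triangles).
stream = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
          (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(estimate_for(stream, 2, 3))   # e1 = (0,3), e2 = (0,4): 10 * 4 = 40.0
print(approx_triangle_count(stream, r=20000))
```

A single run returns either 0 or a large value, so the individual estimates are noisy; only the average over many runs approaches the true count of 10.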
  94. Applying to Distributed Settings graph

  95. Applying to Distributed Settings graph

    subgraph 0: partial count c0 (using r estimators)
    subgraph 1: partial count c1 (using r estimators)
    subgraph 2: partial count c2 (using r estimators)
  96. Applying to Distributed Settings graph map: w(=3) workers

    subgraph 0: partial count c0 (using r estimators)
    subgraph 1: partial count c1 (using r estimators)
    subgraph 2: partial count c2 (using r estimators)
  98. Applying to Distributed Settings graph map: w(=3) workers

    subgraph 0: partial count c0 (using r estimators)
    subgraph 1: partial count c1 (using r estimators)
    subgraph 2: partial count c2 (using r estimators)
    $\sum_{i=0}^{w-1} c_i$
  99. Applying to Distributed Settings graph map: w(=3) workers reduce

    subgraph 0: partial count c0 (using r estimators)
    subgraph 1: partial count c1 (using r estimators)
    subgraph 2: partial count c2 (using r estimators)
    $\sum_{i=0}^{w-1} c_i$
  100. Applying to Distributed Settings graph map: w(=3) workers reduce

    subgraph 0: partial count c0 (using r estimators)
    subgraph 1: partial count c1 (using r estimators)
    subgraph 2: partial count c2 (using r estimators)
    $\sum_{i=0}^{w-1} c_i$
    Random Vertex-cut Partitioning
  101. Applying to Distributed Settings graph map: w(=3) workers reduce

    subgraph 0: partial count c0 (using r estimators)
    subgraph 1: partial count c1 (using r estimators)
    subgraph 2: partial count c2 (using r estimators)
    $\sum_{i=0}^{w-1} c_i$, $f(w)$
    Random Vertex-cut Partitioning
  102. Applying to Distributed Settings graph map: w(=3) workers reduce

    subgraph 0: partial count c0 (using r estimators)
    subgraph 1: partial count c1 (using r estimators)
    subgraph 2: partial count c2 (using r estimators)
    $\sum_{i=0}^{w-1} c_i$, $f(w)$
    Random Vertex-cut Partitioning
    Upper bounds on f(w) can be proved using Hajnal-Szemerédi theorem
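The map/reduce flow above can be illustrated with a rough Python sketch for triangles. It rests on several assumptions that are not taken from the slides: each edge is placed on one of w workers independently and uniformly at random, each worker counts its triangles exactly (instead of running r estimators), and the reduce step scales the summed partial counts by w**2, because a triangle survives only when all three of its edges land on the same worker (probability 1/w**2 under this placement). The scaling is a stand-in for illustration; the f(w) used by ASAP may be defined differently. All function names are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count of a simple undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Common neighbours of an edge's endpoints are the triangles through it;
    # every triangle is seen once per edge, hence the division by 3.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def partition_edges(edges, w, rng):
    """Place every edge on one of w workers uniformly at random (assumed scheme)."""
    parts = [[] for _ in range(w)]
    for e in edges:
        parts[rng.randrange(w)].append(e)
    return parts


def distributed_estimate(edges, w, seed=0):
    """Return (scaled estimate, list of partial counts c_0 .. c_{w-1})."""
    rng = random.Random(seed)
    parts = partition_edges(edges, w, rng)
    partial = [count_triangles(p) for p in parts]   # map step
    return (w ** 2) * sum(partial), partial         # reduce step with assumed scaling


k5 = [(a, b) for a in range(5) for b in range(a + 1, 5)]
print(distributed_estimate(k5, w=1))   # one worker sees the whole graph
print(distributed_estimate(k5, w=3))
```

On a graph this small most partitions contain no complete triangle, so a single partitioning gives a coarse estimate; the scaling only makes the estimate correct on average.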
  103. Error-Latency Profile (ELP) Apache Spark Generalized Approximate Pattern Mining

    graphA.patterns(“a->b->c”, “100s”)
    graphB.fourClique(“5.0%”, “95.0%”)
    Estimator Count Selection …
    Graphs stored on disk or main memory
    Graph updates
    Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s} …
    count: 21453 +/- 14, confidence: 95%, time: 92s
    Embeddings (optional)
    [Figures: Twitter Graph Profiling; Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
  104. Error-Latency Profile (ELP) Apache Spark Generalized Approximate Pattern Mining

    graphA.patterns(“a->b->c”, “100s”)
    graphB.fourClique(“5.0%”, “95.0%”)
    Estimator Count Selection …
    Graphs stored on disk or main memory
    Graph updates
    Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s} …
    count: 21453 +/- 14, confidence: 95%, time: 92s
    Embeddings (optional)
    [Figures: Twitter Graph Profiling; Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
    Contribution:
    • A novel way to build the ELP very quickly, without knowing the ground truth or running mining on the full graph.
  105. Building Error-Latency Profile Given a time / error bound, how

    many estimators should ASAP use?
  106. Building Error-Latency Profile Given a time / error bound, how

    many estimators should ASAP use? [Figure: Time vs Estimators; axes: Number of estimators, Time]
  107. Building Error-Latency Profile Given a time / error bound, how

    many estimators should ASAP use? [Figures: Time vs Estimators (axes: Number of estimators, Time); Error vs Estimators (axes: Number of estimators, Error)]
  108. Building Estimators vs Time Profile Time complexity linear in number

    of estimators
  109. Building Estimators vs Time Profile

    Time complexity linear in number of estimators
    [Figure: Twitter Graph; Runtime (min) vs. No. of Estimators]
  110. Building Estimators vs Time Profile

    Time complexity linear in number of estimators
    ASAP sets a profiling cost and picks maximum points within the budget
    [Figure: Twitter Graph; Runtime (min) vs. No. of Estimators]
  111. Building Estimators vs Time Profile

    Time complexity linear in number of estimators
    [Figures: Twitter Graph; Twitter Graph Profiling; Runtime (min) vs. No. of Estimators]
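Because runtime is described as linear in the number of estimators, a few profiled points are enough to extrapolate. The Python sketch below fits a least-squares line to (estimators, runtime) measurements and inverts it to find how many estimators fit a time budget. The function names and the sample measurements are made up for illustration and are not taken from ASAP.

```python
import math


def fit_line(points):
    """Least-squares fit of runtime = a * estimators + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def estimators_for_budget(points, time_budget):
    """Largest estimator count whose predicted runtime stays within the budget."""
    a, b = fit_line(points)
    if a <= 0:
        raise ValueError("profile must show runtime growing with estimators")
    return max(0, math.floor((time_budget - b) / a))


# Hypothetical profiling runs: (estimators in thousands, runtime in seconds).
profile = [(10, 15.0), (20, 20.0), (30, 25.0)]
print(fit_line(profile))
print(estimators_for_budget(profile, time_budget=30.0))
```

The intercept captures fixed costs such as job startup, which is why the budget is not simply divided by the per-estimator cost.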
  112. Building Estimators vs Error Profile

    Error complexity non-linear in number of estimators
    [Figure: Twitter Graph; Error Rate (%) vs. No. of Estimators]
  113. Building Estimators vs Error Profile

    Error complexity non-linear in number of estimators
    Key idea: Use a very small sample of the graph to build the ELP
    § Chernoff analysis provides a loose upper bound on the number of estimators.
    § In small graphs, a large number of estimators can get us very close to ground truth.
    [Figure: Twitter Graph; Error Rate (%) vs. No. of Estimators]
  114. Building Estimators vs Error Profile

    Error complexity non-linear in number of estimators
    [Figures: Twitter Graph; Twitter Graph Profiling; Error Rate (%) vs. No. of Estimators]
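One way to see why error is non-linear in the number of estimators is a generic Chebyshev argument. This is not the Chernoff analysis mentioned on the slide, only a simpler stand-in, and the symbols $X_i$, $\mu$, $\sigma^2$, $\varepsilon$ and $\delta$ are introduced here for illustration. Assume $r$ independent unbiased estimators with mean $\mu$ (the true count) and variance $\sigma^2$:

```latex
\bar{X} = \frac{1}{r}\sum_{i=1}^{r} X_i, \qquad
\mathbb{E}[\bar{X}] = \mu, \qquad
\operatorname{Var}(\bar{X}) = \frac{\sigma^2}{r}

\Pr\bigl(|\bar{X} - \mu| \ge \varepsilon\mu\bigr)
  \le \frac{\sigma^2}{r\,\varepsilon^2\mu^2}
\quad\Longrightarrow\quad
r \ge \frac{\sigma^2}{\delta\,\varepsilon^2\mu^2}
\text{ suffices for relative error } \varepsilon
\text{ with probability at least } 1-\delta
```

At a fixed confidence level the achievable relative error scales as $1/\sqrt{r}$, so halving the error takes roughly four times as many estimators.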
  115. Advanced Mining

    Predicate Matching
    • Find patterns where vertices are of type “electronics”
    • ASAP allows simple edge and vertex predicates
    Motif Mining
    • Some patterns are building blocks for other patterns
    • ASAP caches state of the estimators and reuses them
    Accuracy Refinement
    • Users may require more accurate answer later
    • ASAP can checkpoint and reuse estimators
    More details in the paper
  116. Implementation & Evaluation

    § Implemented on Apache Spark
      § Not limited to it, only relies on simple dataflow operators
    § Evaluated in a 16 node cluster
      § Twitter: 1.47B edges
      § Friendster: 1.8B edges
      § UK: 3.73B edges
    § Comparison using representative patterns:
      § 3 (2 patterns), 4 (6 patterns) and 5 motifs (21 patterns)
  117. Performance on Small Graphs

    4-Motifs (6 patterns), Time (s), log scale
    Graph        Arabesque   ASAP
    CiteSeer     12.1        7.3
    Mico         162         14.9
    Youtube      291.4       18.1
    LiveJournal  3161        41.6
  119. Performance on Small Graphs

    77x, <5% error
    4-Motifs (6 patterns), Time (s), log scale
    Graph        Arabesque   ASAP
    CiteSeer     12.1        7.3
    Mico         162         14.9
    Youtube      291.4       18.1
    LiveJournal  3161        41.6
  120. Large Graphs & Simple Patterns

    3-Motifs (2 patterns); series: Arabesque, ASAP
    [Figure: Time (min, log scale) vs. # Edges (Billions) at 0.9, 1.5, 1.8, 3.7; bar labels: 645, 2.5, 5, 5.9]
  121. Large Graphs & Simple Patterns

    3-Motifs (2 patterns); series: Arabesque, ASAP
    Proprietary graph, 20 machines (256GB each)
    [Figure: Time (min, log scale) vs. # Edges (Billions) at 0.9, 1.5, 1.8, 3.7; bar labels: 645, 2.5, 5, 5.9]
  122. Large Graphs & Simple Patterns

    3-Motifs (2 patterns); series: Arabesque, ASAP
    Proprietary graph, 20 machines (256GB each)
    258x, <5% error
    Graphs: Twitter, Friendster, UK
    [Figure: Time (min, log scale) vs. # Edges (Billions) at 0.9, 1.5, 1.8, 3.7; bar labels: 645, 2.5, 5, 5.9]
  124. Large Graphs & Complex Patterns

    4-Motifs, Time (min): Twitter 22, UK 47
  125. Large Graphs & Complex Patterns

    4-Motifs, Time (min): Twitter 22, UK 47
    5-House, Time (min): 5%: Twitter 12.3, UK 22.1; 10%: Twitter 5.6, UK 14.2
  126. Summary

    § Pattern mining important & challenging problem
      § Applications in many domains
    § ASAP uses approximation for fast pattern mining
      § Leverages graph mining theory & makes it practical
      § Simple API for developers
    § ASAP outperforms existing solutions
      § Can handle much larger graphs with fewer resources
    http://www.cs.berkeley.edu/~api api@cs.berkeley.edu