Slide 1

Slide 1 text

ASAP: Fast, Approximate Graph Pattern Mining at Scale
Anand Iyer⋆, Zaoxing Liu⬩, Xin Jin⬩, Shivaram Venkataraman✢, Vladimir Braverman⬩, Ion Stoica⋆
⋆UC Berkeley ⬩Johns Hopkins University ✢University of Wisconsin & Microsoft
OSDI, October 10, 2018

Slide 2

Slide 2 text

Graphs popular in big data analytics Social networks

Slide 3

Slide 3 text

Graphs popular in big data analytics
Social networks
Metabolic network of a single-cell organism

Slide 4

Slide 4 text

Graphs popular in big data analytics
Social networks
Metabolic network of a single-cell organism
Tuberculosis

Slide 5

Slide 5 text

Graphs popular in big data analytics
Also popular in traditional enterprises*
*"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)

Slide 6

Slide 6 text

Graphs popular in big data analytics
Also popular in traditional enterprises*
Products and customers
[Figure: graph with nodes labeled P]
*"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)

Slide 7

Slide 7 text

Graphs popular in big data analytics
Also popular in traditional enterprises*
Products and customers
[Figure: graph with nodes labeled P]
Transactions and involved entities
[Figure: graph with nodes labeled D and W]
*"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)

Slide 8

Slide 8 text

Graphs popular in big data analytics
Also popular in traditional enterprises*
Products and customers: Which (classes of) products are frequently bought together?
[Figure: graph with nodes labeled P]
Transactions and involved entities
[Figure: graph with nodes labeled D and W]
*"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)

Slide 9

Slide 9 text

Graphs popular in big data analytics
Also popular in traditional enterprises*
Products and customers: Which (classes of) products are frequently bought together?
[Figure: graph with nodes labeled P]
Transactions and involved entities: Small deposits followed by large withdrawal
[Figure: graph with nodes labeled D and W]
*"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)

Slide 10

Slide 10 text

Graphs popular in big data analytics
Also popular in traditional enterprises*
Products and customers: Which (classes of) products are frequently bought together?
[Figure: graph with nodes labeled P]
Transactions and involved entities: Small deposits followed by large withdrawal
[Figure: graph with nodes labeled D and W]
*"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)

Slide 12

Slide 12 text

Graph Pattern Mining Discover structural patterns in the underlying graph

Slide 13

Slide 13 text

Graph Pattern Mining Motifs Cliques Discover structural patterns in the underlying graph Frequent Subgraphs

Slide 14

Slide 14 text

Graph Pattern Mining Motifs Cliques Discover structural patterns in the underlying graph Frequent Subgraphs Standard approach: Iterative expansion

Slide 15

Slide 15 text

Graph Pattern Mining
Discover structural patterns in the underlying graph
Motifs, Cliques, Frequent Subgraphs
Standard approach: Iterative expansion
[Figure: example graph with vertices 0, 1, 2, 3]

Slide 16

Slide 16 text

Graph Pattern Mining
Discover structural patterns in the underlying graph
Motifs, Cliques, Frequent Subgraphs
Standard approach: Iterative expansion
[Figure: example graph with vertices 0, 1, 2, 3; expansion starts from the single vertices 0, 1, 2, 3]

Slide 17

Slide 17 text

Graph Pattern Mining
Discover structural patterns in the underlying graph
Motifs, Cliques, Frequent Subgraphs
Standard approach: Iterative expansion
[Figure: example graph with vertices 0, 1, 2, 3; the single vertices 0, 1, 2, 3 are expanded into two-vertex candidates]

Slide 18

Slide 18 text

Graph Pattern Mining
Discover structural patterns in the underlying graph
Motifs, Cliques, Frequent Subgraphs
Standard approach: Iterative expansion
[Figure: example graph with vertices 0, 1, 2, 3; the single vertices 0, 1, 2, 3 are expanded into two-vertex candidates]
Huge intermediate data
Quickly intractable in large graphs

Slide 19

Slide 19 text

Graph Pattern Mining
Discover structural patterns in the underlying graph
Motifs, Cliques, Frequent Subgraphs
Standard approach: Iterative expansion
[Figure: example graph with vertices 0, 1, 2, 3; the single vertices 0, 1, 2, 3 are expanded into two-vertex candidates]
Huge intermediate data
Quickly intractable in large graphs
Challenging to mine patterns in large graphs
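The growth in intermediate data can be illustrated with a minimal Python sketch of iterative expansion. It is not the implementation of any particular system: the function names `expand_embeddings` and `complete_graph` are hypothetical, and the complete graph is only a toy input chosen because its candidate counts are easy to check by hand.

```python
def expand_embeddings(adjacency, max_size):
    """Grow connected vertex sets one vertex at a time.

    Returns a list whose k-th entry holds every connected vertex set of
    size k + 1. Each level is fully materialized before the next one is
    built, which is where the large intermediate data comes from.
    """
    levels = [{frozenset([v]) for v in adjacency}]
    while len(levels) < max_size:
        next_level = set()
        for embedding in levels[-1]:
            # Candidate extensions: neighbors of any vertex already inside.
            frontier = set().union(*(adjacency[v] for v in embedding)) - embedding
            for v in frontier:
                next_level.add(embedding | {v})
        levels.append(next_level)
    return levels


def complete_graph(n):
    """Adjacency sets of the complete graph on vertices 0..n-1 (toy input)."""
    return {v: set(range(n)) - {v} for v in range(n)}


if __name__ == "__main__":
    for n in (5, 10, 20):
        sizes = [len(level) for level in expand_embeddings(complete_graph(n), 3)]
        print(f"n={n}: candidates per level = {sizes}")
```

Even with duplicates removed at every level, the number of size-3 candidates in a complete graph grows as n choose 3, so the intermediate sets dominate the cost long before the final answer is produced.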

Slide 20

Slide 20 text

Graph Pattern Mining
[Plot: Computation Time vs. # Edges, log scale]

Slide 21

Slide 21 text

Graph Pattern Mining
[Plot: Computation Time vs. # Edges, log scale]
Arabesque (SOSP '15)

Slide 22

Slide 22 text

Graph Pattern Mining
[Plot: Computation Time vs. # Edges, log scale]
Motifs with size = 3
Arabesque (SOSP '15): ~1 billion edges, 11 hours

Slide 23

Slide 23 text

Graph Pattern Mining
[Plot: Computation Time vs. # Edges, log scale]
Motifs with size = 3
Arabesque (SOSP '15): ~1 billion edges, 11 hours
This work: 1.5 billion edges, 150 s

Slide 24

Slide 24 text

Graph Pattern Mining
[Plot: Computation Time vs. # Edges, log scale]
Motifs with size = 3
Arabesque (SOSP '15): ~1 billion edges, 11 hours
This work: 1.5 billion edges, 150 s
258x faster

Slide 25

Slide 25 text

Graph Pattern Mining
[Plot: Computation Time vs. # Edges, log scale]
Motifs with size = 3
Arabesque (SOSP '15): ~1 billion edges, 11 hours
This work: 1.5 billion edges, 150 s
258x faster
5x less CPU & Memory

Slide 26

Slide 26 text

Graph Pattern Mining
[Plot: Computation Time vs. # Edges, log scale]
Motifs with size = 3
Arabesque (SOSP '15): ~1 billion edges, 11 hours
This work: 1.5 billion edges, 150 s
258x faster
5x less CPU & Memory
<5% error

Slide 27

Slide 27 text

Many mining tasks do not need exact answers

Slide 28

Slide 28 text

Leverage approximation for pattern mining Many mining tasks do not need exact answers

Slide 29

Slide 29 text

General approach: Apply algorithm on subset(s) (sample) of the input data Approximate Analytics

Slide 30

Slide 30 text

Approximate Analytics
General approach: Apply algorithm on subset(s) (sample) of the input data
[Figure: graph with vertices 0, 1, 2, 3, 4]

Slide 31

Slide 31 text

Approximate Analytics
General approach: Apply algorithm on subset(s) (sample) of the input data
[Figure: graph with vertices 0, 1, 2, 3, 4 → edge sampling (p=0.5) → sampled graph]

Slide 32

Slide 32 text

Approximate Analytics
General approach: Apply algorithm on subset(s) (sample) of the input data
[Figure: graph with vertices 0, 1, 2, 3, 4 → edge sampling (p=0.5) → sampled graph → triangle counting, e = 1]

Slide 33

Slide 33 text

Approximate Analytics
General approach: Apply algorithm on subset(s) (sample) of the input data
[Figure: graph with vertices 0, 1, 2, 3, 4 → edge sampling (p=0.5) → sampled graph → triangle counting, e = 1]
result: $e \cdot 2 = 2$

Slide 34

Slide 34 text

Approximate Analytics
General approach: Apply algorithm on subset(s) (sample) of the input data
[Figure: graph with vertices 0, 1, 2, 3, 4 → edge sampling (p=0.5) → sampled graph → triangle counting, e = 1]
result: $e \cdot 2 = 2$
Answer: 10

Slide 35

Slide 35 text

Approximate Analytics
General approach: Apply algorithm on subset(s) (sample) of the input data
[Figure: graph with vertices 0, 1, 2, 3, 4 → edge sampling (p=0.5) → sampled graph → triangle counting, e = 1]
result: $e \cdot 2 = 2$
Answer: 10
[Plot: Error (%) and Speedup vs. Edges Dropped (%)]

Slide 36

Slide 36 text

Approximate Analytics
General approach: Apply algorithm on subset(s) (sample) of the input data
[Figure: graph with vertices 0, 1, 2, 3, 4 → edge sampling (p=0.5) → sampled graph → triangle counting, e = 1]
result: $e \cdot 2 = 2$
Answer: 10
Applying exact algorithm on sampled graph(s) not the right approach for pattern mining
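To see why this approach is unreliable, here is a minimal Python sketch that samples the edges of the 5-vertex example graph and runs an exact triangle count on what remains. The scaling by 1/p**3 is an assumption of this sketch (a triangle survives only if all three of its edges are kept); the helper names are hypothetical.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count of an undirected graph given as vertex pairs."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edges for v in e})
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )


def sampled_estimate(edges, p, rng):
    """Keep each edge with probability p, count exactly, scale by 1 / p**3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


# All 10 edges between vertices 0..4, matching the example graph.
K5_EDGES = list(combinations(range(5), 2))

if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [sampled_estimate(K5_EDGES, 0.5, rng) for _ in range(10)]
    print("exact count:", count_triangles(K5_EDGES))
    print("ten independent estimates:", estimates)
```

Averaged over many runs the scaled estimate is centered on the true count, but any single run can land far from it, which is the difficulty the slide points at.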

Slide 37

Slide 37 text

ASAP leverages existing work in graph approximation theory and makes it practical

Slide 38

Slide 38 text

Graph Pattern Mining Theory Sample instances of the pattern from the graph stream

Slide 39

Slide 39 text

Graph Pattern Mining Theory
Sample instances of the pattern from the graph stream
[Figure: graph with vertices 0, 1, 2, 3, 4]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

Slide 40

Slide 40 text

Graph Pattern Mining Theory
Sample instances of the pattern from the graph stream
[Figure: graph with vertices 0, 1, 2, 3, 4; estimator E0]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)

Slide 51

Slide 51 text

Graph Pattern Mining Theory
Sample instances of the pattern from the graph stream
[Figure: graph with vertices 0, 1, 2, 3, 4; estimator E0]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
E0: $p = \frac{1}{10} \cdot \frac{1}{4}$

Slide 52

Slide 52 text

Graph Pattern Mining Theory
Sample instances of the pattern from the graph stream
[Figure: graph with vertices 0, 1, 2, 3, 4; estimator E0]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
E0: $p = \frac{1}{10} \cdot \frac{1}{4}$, $e_0 = 40$

Slide 53

Slide 53 text

Graph Pattern Mining Theory
Sample instances of the pattern from the graph stream
[Figure: graph with vertices 0, 1, 2, 3, 4; estimators E0, E1, E2, E3]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
E0: $p = \frac{1}{10} \cdot \frac{1}{4}$, $e_0 = 40$

Slide 54

Slide 54 text

Graph Pattern Mining Theory
Sample instances of the pattern from the graph stream
[Figure: graph with vertices 0, 1, 2, 3, 4; estimators E0, E1, E2, E3]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
E0: $p = \frac{1}{10} \cdot \frac{1}{4}$, $e_0 = 40$
E1, E2, E3: $e_1 = 0$, $e_2 = 0$, $e_3 = 0$
result: $\frac{1}{r} \sum_{i=0}^{r-1} e_i = 10$

Slide 55

Slide 55 text

Graph Pattern Mining Theory
Sample instances of the pattern from the graph stream
neighborhood sampling, estimator (r=4)
[Figure: graph with vertices 0, 1, 2, 3, 4; estimators E0, E1, E2, E3]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
E0: $p = \frac{1}{10} \cdot \frac{1}{4}$, $e_0 = 40$
E1, E2, E3: $e_1 = 0$, $e_2 = 0$, $e_3 = 0$
result: $\frac{1}{r} \sum_{i=0}^{r-1} e_i = 10$
Pavan et al. Counting and sampling triangles from a graph stream, VLDB 2013
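The arithmetic on this slide can be written out step by step. The reading of the two factors is an interpretation, not something the slide states: 1/10 is taken as the chance of picking one particular first edge out of the 10 stream edges, and 1/4 as the chance of picking one particular second edge out of 4 later edges adjacent to the first. An edge such as (0,3) would fit, since (0,4), (1,3), (2,3), and (3,4) follow it in the stream.

```latex
\begin{align*}
p &= \frac{1}{10} \cdot \frac{1}{4} = \frac{1}{40}
  && \text{probability of the sampled edge pair} \\
e_0 &= \frac{1}{p} = 40
  && \text{E0 closed a triangle, so it reports } 1/p \\
e_1 &= e_2 = e_3 = 0
  && \text{the other estimators found no triangle} \\
\text{result} &= \frac{1}{r} \sum_{i=0}^{r-1} e_i
  = \frac{40 + 0 + 0 + 0}{4} = 10
  && r = 4
\end{align*}
```

The average matches the exact triangle count of 10 for the 5-vertex graph, even though three of the four estimators returned zero.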

Slide 56

Slide 56 text

A Swift Approximate Pattern miner

Slide 57

Slide 57 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")

Slide 58

Slide 58 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
Graphs stored on disk or main memory

Slide 59

Slide 59 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
Graphs stored on disk or main memory

Slide 60

Slide 60 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
(4) [Plots: Twitter Graph Profiling. Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
Graphs stored on disk or main memory

Slide 61

Slide 61 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
(4) [Plots: Twitter Graph Profiling. Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
(5) Error-Latency Profile (ELP)
Graphs stored on disk or main memory

Slide 62

Slide 62 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
(4) [Plots: Twitter Graph Profiling. Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
(5) Error-Latency Profile (ELP)
(6) Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s}
Graphs stored on disk or main memory
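One way to picture estimator count selection from an error-latency profile is a lookup over profiled points. The Python sketch below is an illustration under stated assumptions, not ASAP's actual selection logic: the profile values are invented for the example (they are not read off the plots), the function names are hypothetical, and the profile is assumed to be sorted by estimator count with error falling and runtime rising.

```python
# Hypothetical profile: (number of estimators, error rate in %, runtime in s).
# These numbers are made up for illustration only.
PROFILE = [
    (100_000, 30.0, 10.0),
    (500_000, 12.0, 35.0),
    (1_000_000, 6.0, 60.0),
    (1_500_000, 4.5, 95.0),
    (2_100_000, 3.0, 130.0),
]


def estimators_for_error(profile, max_error_pct):
    """Smallest profiled estimator count whose error meets the target."""
    for count, error, runtime in profile:
        if error <= max_error_pct:
            return count, error, runtime
    return None  # no profiled point is accurate enough


def estimators_for_time(profile, budget_s):
    """Largest profiled estimator count whose runtime fits the budget."""
    best = None
    for count, error, runtime in profile:
        if runtime <= budget_s:
            best = (count, error, runtime)
    return best  # None if even the smallest point is too slow


if __name__ == "__main__":
    print("error target 5%:", estimators_for_error(PROFILE, 5.0))
    print("time budget 100s:", estimators_for_time(PROFILE, 100.0))
```

The two lookups correspond to the two query styles on the slide: one query gives a time budget ("100s") and the other gives an error bound with a confidence level ("5.0%", "95.0%"). Confidence handling is left out of this sketch.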

Slide 63

Slide 63 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
(4) [Plots: Twitter Graph Profiling. Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
(5) Error-Latency Profile (ELP)
(6) Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s}
(7) count: 21453 +/- 14, confidence: 95%, time: 92s
    Embeddings (optional)
Graphs stored on disk or main memory

Slide 64

Slide 64 text

A Swift Approximate Pattern miner
(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
(4) [Plots: Twitter Graph Profiling. Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
(5) Error-Latency Profile (ELP)
(6) Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s}
(7) count: 21453 +/- 14, confidence: 95%, time: 92s
    Embeddings (optional)
Graphs stored on disk or main memory
Graph updates

Slide 65

Slide 65 text

(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
(4) [Plots: Twitter Graph Profiling. Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
(5) Error-Latency Profile (ELP)
(6) Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s}
(7) count: 21453 +/- 14, confidence: 95%, time: 92s
    Embeddings (optional)
Graphs stored on disk or main memory
Graph updates

Slide 66

Slide 66 text

(1) graphA.patterns("a->b->c", "100s")
    graphB.fourClique("5.0%", "95.0%")
(2) Apache Spark: Generalized Approximate Pattern Mining
(3) Estimator Count Selection
(4) [Plots: Twitter Graph Profiling. Runtime (min) vs. No. of Estimators; Error Rate (%) vs. No. of Estimators]
(5) Error-Latency Profile (ELP)
(6) Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s}
(7) count: 21453 +/- 14, confidence: 95%, time: 92s
    Embeddings (optional)
Graphs stored on disk or main memory
Graph updates
Contributions:
• Extends neighborhood sampling to general patterns
• Provides a unified API
• Applies approximate pattern mining in distributed settings

Slide 67

Slide 67 text

Generalized Approximate Pattern Mining
Developers write a single estimator using ASAP's API
Example estimators: SampleThreeNodeChain, SampleTriangle

Table 1: ASAP's Approximate Pattern Mining API.
sampleVertex: () → (v, p)
    Uniformly sample one vertex from the graph.
SampleEdge: () → (e, p)
    Uniformly sample one edge from the graph.
ConditionalSampleVertex: (subgraph) → (v, p)
    Uniformly sample a vertex that appears after a sampled subgraph.
ConditionalSampleEdge: (subgraph) → (e, p)
    Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order.
ConditionalClose: (subgraph, subgraph) → boolean
    Given a sampled subgraph, check if another subgraph that appears later in the order can be formed.

Slide 70

Slide 70 text

Using ASAP's API
[Figure: graph with vertices 0, 1, 2, 3, 4]
edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
SampleTriangle
    (e1, p1) = sampleEdge()
    (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    if (!e2) return 0
    subgraph1 = Subgraph(e1, e2)
    subgraph2 = Triangle(e1, e2)-subgraph1
    if conditionalClose(subgraph1, subgraph2)
        return 1/(p1.p2)
    else return 0
[Cropped on the slide: the end of Table 1 and parts of the SampleThreeNodeChain, SampleFourCliqueType1, and SampleFourCliqueType2 listings]
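The SampleTriangle listing can be tried out with a small Python sketch over the slide's edge stream. The three helper functions are hypothetical stand-ins written for this example, not ASAP's implementation: a subgraph is represented as a list of positions in the stream, and "comes after" is read as stream position.

```python
import random

EDGE_STREAM = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
               (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


def sample_edge(stream, rng):
    """Stand-in for SampleEdge: a uniform stream position and its probability."""
    return rng.randrange(len(stream)), 1.0 / len(stream)


def conditional_sample_edge(stream, sub, rng):
    """Stand-in for ConditionalSampleEdge: a uniform edge that touches the
    sampled subgraph and comes after all of its edges in the stream."""
    touched = {v for i in sub for v in stream[i]}
    candidates = [j for j in range(max(sub) + 1, len(stream))
                  if touched & set(stream[j])]
    if not candidates:
        return None, 0.0
    return rng.choice(candidates), 1.0 / len(candidates)


def conditional_close(stream, sub, missing_edge):
    """Stand-in for ConditionalClose: does the missing edge appear after the
    sampled subgraph in the stream?"""
    return any(set(stream[j]) == set(missing_edge)
               for j in range(max(sub) + 1, len(stream)))


def sample_triangle(stream, rng):
    """One estimator: 1 / (p1 * p2) if a triangle is closed, else 0."""
    i1, p1 = sample_edge(stream, rng)
    i2, p2 = conditional_sample_edge(stream, [i1], rng)
    if i2 is None:
        return 0.0
    # The closing edge joins the two endpoints that e1 and e2 do not share.
    closing = set(stream[i1]) ^ set(stream[i2])
    if conditional_close(stream, [i1, i2], closing):
        return 1.0 / (p1 * p2)
    return 0.0


def estimate(stream, r, rng):
    """Average of r independent estimators."""
    return sum(sample_triangle(stream, rng) for _ in range(r)) / r


if __name__ == "__main__":
    print(estimate(EDGE_STREAM, 10_000, random.Random(0)))
```

Each triangle is found by exactly one ordered choice of first and second edge, so weighting a hit by the inverse of its sampling probability keeps the average centered on the true count. A single estimator is still very noisy, which is why many are averaged.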

Slide 74

Slide 74 text

Using ASAP's API
[Same graph, edge stream, and SampleTriangle listing as Slide 70, with vertices 0 and 3 highlighted]

Slide 77

Slide 77 text

Using ASAP's API
[Same graph, edge stream, and SampleTriangle listing as Slide 70, with vertices 0, 3, and 4 highlighted]

Slide 78

Slide 78 text

Using ASAP’s API 0 1 4 2 3 edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4

Slide 79

Slide 79 text

Using ASAP’s API 0 1 4 2 3 edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4) ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give subgraph and comes after the subgraph in the order. ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap that appears later in the order can be formed. Table 1: ASAP’s Approximate Pattern Mining API. leThreeNodeChain p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) rn 0 rn 1/(p1.p2) SampleTriangle 1 (e1, p1) = sampleEdge() 2 (e2, p2) = conditionalSampleEdge(Subgraph(e1)) 3 if (!e2) return 0 4 subgraph1 = Subgraph(e1, e2) 5 subgraph2 = Triangle(e1, e2)-subgraph1 6 if conditionalClose(subgraph1, subgraph2) 7 return 1/(p1.p2) 8 else return 0 leFourCliqueType1 p1) = SampleEdge() p2) = ConditionalSampleEdge(Subgraph(e1)) e2) return 0 p3) = ConditionalSampleEdge(Subgraph(e1, e2)) e3) return 0 SampleFourCliqueType2 1 (e1, p1) = SampleEdge() 2 (e2, p2) = SampleEdge() 3 if (isAdjacent(e1, e2) == true) 4 return 0 5 subgraph1 = Subgraph(e1, e2) 0 3 4

Slide 90

Slide 90 text

Using ASAP’s API
[Same example graph, edge stream, API table and listings as on the earlier builds of this slide]
Estimate = 40

Slide 91

Slide 91 text

Using ASAP’s API
[Same example graph, edge stream, API table and listings as on the earlier builds of this slide]
Estimate = 40
Sampling phase fixes the vertices for a particular instance of a pattern, and closing phase waits for the remaining edges

Slide 92

Slide 92 text

Using ASAP’s API
[Same example graph, edge stream, API table and listings as on the earlier builds of this slide]
Estimate = 40
Sampling phase fixes the vertices for a particular instance of a pattern, and closing phase waits for the remaining edges
ASAP computes the right expectations, runs many instances of the estimator and aggregates results

Slide 93

Slide 93 text

Using ASAP’s API
[Same example graph, edge stream, API table and listings as on the earlier builds of this slide]
Estimate = 40
Sampling phase fixes the vertices for a particular instance of a pattern, and closing phase waits for the remaining edges
ASAP computes the right expectations, runs many instances of the estimator and aggregates results
See paper for more examples & proof
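The SampleTriangle listing and the "runs many instances of the estimator and aggregates results" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ASAP's implementation: the whole edge stream is held in a list (ASAP processes it as a stream), the graph is simple and undirected, and the names `later_neighbors`, `closes`, `triangle_estimate`, `sample_triangle` and `approximate_triangle_count` are hypothetical. Since p1 = 1/m and p2 = 1/c, the returned value 1/(p1.p2) is computed as the integer m * c.

```python
import random

# Edge stream from the slide (complete graph on vertices 0..4).
EDGE_STREAM = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
               (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


def later_neighbors(edges, i):
    """Positions of edges that share one endpoint with edges[i] and arrive after it."""
    e1 = set(edges[i])
    return [j for j in range(i + 1, len(edges)) if len(e1 & set(edges[j])) == 1]


def closes(edges, i, j):
    """True if the edge completing the triangle of edges[i], edges[j] arrives after position j."""
    third = set(edges[i]) ^ set(edges[j])  # the two endpoints that are not shared
    return any(set(edges[k]) == third for k in range(j + 1, len(edges)))


def triangle_estimate(edges, i, j):
    """Value of one estimator that sampled e1 = edges[i] and then e2 = edges[j]."""
    if not closes(edges, i, j):
        return 0
    # p1 = 1/m and p2 = 1/c, so 1/(p1 * p2) = m * c.
    return len(edges) * len(later_neighbors(edges, i))


def sample_triangle(edges, rng):
    """One run of the estimator: sample e1, conditionally sample e2, wait for closure."""
    i = rng.randrange(len(edges))
    candidates = later_neighbors(edges, i)
    if not candidates:
        return 0
    j = rng.choice(candidates)
    return triangle_estimate(edges, i, j)


def approximate_triangle_count(edges, r, seed=0):
    """Average r independent estimators (the aggregation step)."""
    rng = random.Random(seed)
    return sum(sample_triangle(edges, rng) for _ in range(r)) / r
```

For e1 = (0,3) and e2 = (0,4), that is `triangle_estimate(EDGE_STREAM, 2, 3)`, there are 10 edges and 4 later adjacent edges, and (3,4) closes the triangle, so the estimator returns 40, the value shown on the slide. Each triangle is reachable through exactly one (e1, e2) pair, which is why the expectation over all pairs equals the true triangle count.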

Slide 94

Slide 94 text

Applying to Distributed Settings graph

Slide 95

Slide 95 text

Applying to Distributed Settings
[Diagram: the graph is split into three subgraphs]
subgraph 0: partial count c0 (using r estimators)
subgraph 1: partial count c1 (using r estimators)
subgraph 2: partial count c2 (using r estimators)

Slide 96

Slide 96 text

Applying to Distributed Settings
[Diagram: the graph is split into three subgraphs]
map: w (=3) workers
subgraph 0: partial count c0 (using r estimators)
subgraph 1: partial count c1 (using r estimators)
subgraph 2: partial count c2 (using r estimators)

Slide 98

Slide 98 text

Applying to Distributed Settings
[Diagram: the graph is split into three subgraphs]
map: w (=3) workers
subgraph 0: partial count c0 (using r estimators)
subgraph 1: partial count c1 (using r estimators)
subgraph 2: partial count c2 (using r estimators)
$\sum_{i=0}^{w-1} c_i$

Slide 99

Slide 99 text

Applying to Distributed Settings
[Diagram: the graph is split into three subgraphs]
map: w (=3) workers
subgraph 0: partial count c0 (using r estimators)
subgraph 1: partial count c1 (using r estimators)
subgraph 2: partial count c2 (using r estimators)
reduce: $\sum_{i=0}^{w-1} c_i$

Slide 100

Slide 100 text

Applying to Distributed Settings
[Diagram: the graph is split into three subgraphs]
Random Vertex-cut Partitioning
map: w (=3) workers
subgraph 0: partial count c0 (using r estimators)
subgraph 1: partial count c1 (using r estimators)
subgraph 2: partial count c2 (using r estimators)
reduce: $\sum_{i=0}^{w-1} c_i$

Slide 101

Slide 101 text

Applying to Distributed Settings
[Diagram: the graph is split into three subgraphs]
Random Vertex-cut Partitioning
map: w (=3) workers
subgraph 0: partial count c0 (using r estimators)
subgraph 1: partial count c1 (using r estimators)
subgraph 2: partial count c2 (using r estimators)
reduce: $\sum_{i=0}^{w-1} c_i$, with $f(w)$

Slide 102

Slide 102 text

Applying to Distributed Settings
[Diagram: the graph is split into three subgraphs]
Random Vertex-cut Partitioning
map: w (=3) workers
subgraph 0: partial count c0 (using r estimators)
subgraph 1: partial count c1 (using r estimators)
subgraph 2: partial count c2 (using r estimators)
reduce: $\sum_{i=0}^{w-1} c_i$, with $f(w)$
Upper bounds on f(w) can be proved using the Hajnal-Szemerédi theorem
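The map and reduce steps above can be sketched as follows. The slides do not spell out the partitioning details or the form of f(w), so this sketch makes its own assumptions, all of them hypothetical: each vertex is assigned to one of w workers, each worker keeps only the edges whose endpoints are both local, and the pattern is a triangle. Under uniform random assignment a given triangle then survives on a single worker with probability 1/w², so the sketch uses f(w) = w² as the scaling factor. For brevity each worker counts its local triangles exactly instead of running r estimators.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact local triangle count; stands in for the r estimators a worker would run."""
    edge_set = {frozenset(e) for e in edges}
    vertices = sorted({v for e in edges for v in e})
    return sum(
        1
        for a, b, c in combinations(vertices, 3)
        if {frozenset((a, b)), frozenset((a, c)), frozenset((b, c))} <= edge_set
    )


def random_assignment(vertices, w, seed=0):
    """Assign every vertex to one of w workers uniformly at random (assumed scheme)."""
    rng = random.Random(seed)
    return {v: rng.randrange(w) for v in vertices}


def distributed_estimate(edges, assignment, w):
    """Map: per-worker partial counts c_i. Reduce: f(w) * sum(c_i), with f(w) = w**2 assumed."""
    partial_counts = []
    for worker in range(w):
        local_edges = [(u, v) for u, v in edges
                       if assignment[u] == worker and assignment[v] == worker]
        partial_counts.append(count_triangles(local_edges))
    return (w ** 2) * sum(partial_counts)
```

With a single worker (w = 1) the factor is 1 and the result is the exact count. With more workers the result is a random quantity whose expectation, under the assumed partitioning, equals the true count.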

Slide 103

Slide 103 text

Error-Latency Profile (ELP)
[Figure: ASAP architecture. Labels: graphs stored on disk or main memory; graph updates; Generalized Approximate Pattern Mining; Apache Spark; Estimator Count Selection; Error-Latency Profile (ELP); embeddings (optional)]
Queries shown: graphA.patterns(“a->b->c”, “100s”), graphB.fourClique(“5.0%”, “95.0%”)
Estimates shown: {error: <5%, time: 95s}, {error: <5%, time: 60s}, …
Result shown: count: 21453 +/- 14, confidence: 95%, time: 92s
[Plots: Twitter Graph Profiling; Runtime (min) vs. No. of Estimators and Error Rate (%) vs. No. of Estimators, for 0 to 2.1M estimators]

Slide 104

Slide 104 text

Error-Latency Profile (ELP)
[Figure: ASAP architecture and Twitter Graph Profiling plots, with the same labels, queries and results as on the preceding build of this slide]
Contribution:
• Novel way to build the ELP very quickly, without needing to know the ground truth or to run mining on the full graph.
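One way to picture Estimator Count Selection is as a lookup in the ELP. The sketch below is hypothetical: the `ProfilePoint` record, the `pick_estimators` function and the numbers in `EXAMPLE_ELP` are all made up for illustration and are not taken from ASAP or from the plots. It assumes that error falls and time rises as the number of estimators grows.

```python
from typing import NamedTuple


class ProfilePoint(NamedTuple):
    estimators: int    # number of estimators
    error_pct: float   # profiled error rate in percent
    time_s: float      # profiled runtime in seconds


# Made-up profile, for illustration only.
EXAMPLE_ELP = [
    ProfilePoint(500_000, 12.0, 40.0),
    ProfilePoint(1_000_000, 6.0, 60.0),
    ProfilePoint(1_500_000, 4.5, 95.0),
    ProfilePoint(2_100_000, 3.0, 130.0),
]


def pick_estimators(profile, max_error_pct=None, max_time_s=None):
    """Return the profile point to use for an error bound or a time bound, or None."""
    if (max_error_pct is None) == (max_time_s is None):
        raise ValueError("give exactly one of max_error_pct or max_time_s")
    points = sorted(profile)  # NamedTuples sort by their first field, estimators
    if max_error_pct is not None:
        # Error bound: the fewest estimators that meet the bound.
        for point in points:
            if point.error_pct <= max_error_pct:
                return point
        return None
    # Time bound: the most estimators that still fit the budget.
    feasible = [point for point in points if point.time_s <= max_time_s]
    return feasible[-1] if feasible else None
```

With this made-up profile, an error bound of 5% and a time bound of 100 s both select the 1.5M-estimator point.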

Slide 105

Slide 105 text

Building Error-Latency Profile Given a time / error bound, how many estimators should ASAP use?

Slide 106

Slide 106 text

Building Error-Latency Profile
Given a time / error bound, how many estimators should ASAP use?
[Plot: Time vs Estimators; x-axis: number of estimators, y-axis: time]

Slide 107

Slide 107 text

Building Error-Latency Profile
Given a time / error bound, how many estimators should ASAP use?
[Plot: Time vs Estimators; x-axis: number of estimators, y-axis: time]
[Plot: Error vs Estimators; x-axis: number of estimators, y-axis: error]

Slide 108

Slide 108 text

Building Estimators vs Time Profile Time complexity linear in number of estimators

Slide 109

Slide 109 text

Building Estimators vs Time Profile
[Plot: Twitter Graph; Runtime (min), 1 to 3, vs. No. of Estimators, 0.5M to 2M]
Time complexity linear in number of estimators

Slide 110

Slide 110 text

Building Estimators vs Time Profile
[Plot: Twitter Graph; Runtime (min), 1 to 3, vs. No. of Estimators, 0.5M to 2M]
Time complexity linear in number of estimators
ASAP sets a profiling cost and picks maximum points within the budget

Slide 111

Slide 111 text

Building Estimators vs Time Profile
[Plot: Twitter Graph; Runtime (min), 1 to 3, vs. No. of Estimators, 0.5M to 2M]
Time complexity linear in number of estimators
[Plot: Twitter Graph Profiling; Runtime (min), 1 to 3, vs. No. of Estimators, 0 to 2.1M]
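Because runtime grows linearly with the number of estimators, a few profiled points are enough to extrapolate. The sketch below fits a least-squares line through (estimators, runtime) points and inverts it for a time budget. It is a minimal illustration, assuming the profiled points are already available; `fit_line` and `estimators_for_time` are hypothetical names, not part of ASAP.

```python
def fit_line(points):
    """Least-squares fit of runtime = slope * estimators + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimators_for_time(points, time_budget):
    """Largest estimator count whose predicted runtime fits within time_budget."""
    slope, intercept = fit_line(points)
    if slope <= 0:
        raise ValueError("runtime should grow with the number of estimators")
    return max(0, int((time_budget - intercept) / slope))
```

For example, profile points lying on runtime = 2 * estimators + 1 give a slope of 2 and an intercept of 1, so a budget of 11 time units allows 5 estimators.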

Slide 112

Slide 112 text

Building Estimators vs Error Profile
[Plot: Twitter Graph; Error Rate (%), 0 to 40, vs. No. of Estimators, 50k to 2.1M]
Error complexity non-linear in number of estimators

Slide 113

Slide 113 text

Building Estimators vs Error Profile
[Plot: Twitter Graph; Error Rate (%), 0 to 40, vs. No. of Estimators, 50k to 2.1M]
Error complexity non-linear in number of estimators
Key idea: Use a very small sample of the graph to build the ELP
§ Chernoff analysis provides a loose upper bound on the number of estimators.
§ In small graphs, a large number of estimators can get us very close to ground truth.

Slide 114

Slide 114 text

Building Estimators vs Error Profile
[Plot: Twitter Graph; Error Rate (%), 0 to 40, vs. No. of Estimators, 50k to 2.1M]
Error complexity non-linear in number of estimators
[Plot: Twitter Graph Profiling; Error Rate (%), 0 to 40, vs. No. of Estimators, 0 to 2.1M]
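A generic averaging argument gives some intuition for why error is non-linear in the number of estimators. This is the textbook calculation for the mean of independent unbiased estimators, not the Chernoff analysis from the paper. The symbols $X_j$, $T$, $\sigma^2$, $r$ and $\varepsilon$ are introduced here for illustration, and independence of the estimators is an assumption.

```latex
% X_1, ..., X_r: outputs of r independent estimators, each with
% E[X_j] = T (the true pattern count) and Var[X_j] = \sigma^2.
\hat{T} = \frac{1}{r} \sum_{j=1}^{r} X_j,
\qquad
\mathbb{E}[\hat{T}] = T,
\qquad
\operatorname{Var}[\hat{T}] = \frac{\sigma^2}{r}

% Relative standard error shrinks like 1/\sqrt{r}:
\frac{\sqrt{\operatorname{Var}[\hat{T}]}}{T} = \frac{\sigma}{T \sqrt{r}}

% Chebyshev's inequality turns this into a bound for relative error \varepsilon:
\Pr\left[\, |\hat{T} - T| \ge \varepsilon T \,\right] \le \frac{\sigma^2}{r \, \varepsilon^2 \, T^2}
```

Under these assumptions, halving the relative standard error takes four times as many estimators, which matches the flattening shape of the error curves.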

Slide 115

Slide 115 text

Advanced Mining
Predicate Matching
• Find patterns where vertices are of type “electronics”
• ASAP allows simple edge and vertex predicates
Motif Mining
• Some patterns are building blocks for other patterns
• ASAP caches state of the estimators and reuses them
Accuracy Refinement
• Users may require more accurate answer later
• ASAP can checkpoint and reuse estimators
More details in the paper

Slide 116

Slide 116 text

Implementation & Evaluation
§ Implemented on Apache Spark
  § Not limited to it, only relies on simple dataflow operators
§ Evaluated on a 16-node cluster
  § Twitter: 1.47B edges
  § Friendster: 1.8B edges
  § UK: 3.73B edges
§ Comparison using representative patterns:
  § 3-motifs (2 patterns), 4-motifs (6 patterns) and 5-motifs (21 patterns)

Slide 117

Slide 117 text

Performance on Small Graphs
[Bar chart: 4-Motifs (6 patterns); Time (s), log scale from 1 to 10000]
CiteSeer: Arabesque 12.1, ASAP 7.3
Mico: Arabesque 162, ASAP 14.9
Youtube: Arabesque 291.4, ASAP 18.1
LiveJournal: Arabesque 3161, ASAP 41.6

Slide 119

Slide 119 text

Performance on Small Graphs
[Bar chart: 4-Motifs (6 patterns); Time (s), log scale from 1 to 10000]
CiteSeer: Arabesque 12.1, ASAP 7.3
Mico: Arabesque 162, ASAP 14.9
Youtube: Arabesque 291.4, ASAP 18.1
LiveJournal: Arabesque 3161, ASAP 41.6
Annotations: 77x, <5% error

Slide 120

Slide 120 text

Large Graphs & Simple Patterns
[Bar chart: 3-Motifs (2 patterns); Time (min), log scale from 1 to 1000, vs. # Edges (Billions): 0.9, 1.5, 1.8, 3.7; series: Arabesque, ASAP; bar labels: 645, 2.5, 5, 5.9]

Slide 121

Slide 121 text

Large Graphs & Simple Patterns
[Bar chart: 3-Motifs (2 patterns); Time (min), log scale from 1 to 1000, vs. # Edges (Billions): 0.9, 1.5, 1.8, 3.7; series: Arabesque, ASAP; bar labels: 645, 2.5, 5, 5.9]
Annotation: Proprietary graph, 20 machines (256GB each)

Slide 122

Slide 122 text

Large Graphs & Simple Patterns
[Bar chart: 3-Motifs (2 patterns); Time (min), log scale from 1 to 1000, vs. # Edges (Billions): 0.9, 1.5, 1.8, 3.7; series: Arabesque, ASAP; bar labels: 645, 2.5, 5, 5.9; graph labels: Twitter, Friendster, UK]
Annotations: Proprietary graph, 20 machines (256GB each); 258x, <5% error

Slide 124

Slide 124 text

Large Graphs & Complex Patterns
[Bar chart: 4-Motifs; Time (min), 0 to 50; Twitter: 22, UK: 47]

Slide 125

Slide 125 text

Large Graphs & Complex Patterns
[Bar chart: 4-Motifs; Time (min), 0 to 50; Twitter: 22, UK: 47]
[Bar chart: 5-House; Time (min), 0 to 25; series 5%: Twitter 12.3, UK 22.1; series 10%: Twitter 5.6, UK 14.2]

Slide 126

Slide 126 text

Summary
§ Pattern mining important & challenging problem
  § Applications in many domains
§ ASAP uses approximation for fast pattern mining
  § Leverages graph mining theory & makes it practical
  § Simple API for developers
§ ASAP outperforms existing solutions
  § Can handle much larger graphs with fewer resources
http://www.cs.berkeley.edu/~api [email protected]