ASAP: Fast, Approximate Graph Pattern Mining at Scale

Anand Iyer
October 10, 2018

Transcript

  1. ASAP: Fast, Approximate Graph
    Pattern Mining at Scale
    Anand Iyer⋆, Zaoxing Liu⬩, Xin Jin⬩,
    Shivaram Venkataraman✢, Vladimir Braverman⬩, Ion Stoica⋆
    ⋆UC Berkeley, ⬩Johns Hopkins University, ✢University of Wisconsin & Microsoft
    OSDI, October 10, 2018

  4. Graphs popular in big data analytics
    Social networks
    Metabolic network of a single-cell organism
    Tuberculosis

  11. Graphs popular in big data analytics
    Also popular in traditional enterprises*
    Products and customers: which (classes of) products are frequently bought together?
    Transactions and involved entities: small deposits followed by a large withdrawal
    [Figure: example graphs with product (P), deposit (D), and withdrawal (W) vertices]
    *"The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing", Sahu et al., VLDB 2018 (best paper)

  19. Graph Pattern Mining
    Discover structural patterns in the underlying graph: motifs, cliques, frequent subgraphs
    Standard approach: iterative expansion
    [Figure: iterative expansion of embeddings on a 4-vertex example graph (vertices 0-3)]
    Huge intermediate data; quickly intractable in large graphs
    Challenging to mine patterns in large graphs
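To make the blow-up concrete, here is a minimal Python sketch of vertex-induced iterative expansion. It is an illustration only, not Arabesque's or ASAP's code: `expand_embeddings` and `complete_graph` are hypothetical helpers, the graph is an adjacency dict, and duplicate embeddings are merged by vertex set. The complete graph is an assumed worst case, since the slide's 4-vertex example does not list its edges.

```python
def complete_graph(n):
    """Adjacency sets of the complete graph on vertices 0..n-1."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


def expand_embeddings(adj, max_size):
    """Hypothetical sketch of iterative expansion.

    Level 1 holds every single vertex; each later level extends every
    embedding by one neighbouring vertex. Embeddings are deduplicated by
    vertex set. Returns one set of embeddings per size 1..max_size.
    """
    levels = [{frozenset([v]) for v in adj}]
    for _ in range(max_size - 1):
        nxt = set()
        for emb in levels[-1]:
            # candidates: vertices adjacent to the embedding but not in it
            frontier = set().union(*(adj[v] for v in emb)) - emb
            for v in frontier:
                nxt.add(emb | {v})
        levels.append(nxt)
    return levels


if __name__ == "__main__":
    for n in (4, 30):
        sizes = [len(level) for level in expand_embeddings(complete_graph(n), 4)]
        print(n, sizes)
```

On a complete graph with n vertices, level k holds C(n, k) embeddings: 4 vertices give [4, 6, 4, 1], while 30 vertices already give [30, 435, 4060, 27405]. Real systems also pay for the duplicate candidates generated before deduplication.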

  26. Graph Pattern Mining
    [Figure: computation time vs. number of edges, log scale; motifs with size = 3]
    Arabesque (SOSP ‘15): ~1 billion edges, 11 hours
    This work: 1.5 billion edges, 150 s
    258x faster, 5x less CPU & memory, <5% error

  28. Leverage approximation for pattern mining
    Many mining tasks do not need exact answers

  35. Approximate Analytics
    General approach: apply the algorithm on subset(s) (sample) of the input data
    Example: triangle counting on a graph with vertices 0-4 (answer: 10)
    Edge sampling (p = 0.5), then triangle counting on the sampled graph: e = 1
    Result: e × 2 = 2
    [Figure: error (%) and speedup vs. edges dropped (%)]

  36. Approximate Analytics
    Applying an exact algorithm on sampled graph(s) is not the right approach for pattern mining
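A minimal Python sketch of the sample-then-count approach, assuming independent edge sampling with probability p and assuming the example graph is the complete graph on five vertices (which matches the stated answer of 10 triangles). The sketch divides the sampled count by p³, the usual correction for independent edge sampling, because a triangle survives only if all three of its edges survive. The function names are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count for an undirected simple graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # every triangle is found once from each of its three edges
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def sampled_triangle_estimate(edges, p, rng):
    """Keep each edge with probability p, count exactly, then rescale by 1/p**3."""
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p ** 3


K5 = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

if __name__ == "__main__":
    print("exact:", count_triangles(K5))
    rng = random.Random(1)
    print("single-run estimates:", [sampled_triangle_estimate(K5, 0.5, rng) for _ in range(5)])
```

Averaged over many runs the estimate converges to 10, but each single run can only return a multiple of 8 here, which illustrates why running an exact algorithm on one sampled graph gives large errors for pattern mining.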

  37. ASAP leverages existing work in graph
    approximation theory and makes it practical

  55. Graph Pattern Mining Theory
    Sample instances of the pattern from the graph stream
    [Figure: example graph on vertices 0-4]
    Edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    Neighborhood sampling with r = 4 estimators (E0, E1, E2, E3)
    E0: $p = \frac{1}{10} \cdot \frac{1}{4}$, so $e_0 = 40$; $e_1 = e_2 = e_3 = 0$
    Result: $\frac{1}{r}\sum_{i=0}^{r-1} e_i = 10$
    Pavan et al., "Counting and sampling triangles from a graph stream", VLDB 2013
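The slide's arithmetic, written out. Here $m = 10$ is the number of edges in the stream and $c$ is the number of edges that are adjacent to the first sampled edge and arrive after it. The value $c = 4$ holds, for example, if E0's first edge is (0,3), whose later neighbours are (0,4), (1,3), (2,3) and (3,4); which edge E0 actually drew is an assumption. The last line is the usual unbiasedness argument: a triangle $t$ can be sampled only through its first two edges in stream order, so it is picked with probability $1/(m\,c(t))$ and then contributes $m\,c(t)$. Since the stream lists all ten vertex pairs, the graph is complete and has $\binom{5}{3}$ triangles.

```latex
\begin{aligned}
p &= \frac{1}{m}\cdot\frac{1}{c} = \frac{1}{10}\cdot\frac{1}{4} = \frac{1}{40},
  \qquad e_0 = \frac{1}{p} = 40, \qquad e_1 = e_2 = e_3 = 0,\\
\text{result} &= \frac{1}{r}\sum_{i=0}^{r-1} e_i = \frac{40 + 0 + 0 + 0}{4} = 10,\\
\mathbb{E}[e_i] &= \sum_{\text{triangles } t} \frac{1}{m\,c(t)} \cdot m\,c(t) = T = \binom{5}{3} = 10.
\end{aligned}
```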

  56. A Swift Approximate Pattern miner

  66. ASAP overview (numbers follow the slide's diagram)
    (1) Queries, e.g. graphA.patterns("a->b->c", "100s") with a time budget, or graphB.fourClique("5.0%", "95.0%") with an error bound and confidence level
    (2) Generalized Approximate Pattern Mining on Apache Spark; graphs stored on disk or in main memory
    (3) Estimator Count Selection
    (4) Profiling on the Twitter graph: runtime (min) and error rate (%) vs. number of estimators (0 to 2.1M)
    (5) Error-Latency Profile (ELP)
    (6) Estimates: {error: <5%, time: 95s} and {error: <5%, time: 60s}
    (7) Result: count: 21453 +/- 14, confidence: 95%, time: 92s; embeddings (optional)
    Also shown: graph updates
    Contributions:
    • Extends neighborhood sampling to general patterns
    • Provides a unified API
    • Applies approximate pattern mining in distributed settings
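Estimator count selection can be pictured as inverting an error-latency profile. The sketch below is hypothetical: `ErrorLatencyProfile` and its methods are illustrative names, and it assumes that relative error falls like 1/sqrt(n) while runtime grows linearly in the number of estimators n on top of a fixed overhead. That model is an assumption for illustration, not ASAP's exact formula, and the profiling numbers in the usage are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class ErrorLatencyProfile:
    """Hypothetical error-latency profile fitted from one profiling run.

    Assumed model: error(n) = base_error * sqrt(base_n / n) and
    seconds(n) = fixed_seconds + per_estimator_cost * n.
    """
    base_n: int                 # estimators used in the profiling run
    base_error: float           # relative error observed there (0.20 = 20%)
    base_seconds: float         # total runtime observed there
    fixed_seconds: float = 0.0  # overhead that does not depend on n

    def _per_estimator(self) -> float:
        return (self.base_seconds - self.fixed_seconds) / self.base_n

    def error(self, n: int) -> float:
        """Predicted relative error with n estimators."""
        return self.base_error * math.sqrt(self.base_n / n)

    def seconds(self, n: int) -> float:
        """Predicted runtime with n estimators."""
        return self.fixed_seconds + self._per_estimator() * n

    def estimators_for_error(self, target_error: float) -> int:
        """Smallest n whose predicted error is at most target_error."""
        return math.ceil(self.base_n * (self.base_error / target_error) ** 2)

    def estimators_for_time(self, budget_seconds: float) -> int:
        """Largest n whose predicted runtime fits in budget_seconds."""
        spare = budget_seconds - self.fixed_seconds
        return max(0, math.floor(spare / self._per_estimator()))


if __name__ == "__main__":
    # Made-up profiling numbers, for illustration only.
    elp = ErrorLatencyProfile(base_n=10_000, base_error=0.20,
                              base_seconds=5.0, fixed_seconds=2.0)
    n = elp.estimators_for_error(0.05)   # error-bound query, like fourClique("5.0%", ...)
    print(n, elp.seconds(n))
    print(elp.estimators_for_time(100.0))  # time-budget query, like patterns(..., "100s")
```

Under this model, quartering the error costs 16x more estimators, which is why a time budget and an error bound can lead to very different estimator counts.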

  67. Generalized Approximate Pattern Mining
    Table 1: ASAP’s Approximate Pattern Mining API (p is the probability with which the returned vertex or edge was sampled)
    sampleVertex: () → (v, p): Uniformly sample one vertex from the graph.
    SampleEdge: () → (e, p): Uniformly sample one edge from the graph.
    ConditionalSampleVertex: (subgraph) → (v, p): Uniformly sample a vertex that appears after a sampled subgraph.
    ConditionalSampleEdge: (subgraph) → (e, p): Uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order.
    ConditionalClose: (subgraph, subgraph) → boolean: Given a sampled subgraph, check whether another subgraph that appears later in the order can be formed.
    [Figure: example estimators SampleThreeNodeChain and SampleTriangle]
    Developers write a single estimator using ASAP’s API
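The division of labour above (the developer writes one estimator, the system runs many copies and reports a count with an error bar) can be sketched as follows. Everything here is hypothetical: `sample_vertex` mimics the `(v, p)` return convention of sampleVertex, `edge_count_estimator` is a toy estimator, and the ±95% half-width uses a normal approximation, which is an assumption rather than ASAP's stated method.

```python
import math
import random
import statistics
from typing import Callable, Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def sample_vertex(graph: Graph, rng: random.Random) -> Tuple[int, float]:
    """Hypothetical stand-in for sampleVertex: () -> (v, p)."""
    v = rng.choice(sorted(graph))  # sorted() keeps runs reproducible for a seed
    return v, 1.0 / len(graph)


def edge_count_estimator(graph: Graph, rng: random.Random) -> float:
    """Toy estimator: deg(v)/p counts every edge twice, so halve it."""
    v, p = sample_vertex(graph, rng)
    return len(graph[v]) / (2 * p)


def run_estimators(estimator: Callable[[Graph, random.Random], float],
                   graph: Graph, r: int, seed: int = 0,
                   z: float = 1.96) -> Tuple[float, float]:
    """Average r independent copies; return (mean, half-width of a ~95% CI)."""
    if r < 2:
        raise ValueError("need at least two estimators for an error bar")
    rng = random.Random(seed)
    values = [estimator(graph, rng) for _ in range(r)]
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(r)
    return mean, half_width


if __name__ == "__main__":
    star = {0: set(range(1, 10)), **{v: {0} for v in range(1, 10)}}  # 9 edges
    mean, hw = run_estimators(edge_count_estimator, star, r=20_000)
    print(f"count: {mean:.1f} +/- {hw:.2f}, confidence: 95%")
```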

  70. Using ASAP’s API
    [Figure: example graph on vertices 0-4]
    Edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    SampleThreeNodeChain
      (e1, p1) = SampleEdge()
      (e2, p2) = ConditionalSampleEdge(Subgraph(e1))
      if (!e2)
        return 0
      return 1/(p1·p2)
    SampleTriangle
      (e1, p1) = sampleEdge()
      (e2, p2) = conditionalSampleEdge(Subgraph(e1))
      if (!e2) return 0
      subgraph1 = Subgraph(e1, e2)
      subgraph2 = Triangle(e1, e2) - subgraph1
      if conditionalClose(subgraph1, subgraph2)
        return 1/(p1·p2)
      else return 0
    SampleFourCliqueType1
      (e1, p1) = SampleEdge()
      (e2, p2) = ConditionalSampleEdge(Subgraph(e1))
      if (!e2) return 0
      (e3, p3) = ConditionalSampleEdge(Subgraph(e1, e2))
      if (!e3) return 0
      ...
    SampleFourCliqueType2
      (e1, p1) = SampleEdge()
      (e2, p2) = SampleEdge()
      if (isAdjacent(e1, e2) == true)
        return 0
      subgraph1 = Subgraph(e1, e2)
      ...
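A runnable single-machine sketch of the SampleThreeNodeChain and SampleTriangle estimators above. `StreamSampler` is a hypothetical stand-in for part of ASAP's API, not its real implementation: edges are identified by their position in the ordered stream, "after" means a later position, the graph is assumed simple, and instead of building `Triangle(e1, e2) - subgraph1` the sketch computes the single closing edge directly.

```python
import random
from typing import List, Optional, Tuple

Edge = Tuple[int, int]


class StreamSampler:
    """Hypothetical stand-in for part of ASAP's sampling API.

    A subgraph is a list of stream positions of a simple undirected graph.
    """

    def __init__(self, stream: List[Edge], rng: random.Random):
        self.stream = stream
        self.rng = rng
        self.pos = {frozenset(e): i for i, e in enumerate(stream)}

    def sample_edge(self) -> Tuple[int, float]:
        """SampleEdge: () -> (e, p), with e uniform over the stream."""
        return self.rng.randrange(len(self.stream)), 1.0 / len(self.stream)

    def conditional_sample_edge(self, sub: List[int]) -> Tuple[Optional[int], float]:
        """ConditionalSampleEdge: uniform edge adjacent to sub that comes after it."""
        verts = {v for i in sub for v in self.stream[i]}
        cands = [j for j in range(max(sub) + 1, len(self.stream))
                 if verts & set(self.stream[j])]
        if not cands:
            return None, 0.0
        return self.rng.choice(cands), 1.0 / len(cands)

    def conditional_close(self, sub: List[int], missing: List[Edge]) -> bool:
        """ConditionalClose: do all missing edges exist and arrive after sub?"""
        return all(self.pos.get(frozenset(e), -1) > max(sub) for e in missing)


def sample_three_node_chain(s: StreamSampler) -> float:
    e1, p1 = s.sample_edge()
    e2, p2 = s.conditional_sample_edge([e1])
    if e2 is None:
        return 0.0
    return 1.0 / (p1 * p2)


def sample_triangle(s: StreamSampler) -> float:
    e1, p1 = s.sample_edge()
    e2, p2 = s.conditional_sample_edge([e1])
    if e2 is None:
        return 0.0
    # The closing edge joins the two endpoints that e1 and e2 do not share.
    a, b = set(s.stream[e1]) ^ set(s.stream[e2])
    if s.conditional_close([e1, e2], [(a, b)]):
        return 1.0 / (p1 * p2)
    return 0.0


def estimate(estimator, stream: List[Edge], r: int, seed: int = 0) -> float:
    """Average r independent runs of one estimator."""
    s = StreamSampler(stream, random.Random(seed))
    return sum(estimator(s) for _ in range(r)) / r


if __name__ == "__main__":
    k5 = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
    print("triangles ~", estimate(sample_triangle, k5, 20_000))
    print("three-node chains ~", estimate(sample_three_node_chain, k5, 20_000))
```

On the slide's stream (the complete graph on five vertices) the averages approach 10 triangles and 30 three-node chains. The two estimators differ only in the final `conditional_close` check, which is the point of writing estimators against a shared sampling API.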

  77. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  78. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  79. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  80. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  81. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  82. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  83. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  84. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  85. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  86. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  87. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  88. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  89. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4

    View Slide

  90. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4
    !"
    = 40

    View Slide

  91. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4
    Sampling phase fixes the vertices for a particular instance of a pattern
    and closing phase waits for remaining edges
    !"
    = 40

    View Slide

  92. Using ASAP’s API
    0
    1 4
    2 3
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ditionalSampleEdge: (subgraph)!(e, p) Uniformly sample an edge that is adjacent to the give
    subgraph and comes after the subgraph in the order.
    ditionalClose: (subgraph, subgraph)!boolean Given a sampled subgraph, check if another subgrap
    that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    leThreeNodeChain
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2)
    rn 0
    rn 1/(p1.p2)
    SampleTriangle
    1 (e1, p1) = sampleEdge()
    2 (e2, p2) = conditionalSampleEdge(Subgraph(e1))
    3 if (!e2) return 0
    4 subgraph1 = Subgraph(e1, e2)
    5 subgraph2 = Triangle(e1, e2)-subgraph1
    6 if conditionalClose(subgraph1, subgraph2)
    7 return 1/(p1.p2)
    8 else return 0
    leFourCliqueType1
    p1) = SampleEdge()
    p2) = ConditionalSampleEdge(Subgraph(e1))
    e2) return 0
    p3) = ConditionalSampleEdge(Subgraph(e1, e2))
    e3) return 0
    SampleFourCliqueType2
    1 (e1, p1) = SampleEdge()
    2 (e2, p2) = SampleEdge()
    3 if (isAdjacent(e1, e2) == true)
    4 return 0
    5 subgraph1 = Subgraph(e1, e2)
    0
    3
    4
    Sampling phase fixes the vertices for a particular instance of a pattern
    and closing phase waits for remaining edges
    ASAP computes the right expectations, runs many
    instances of the estimator and aggregates results
    !"
    = 40

    View Slide

  93. Using ASAP’s API
    [Figure: example graph on vertices 0-4 with vertices 0, 3 and 4 highlighted]
    edge stream: (0,1), (0,2), (0,3), (0,4), (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
    ConditionalSampleEdge: (subgraph) → (e, p): uniformly sample an edge that is adjacent to the given subgraph and comes after the subgraph in the order.
    ConditionalClose: (subgraph, subgraph) → boolean: given a sampled subgraph, check whether another subgraph that appears later in the order can be formed.
    Table 1: ASAP’s Approximate Pattern Mining API.
    See paper for more examples & proof
    Sampling phase fixes the vertices of a particular instance of a pattern;
    closing phase waits for the remaining edges
    ASAP computes the right expectations, runs many
    instances of the estimator and aggregates the results
    Estimate = 40
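The worked value on this slide can be checked with a sketch of SampleTriangle plus the aggregation step. It assumes the highlighted vertices mean e1 = (0,3) and e2 = (0,4) were sampled: then p1 = 1/10 and p2 = 1/4 (the later edges adjacent to (0,3) are (0,4), (1,3), (2,3) and (3,4)), and the closing edge (3,4) arrives after (0,4), so the run returns 1/(p1·p2) = 40. Function names are hypothetical, and closing is simulated offline by looking ahead in a list rather than waiting on a live stream.

```python
import random
from fractions import Fraction

# Edges stored as (smaller, larger) vertex pairs, in arrival order.
STREAM = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2),
          (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]


def later_adjacent(stream, i):
    """Positions after i whose edge shares a vertex with stream[i]."""
    e1 = set(stream[i])
    return [j for j in range(i + 1, len(stream)) if e1 & set(stream[j])]


def triangle_value(stream, i, j):
    """Estimator output once e1 = stream[i] and e2 = stream[j] are fixed."""
    cands = later_adjacent(stream, i)
    if j not in cands:
        return Fraction(0)
    p1 = Fraction(1, len(stream))          # sampling phase: e1
    p2 = Fraction(1, len(cands))           # sampling phase: e2
    # Closing phase (simulated by lookahead): the edge joining the two
    # non-shared endpoints must arrive after e2.
    u, v = sorted(set(stream[i]) ^ set(stream[j]))
    closes = (u, v) in stream[j + 1:]
    return 1 / (p1 * p2) if closes else Fraction(0)


def sample_triangle(stream, rng):
    """One independent run: sample e1, then e2, then try to close."""
    i = rng.randrange(len(stream))
    cands = later_adjacent(stream, i)
    if not cands:
        return Fraction(0)
    return triangle_value(stream, i, rng.choice(cands))


def aggregate(stream, r, seed=0):
    """Run r independent estimators and average their outputs."""
    rng = random.Random(seed)
    return sum(sample_triangle(stream, rng) for _ in range(r)) / r


def exact_expectation(stream):
    """E[single estimator], enumerating every (e1, e2) outcome."""
    m = len(stream)
    total = Fraction(0)
    for i in range(m):
        cands = later_adjacent(stream, i)
        for j in cands:
            total += Fraction(1, m) * Fraction(1, len(cands)) * triangle_value(stream, i, j)
    return total


if __name__ == "__main__":
    print(triangle_value(STREAM, 2, 3))       # e1=(0,3), e2=(0,4): prints 40
    print(exact_expectation(STREAM))          # 10 triangles in the 5-clique
    print(float(aggregate(STREAM, 20000)))    # close to 10
```

A single run is noisy (40 or 0 in this example), but every triangle is produced by exactly one ordered (e1, e2) choice, namely its first two edges in stream order, so the average over many runs approaches the true count of 10.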

  102. Applying to Distributed Settings
    map: graph split across w (= 3) workers
      subgraph 0: partial count c0 (using r estimators)
      subgraph 1: partial count c1 (using r estimators)
      subgraph 2: partial count c2 (using r estimators)
    reduce: combine the partial counts c0, c1, c2
    Random Vertex-cut Partitioning; factor f(w)
    Upper bounds on f(w) can be proved using the
    Hajnal-Szemerédi theorem
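A small sketch of the map/reduce structure, under stated assumptions: random vertex-cut partitioning is modeled as sending each edge to a uniformly random worker, each worker computes its partial count exactly (standing in for r estimators), and the reducer rescales by w² for triangles. That w² factor is our own derivation for this toy setup (all three edges of a given triangle land on one worker with probability w·(1/w)³ = 1/w²), not the f(w) from the slides.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count of one worker's edge set (stand-in for r estimators)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def distributed_triangles(edges, w, seed=0):
    """map/reduce sketch over w workers with random edge placement."""
    rng = random.Random(seed)
    parts = [[] for _ in range(w)]
    for e in edges:  # assumption: vertex-cut modeled as one uniform worker per edge
        parts[rng.randrange(w)].append(e)
    partial = [count_triangles(p) for p in parts]   # map: partial counts c_i
    # reduce (assumed factor): a triangle survives only if its 3 edges share
    # a worker, which happens with probability 1/w**2, so rescale by w**2.
    return sum(partial) * w ** 2


if __name__ == "__main__":
    k8 = list(combinations(range(8), 2))        # complete graph on 8 vertices
    print(count_triangles(k8))                    # 56
    trials = [distributed_triangles(k8, 3, seed=s) for s in range(500)]
    print(sum(trials) / len(trials))              # close to 56 on average
```

Setting w = 1 recovers the exact count; larger w keeps the expectation but adds variance, because triangles whose edges are split across workers are missing from every partial count.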

  103. Error-Latency Profile (ELP)
    [Figure: ASAP architecture. Components: Error-Latency Profile (ELP), Estimator Count Selection,
    Generalized Approximate Pattern Mining, Apache Spark, graphs stored on disk or main memory, graph updates]
    Example queries:
      graphA.patterns("a->b->c", "100s")
      graphB.fourClique("5.0%", "95.0%")
    Estimates: {error: <5%, time: 95s}
    Estimates: {error: <5%, time: 60s}
    count: 21453 +/- 14, confidence: 95%, time: 92s; Embeddings (optional)
    [Plots: Twitter Graph Profiling, runtime (min) and error rate (%) vs. no. of estimators (0 to 2.1M)]

  104. Error-Latency Profile
    Contribution:
    • A novel way to build the ELP very quickly, without needing to know
    the ground truth or to run mining on the full graph.

  107. Building Error-Latency Profile
    Given a time / error bound, how many estimators
    should ASAP use?
    [Plots: Time vs Estimators; Error vs Estimators (x-axis: number of estimators)]

  110. Building Estimators vs Time Profile
    [Plot: Twitter Graph, runtime (min) vs. no. of estimators (0.5M to 2M)]
    Time grows linearly with the number of estimators
    ASAP sets a profiling cost budget and picks the maximum number of
    profiling points that fit within it
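Because runtime grows linearly with the number of estimators, a time profile can be a straight line fitted to a few cheap profiling runs and then inverted for a time budget. A minimal sketch assuming the model t ≈ a + b·r fitted by ordinary least squares; the function names and the profiling numbers in the demo are made up for illustration, not measurements from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of t = a + b * r; returns (a, b).

    Assumption: runtime is linear in the estimator count r.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def max_estimators_within(budget, a, b):
    """Largest estimator count r whose predicted time a + b * r fits the budget."""
    if budget <= a:
        return 0
    return int((budget - a) // b)


if __name__ == "__main__":
    # Illustrative profiling points (made up, not measurements from the slides).
    rs = [250_000, 500_000, 750_000, 1_000_000]
    ts = [30.0, 50.0, 70.0, 90.0]               # seconds
    a, b = fit_line(rs, ts)
    print(max_estimators_within(100.0, a, b))   # roughly 1,125,000
```

Since the fit is linear, a handful of small profiling runs is enough to extrapolate to estimator counts far beyond those actually profiled.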

  111. Building Estimators vs Time Profile
    [Plots: Twitter Graph, runtime (min) vs. no. of estimators (0.5M to 2M);
    Twitter Graph Profiling, runtime (min) vs. no. of estimators (0 to 2.1M)]
    Time grows linearly with the number of estimators

  113. Building Estimators vs Error Profile
    [Plot: Twitter Graph, error rate (%) vs. no. of estimators (50k to 2.1M)]
    Error is non-linear in the number of estimators
    Key idea: use a very small sample of the graph to build
    the ELP
    • Chernoff analysis provides a loose upper bound on the
    number of estimators.
    • In small graphs, a large number of estimators can get very
    close to the ground truth.
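For the error side, one common modeling choice is that the error of an average of r independent estimators shrinks roughly like c/√r. The sketch below assumes that shape (our assumption, not ASAP's stated model) and fits c from a handful of (r, error) points, such as those measured on a very small graph sample where the ground truth is cheap to obtain, then solves for the r that meets a target error. Names and numbers are illustrative.

```python
import math


def fit_error_constant(rs, errs):
    """Fit err ~ c / sqrt(r) by least squares through the origin in u = 1/sqrt(r).

    Assumption: error decays like 1/sqrt(r), the usual Monte Carlo rate.
    """
    us = [1 / math.sqrt(r) for r in rs]
    return sum(u * e for u, e in zip(us, errs)) / sum(u * u for u in us)


def estimators_for_error(target_err, c):
    """Smallest r with predicted error c / sqrt(r) at or below target_err."""
    return math.ceil((c / target_err) ** 2)


if __name__ == "__main__":
    # Illustrative (r, error %) points, standing in for runs on a small graph sample.
    rs = [100, 400, 1600]
    errs = [5.0, 2.5, 1.25]
    c = fit_error_constant(rs, errs)
    print(round(c, 6))                         # 50.0
    print(estimators_for_error(2.0, 50.0))    # 625
```

The quadratic dependence on 1/target_err is what makes the error curve non-linear: halving the target error needs roughly four times as many estimators.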

  114. Building Estimators vs Error Profile
    [Plots: Twitter Graph, error rate (%) vs. no. of estimators (50k to 2.1M);
    Twitter Graph Profiling, error rate (%) vs. no. of estimators (0 to 2.1M)]
    Error is non-linear in the number of estimators

  115. Advanced Mining
    Predicate Matching
    • Find patterns where vertices are of type “electronics”
    • ASAP allows simple edge and vertex predicates
    Motif Mining
    • Some patterns are building blocks for other patterns
    • ASAP caches the state of the estimators and reuses them
    Accuracy Refinement
    • Users may require a more accurate answer later
    • ASAP can checkpoint and reuse estimators
    More details in the paper
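Refinement is cheap only if earlier estimator work is kept. A minimal sketch, assuming refinement simply averages more independent estimator outputs into the earlier ones: the checkpoint stores the running sum and count, so later batches extend it instead of starting over. `EstimatorCheckpoint` is a hypothetical name, and ASAP's checkpoints may store full estimator state (for example sampled edges), not just outputs.

```python
from dataclasses import dataclass


@dataclass
class EstimatorCheckpoint:
    """Hypothetical saved aggregate of estimator outputs, so refinement can resume."""
    total: float = 0.0   # sum of all estimator outputs so far
    count: int = 0       # number of estimators run so far

    def add(self, outputs):
        """Fold in a new batch of estimator outputs."""
        self.total += sum(outputs)
        self.count += len(outputs)

    def merge(self, other):
        """Combine two checkpoints (e.g. from different workers or sessions)."""
        return EstimatorCheckpoint(self.total + other.total, self.count + other.count)

    def estimate(self):
        """Current average; 0.0 before any estimator has run."""
        return self.total / self.count if self.count else 0.0


if __name__ == "__main__":
    cp = EstimatorCheckpoint()
    cp.add([40.0, 0.0, 0.0, 20.0])      # first, coarse answer
    print(cp.estimate())                # 15.0
    cp.add([10.0, 20.0])                # later refinement reuses earlier work
    print(cp.estimate(), cp.count)      # 15.0 6
```

Because an average of independent runs only needs (sum, count), checkpoints can also be merged across workers or sessions without rerunning any estimator.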

  116. Implementation & Evaluation
    § Implemented on Apache Spark
    § Not limited to Spark; relies only on simple dataflow operators
    § Evaluated on a 16-node cluster
    § Twitter: 1.47B edges
    § Friendster: 1.8B edges
    § UK: 3.73B edges
    § Comparison using representative patterns:
    § 3-Motifs (2 patterns), 4-Motifs (6 patterns) and 5-Motifs (21 patterns)
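Because estimators are independent, the computation fits basic dataflow operators: each partition runs its share of estimators and emits a (sum, count) pair, and one reduce combines the pairs. The sketch below imitates that map/reduce shape with Python built-ins rather than Spark; `run_partition`, `distributed_estimate`, and the per-partition seeding are assumptions for illustration.

```python
import random
from functools import reduce
from typing import Callable, List, Tuple

Partial = Tuple[float, int]  # (sum of estimates, number of estimators)


def run_partition(estimate: Callable[[random.Random], float],
                  k: int, seed: int) -> Partial:
    """Map step: one partition runs k independent estimators."""
    rng = random.Random(seed)
    return (sum(estimate(rng) for _ in range(k)), k)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: partial results add component-wise."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_estimate(estimate: Callable[[random.Random], float],
                         total_estimators: int, partitions: int) -> float:
    base, extra = divmod(total_estimators, partitions)
    # spread the estimator budget as evenly as possible across partitions
    sizes: List[int] = [base + (1 if p < extra else 0) for p in range(partitions)]
    # a distinct seed per partition gives each partition its own random stream
    partials = map(lambda p: run_partition(estimate, sizes[p], seed=p),
                   range(partitions))
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count
```

In a Spark job the same shape would be a map over partitions followed by a reduce; nothing graph-specific is needed to aggregate the estimators.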

  117. Performance on Small Graphs
    4-Motifs (6 patterns), Time (s, log scale), Arabesque vs. ASAP:
    § CiteSeer: Arabesque 12.1, ASAP 7.3
    § Mico: Arabesque 162, ASAP 14.9
    § Youtube: Arabesque 291.4, ASAP 18.1
    § LiveJournal: Arabesque 3161, ASAP 41.6

  119. Performance on Small Graphs
    4-Motifs (6 patterns), Time (s, log scale), Arabesque vs. ASAP:
    § CiteSeer: Arabesque 12.1, ASAP 7.3
    § Mico: Arabesque 162, ASAP 14.9
    § Youtube: Arabesque 291.4, ASAP 18.1
    § LiveJournal: Arabesque 3161, ASAP 41.6
    77x speedup with <5% error

  120. Large Graphs & Simple Patterns
    3-Motifs (2 patterns), Arabesque vs. ASAP
    [Chart: Time (min, log scale) vs. # Edges (Billions: 0.9, 1.5, 1.8, 3.7); plotted values 645, 2.5, 5, 5.9]

  121. Large Graphs & Simple Patterns
    3-Motifs (2 patterns), Arabesque vs. ASAP
    [Chart: Time (min, log scale) vs. # Edges (Billions: 0.9, 1.5, 1.8, 3.7); plotted values 645, 2.5, 5, 5.9]
    Proprietary graph, 20 machines (256GB each)

  122. Large Graphs & Simple Patterns
    3-Motifs (2 patterns), Arabesque vs. ASAP
    [Chart: Time (min, log scale) vs. # Edges (Billions): 0.9, 1.5 (Twitter), 1.8 (Friendster), 3.7 (UK); plotted values 645, 2.5, 5, 5.9]
    Proprietary graph, 20 machines (256GB each)
    258x speedup with <5% error

  124. Large Graphs & Complex Patterns
    4-Motifs, Time (min): Twitter 22, UK 47

  125. Large Graphs & Complex Patterns
    5-House, Time (min): Twitter 12.3 (5%) / 5.6 (10%); UK 22.1 (5%) / 14.2 (10%)
    4-Motifs, Time (min): Twitter 22, UK 47

  126. Summary
    § Pattern mining is an important & challenging problem
    § Applications in many domains
    § ASAP uses approximation for fast pattern mining
    § Leverages graph mining theory & makes it practical
    § Simple API for developers
    § ASAP outperforms existing solutions
    § Can handle much larger graphs with fewer resources
    http://www.cs.berkeley.edu/~api
