GRAIL: A Scalable Index for Reachability Queries in Very Large Graphs

90-minute presentation for the Advanced topics in Data Management course at KAUST, on "GRAIL: A Scalable Index for Reachability Queries in Very Large Graphs" by Hilmi Yıldırım et al., published in VLDBJ '11.

Emaad Manzoor

March 09, 2014

Transcript

  1. Interval Labeling on Trees

    Tree T: each node u gets an interval label L_u = [s_u, e_u]. Desired labeling property: L_v ⊆ L_u ⇒ u → v. Candidates: min-post labeling, pre-post labeling.
  2. Min-post Interval Labeling on Trees

    L_u = [s_u, e_u], where e_u is the post-order rank of u and s_u is the minimum post-order rank of all the nodes in the sub-tree rooted at u. Linear time and space.
  3. Min-post Interval Labeling on Trees

    L_u = [s_u, e_u], where e_u is the post-order rank of u and s_u is the minimum post-order rank of all the nodes in the sub-tree rooted at u. Linear time and space. L_v ⊆ L_u ⇒ u → v.
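The min-post labeling described on the slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tree is given as a hypothetical `children` dictionary, the function names `min_post_labels` and `contains` are made up for this example, and recursion is used for brevity (a very deep tree would need an iterative traversal).

```python
def min_post_labels(children, root):
    """Return {node: (s, e)}: e is the node's post-order rank,
    s is the minimum post-order rank in the node's subtree."""
    labels = {}
    rank = 0

    def visit(u):
        nonlocal rank
        start = None
        for c in children.get(u, []):
            visit(c)
            s_c = labels[c][0]
            start = s_c if start is None else min(start, s_c)
        rank += 1  # post-order: the rank is assigned after all children
        labels[u] = (rank if start is None else start, rank)

    visit(root)
    return labels


def contains(outer, inner):
    """True if interval `inner` lies inside interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Hypothetical example tree: a -> b, c and b -> d, e
tree = {"a": ["b", "c"], "b": ["d", "e"]}
labels = min_post_labels(tree, "a")
```

On a tree, checking `contains(labels[u], labels[v])` answers whether `u` reaches `v`; for instance, the label of `d` falls inside the label of `b` but not inside the label of `c`.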
  4. Min-post Interval Labeling on DAGs

    Do not visit a node more than once; keep the post-order rank of the first visit. L_u = [s_u, e_u], where e_u is the post-order rank of u and s_u is the minimum post-order rank of all the nodes in the sub-tree rooted at u. L_v ⊄ L_u ⇒ u ↛ v, but L_v ⊆ L_u ⇏ u → v.
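  5. Min-post Interval Labeling on DAGs

    Claim: L_v ⊄ L_u ⇒ u ↛ v. Proof by contradiction: assume u → v, with L_u = [s_u, e_u] and L_v = [s_v, e_v]. Every node reachable from u receives its post-order rank before u does, so s_u ≤ s_v and e_v < e_u, which gives L_v ⊆ L_u.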
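A small sketch may help show why containment on a DAG rules out false negatives but still admits false positives. It assumes an acyclic graph given as a hypothetical successor dictionary; the names `dag_min_post_labels`, `contains`, and `reaches` are illustrative and not taken from the paper.

```python
def dag_min_post_labels(succ, roots):
    """Min-post labels on a DAG: each node is visited once and keeps
    the post-order rank of that first visit. Assumes `succ` is acyclic."""
    labels = {}
    rank = 0

    def visit(u):
        nonlocal rank
        start = None
        for v in succ.get(u, []):
            if v not in labels:  # never visit a node more than once
                visit(v)
            s_v = labels[v][0]
            start = s_v if start is None else min(start, s_v)
        rank += 1
        labels[u] = (rank if start is None else start, rank)

    for r in roots:
        if r not in labels:
            visit(r)
    return labels


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def reaches(succ, u, v):
    """Ground truth by plain graph search (for comparison only)."""
    stack, seen = [u], set()
    while stack:
        x = stack.pop()
        if x == v:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(succ.get(x, []))
    return False


# Hypothetical DAG: r -> x, y; x -> z; y -> z, w
dag = {"r": ["x", "y"], "x": ["z"], "y": ["z", "w"]}
labels = dag_min_post_labels(dag, ["r"])
```

In this example the label of `x` ends up inside the label of `y` even though `y` does not reach `x`, which is a false positive. Every truly reachable pair still satisfies containment.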
  6. GRIPP Key Ideas

    Apply interval labeling to the tree.
    Reachable(u, v): may trigger multiple containment queries
      If containment is satisfied, return true
      If containment is not satisfied:
        For all non-tree edges (x, y) where x is a descendant of u:
          If Reachable(y, v), return true
        Return false // out of non-tree edges
    Query time: O(|E| - |V|) = O(t), for t non-tree edges
  7. GRAIL Key Ideas

    Apply post-order interval labeling to the DAG.
    Hypothesis: most of the reachability information is captured by interval labeling.
    Reachable(u, v):
      • If containment is not satisfied, return false. No false negatives.
      • If containment is satisfied, do something. May have false positives.
    Can we decrease the number of false positives? Yes, compute many intervals. How?
    How do we handle false positives?
  8. Computing The Index (Intervals)

      • Random traversal
      • Fixed reverse pairs
      • Bottom up*
      • Heuristic (guided)*: during the i-th traversal, select the node with the most exceptions in the previous i - 1 traversals. Too expensive; use some heuristics to select nodes with many possible exceptions.
    Time complexity: O(d(n + m) + dn(P log P)), P = maximum out degree
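The random traversal strategy can be illustrated with the following sketch, which builds d interval labels per node by repeating the min-post traversal with shuffled root and child orders. It is a simplified illustration, not the authors' implementation: the names `randomized_labels` and `may_reach`, the `seed` parameter, and the example graph are assumptions made here, and the input is assumed to be acyclic.

```python
import random


def randomized_labels(succ, nodes, d, seed=0):
    """Return {node: [(s, e), ...]} with one min-post interval per traversal."""
    rng = random.Random(seed)
    has_parent = {v for vs in succ.values() for v in vs}
    roots = [u for u in nodes if u not in has_parent]
    index = {u: [] for u in nodes}

    for _ in range(d):
        rank = 0
        start, end = {}, {}

        def visit(u):
            nonlocal rank
            kids = list(succ.get(u, []))
            rng.shuffle(kids)  # random child order for this traversal
            lo = None
            for v in kids:
                if v not in end:  # visit each node once per traversal
                    visit(v)
                lo = start[v] if lo is None else min(lo, start[v])
            rank += 1
            end[u] = rank
            start[u] = rank if lo is None else lo

        order = list(roots)
        rng.shuffle(order)  # random root order as well
        for r in order:
            if r not in end:
                visit(r)
        for u in nodes:
            index[u].append((start[u], end[u]))
    return index


def may_reach(index, u, v):
    """Containment must hold in every dimension; otherwise u cannot reach v."""
    return all(su <= sv and ev <= eu
               for (su, eu), (sv, ev) in zip(index[u], index[v]))


# Hypothetical DAG: r -> x, y; x -> z; y -> z, w
dag = {"r": ["x", "y"], "x": ["z"], "y": ["z", "w"]}
nodes = ["r", "x", "y", "z", "w"]
index = randomized_labels(dag, nodes, d=3, seed=1)
```

Each extra traversal can only remove false positives, since a pair must pass the containment test in all d dimensions, while truly reachable pairs pass in every traversal.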
  9. Computing The Index (Intervals)

    Heuristic (guided)*: during the i-th traversal, select the node with the most exceptions in the previous i - 1 traversals. Too expensive.
      • Maximum volume
      • Maximum minimum interval
      • Maximum adjusted volume
      • Maximum adjusted minimum interval
  10. Answering Queries

    If L_v ⊄ L_u: return u ↛ v
    Else if L_v ⊆ L_u:
      • Exception lists, or
      • Pruned DFS
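The pruned DFS option can be sketched as follows: answer "not reachable" as soon as containment fails, and otherwise search the graph while skipping every child whose labels already rule out the target. The function names, the `seen` set (added here to avoid revisiting nodes), and the hard-coded one-dimensional index are assumptions for illustration only.

```python
def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def may_reach(index, u, v):
    """Containment in every dimension; failure means u cannot reach v."""
    return all(contains(a, b) for a, b in zip(index[u], index[v]))


def reachable(succ, index, u, v):
    if u == v:
        return True
    if not may_reach(index, u, v):
        return False  # no false negatives: safe to answer immediately
    stack, seen = [u], {u}
    while stack:
        x = stack.pop()
        for c in succ.get(x, []):
            if c == v:
                return True
            # Prune: only descend into children that may still reach v.
            if c not in seen and may_reach(index, c, v):
                seen.add(c)
                stack.append(c)
    return False


# Hypothetical DAG (r -> x, y; x -> z; y -> z, w) with one min-post
# interval per node, as produced by a single left-to-right traversal.
dag = {"r": ["x", "y"], "x": ["z"], "y": ["z", "w"]}
index = {
    "z": [(1, 1)],
    "x": [(1, 2)],
    "w": [(3, 3)],
    "y": [(1, 4)],
    "r": [(1, 5)],
}
```

In this example the query from `y` to `x` passes the containment test, but the search prunes both children of `y` and returns `False`, so the false positive is resolved without exploring the whole graph.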
  11. Computing Exception Lists

    Direct exceptions (first dimension)
    Interval query: O(log N + K)
    O(nP(log n + X))
    P: maximum out degree
    X: maximum number of exceptions
  12. Computing Exception Lists

    Direct exceptions
    Indirect exceptions
    O(nXP(log n + X + P))
    P: maximum in/out degree
    X: maximum number of exceptions
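To make the notion of an exception concrete, the sketch below computes exception lists by brute force: for each node u it collects the nodes whose labels are contained in u's labels although u does not reach them. This is deliberately not the direct/indirect procedure from the slides and does not achieve the stated complexity; it swaps in a plain graph search per node, and all names and the example index are assumptions.

```python
def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def may_reach(index, u, v):
    return all(contains(a, b) for a, b in zip(index[u], index[v]))


def descendants(succ, u):
    """All nodes reachable from u (excluding u itself in a DAG)."""
    seen, stack = set(), [u]
    while stack:
        x = stack.pop()
        for c in succ.get(x, []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def exception_lists(succ, index):
    """Brute force: v is an exception for u if the labels say 'maybe'
    but u does not actually reach v."""
    exceptions = {}
    for u in index:
        reach = descendants(succ, u)
        exceptions[u] = {
            v for v in index
            if v != u and v not in reach and may_reach(index, u, v)
        }
    return exceptions


# Hypothetical DAG (r -> x, y; x -> z; y -> z, w) with one min-post
# interval per node from a single left-to-right traversal.
dag = {"r": ["x", "y"], "x": ["z"], "y": ["z", "w"]}
index = {
    "z": [(1, 1)],
    "x": [(1, 2)],
    "w": [(3, 3)],
    "y": [(1, 4)],
    "r": [(1, 5)],
}
exceptions = exception_lists(dag, index)
```

With exception lists in hand, a query whose containment test passes can be answered by checking whether the target appears in the source's list, instead of running a search at query time.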