GRAIL: A Scalable Index for Reachability Queries in Very Large Graphs


Given a directed graph G = (V, E) and two nodes u, v ∈ V, is u → v (is v reachable from u)?

Reachability in a directed graph reduces to reachability in a DAG: collapse each strongly connected component into a single node.
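The reduction to a DAG can be sketched as follows. This is a minimal illustration using Kosaraju's two-pass DFS, not code from the slides; the function name `condense` and the toy graph are assumptions made for the example.

```python
from collections import defaultdict


def condense(nodes, edges):
    """Map each node to a component id and return the edges of the condensed DAG."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record nodes in order of DFS completion.
    order, seen = [], set()

    def dfs1(u):
        seen.add(u)
        for w in succ[u]:
            if w not in seen:
                dfs1(w)
        order.append(u)

    for u in nodes:
        if u not in seen:
            dfs1(u)

    # Pass 2: sweep the reversed graph in decreasing completion order.
    comp = {}

    def dfs2(u, c):
        comp[u] = c
        for w in pred[u]:
            if w not in comp:
                dfs2(w, c)

    n_comp = 0
    for u in reversed(order):
        if u not in comp:
            dfs2(u, n_comp)
            n_comp += 1

    # Keep only edges that cross components; the result is acyclic.
    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag_edges


# Hypothetical toy graph: the cycle 1 -> 2 -> 3 -> 1 collapses to one node.
comp, dag_edges = condense([1, 2, 3, 4], [(1, 2), (2, 3), (3, 1), (3, 4)])
```

A query u → v on the original graph then becomes: same component, or comp[u] reaches comp[v] in the condensed DAG.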

Approaches
• Hop labeling
• Path-based
• Interval labeling

Interval Labeling on Trees
• Each node u of tree T gets an interval L_u = [s_u, e_u]
• Desired labeling property: L_v ⊆ L_u ⇒ u → v
• Candidates: min-post labeling, pre-post labeling

Min-post Interval Labeling on Trees
• L_u = [s_u, e_u]
• e_u: post-order rank of u
• s_u: minimum post-order rank of all the nodes in the subtree rooted at u
• Linear time and space
• L_v ⊆ L_u ⇒ u → v
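A minimal sketch of min-post labeling on a tree, assuming the tree is given as a dict of child lists. The names `min_post_labels` and `contains` and the example tree are illustrative, not taken from the slides.

```python
def min_post_labels(children, root):
    """Return {node: (s, e)}: e is the post-order rank, s the minimum rank in the subtree."""
    labels = {}
    rank = 0

    def visit(u):
        nonlocal rank
        lo = None
        for c in children.get(u, []):
            visit(c)
            s_c = labels[c][0]
            lo = s_c if lo is None else min(lo, s_c)
        rank += 1  # u is finished after all of its children
        labels[u] = (rank if lo is None else lo, rank)

    visit(root)
    return labels


def contains(outer, inner):
    """True if the interval `inner` lies inside the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Hypothetical example tree.
tree = {"a": ["b", "c"], "b": ["d", "e"], "c": ["f"]}
labels = min_post_labels(tree, "a")
```

On a tree, containment is exact: `contains(labels["b"], labels["e"])` holds because e is in the subtree of b, while `contains(labels["c"], labels["d"])` does not.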

Min-post Interval Labeling on DAGs
• Do not visit a node more than once; keep the post-order rank of the first visit.
• L_u = [s_u, e_u], with e_u the post-order rank of u and s_u the minimum post-order rank of all the nodes in the sub-DAG rooted at u (u and its descendants)
• L_v ⊈ L_u ⇒ u ↛ v
• L_v ⊆ L_u ⇏ u → v

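Min-post Interval Labeling on DAGs
• Claim: L_v ⊈ L_u ⇒ u ↛ v
• Proof sketch (contrapositive): assume u → v, with L_u = [s_u, e_u] and L_v = [s_v, e_v]. Then v is finished before u, so e_v < e_u, and every descendant of v is also a descendant of u, so s_u ≤ s_v. Hence L_v ⊆ L_u.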
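The same labeling on a DAG can be sketched as below, visiting each node once and taking the minimum over the children's intervals. The five-node DAG is a made-up example chosen to show one false positive; `min_post_labels_dag` and `contains` are illustrative names.

```python
def min_post_labels_dag(succ, roots):
    """Return {node: (s, e)} from one DFS over a DAG given as successor lists."""
    labels = {}
    rank = 0

    def visit(u):
        nonlocal rank
        lo = None
        for c in succ.get(u, []):
            if c not in labels:  # visit each node only once
                visit(c)
            lo = labels[c][0] if lo is None else min(lo, labels[c][0])
        rank += 1  # post-order rank of the first (and only) visit
        labels[u] = (rank if lo is None else lo, rank)

    for r in roots:
        if r not in labels:
            visit(r)
    return labels


def contains(outer, inner):
    """True if the interval `inner` lies inside the interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Hypothetical DAG: d is shared by b and c.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
labels = min_post_labels_dag(dag, ["a"])
```

Here c gets [1, 4] and b gets [1, 2], so L_b ⊆ L_c although c ↛ b: a false positive. Non-containment stays reliable: L_e is not inside L_b, and indeed b ↛ e.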

GRIPP Key Ideas
• Apply interval labeling to a spanning tree of the graph.
• Reachable(u, v): may trigger multiple containment queries
    If containment is satisfied, return true
    If containment is not satisfied
        For all non-tree edges (x, y) where x is a descendant of u
            If Reachable(y, v), return true
        Return false // out of non-tree edges
• Query time: O(|E| - |V|) = O(t), for t non-tree edges

GRAIL Key Ideas
• Apply post-order interval labeling to the DAG.
• Hypothesis: most of the reachability information is captured by interval labeling.
• Reachable(u, v):
    If containment is not satisfied, return false. No false negatives.
    If containment is satisfied, further checking is needed. May have false positives.
• Can we decrease the number of false positives? Yes, compute many intervals. How?
• How do we handle false positives?

Computing The Index (Intervals)
• d intervals per node
• O(d(n+m)) time
• O(dn) space, with n = |V| and m = |E|
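A sketch of building d intervals per node with randomized traversals, each one a single DFS. The shuffling of roots and children is one simple way to vary the traversal; the function names, the seed parameter, and the example DAG are assumptions for illustration.

```python
import random


def randomized_labels(succ, nodes, d, seed=0):
    """Return {node: [(s, e), ...]} with one interval per traversal (d in total)."""
    rng = random.Random(seed)
    has_pred = {c for u in nodes for c in succ.get(u, [])}
    roots = [u for u in nodes if u not in has_pred]
    labels = {u: [] for u in nodes}

    for _ in range(d):
        interval = {}
        rank = 0

        def visit(u):
            nonlocal rank
            kids = list(succ.get(u, []))
            rng.shuffle(kids)  # random child order changes the labeling
            lo = None
            for c in kids:
                if c not in interval:
                    visit(c)
                lo = interval[c][0] if lo is None else min(lo, interval[c][0])
            rank += 1
            interval[u] = (rank if lo is None else lo, rank)

        order = list(roots)
        rng.shuffle(order)
        for r in order:
            if r not in interval:
                visit(r)
        for u in nodes:
            labels[u].append(interval[u])
    return labels


def contains_all(lu, lv):
    """True if every interval of v lies inside the matching interval of u."""
    return all(su <= sv and ev <= eu for (su, eu), (sv, ev) in zip(lu, lv))


# Hypothetical DAG.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
nodes = ["a", "b", "c", "d", "e"]
labels = randomized_labels(dag, nodes, d=3)
```

Whatever the random order, reachable pairs always pass `contains_all`; extra traversals can only remove false positives, never introduce false negatives.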

Computing The Index (Intervals)
• Random traversal
• Fixed reverse pairs
• Bottom-up*
• Heuristic (guided)*
    During the i-th traversal, select the node with the most exceptions in the previous i – 1 traversals: too expensive
    Use some heuristics to select nodes with many possible exceptions
    Time complexity: O(d(n+m) + dnP log P), P = maximum out-degree

Computing The Index (Intervals)
• Heuristic (guided)*: during the i-th traversal, select the node with the most exceptions in the previous i – 1 traversals. Too expensive, so use one of:
    Maximum volume
    Maximum minimum interval
    Maximum adjusted volume
    Maximum adjusted minimum interval

Answering Queries
If L_v ⊈ L_u, return u ↛ v
Else (L_v ⊆ L_u), resolve the possible false positive with
• Exception Lists, or
• Pruned DFS

Pruned DFS
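A sketch of a query answered by pruned DFS: the search only descends into a child whose intervals still contain the target's. The two deterministic traversals (forward and reversed child order) stand in for the d traversals; all names and the example DAG are illustrative assumptions.

```python
def label_once(succ, roots, reverse=False):
    """One min-post traversal; `reverse` flips the root and child visiting order."""
    interval, rank = {}, 0

    def visit(u):
        nonlocal rank
        kids = succ.get(u, [])
        lo = None
        for c in (reversed(kids) if reverse else kids):
            if c not in interval:
                visit(c)
            lo = interval[c][0] if lo is None else min(lo, interval[c][0])
        rank += 1
        interval[u] = (rank if lo is None else lo, rank)

    for r in (reversed(roots) if reverse else roots):
        if r not in interval:
            visit(r)
    return interval


def contained(labels, u, v):
    """True if L_v is inside L_u in every dimension."""
    return all(L[u][0] <= L[v][0] and L[v][1] <= L[u][1] for L in labels)


def reachable(succ, labels, u, v, visited=None):
    """Pruned DFS: stop as soon as containment fails (no false negatives)."""
    if u == v:
        return True
    if not contained(labels, u, v):
        return False
    visited = set() if visited is None else visited
    visited.add(u)
    for c in succ.get(u, []):
        if c not in visited and reachable(succ, labels, c, v, visited):
            return True
    return False


# Hypothetical DAG and a two-dimensional index.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
labels = [label_once(dag, ["a"]), label_once(dag, ["a"], reverse=True)]
```

With only the first traversal, (c, b) passes the containment test and the DFS has to rule it out; with both traversals the pair is rejected immediately.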

Exception Lists

Computing Exception Lists
• Direct exceptions
• Indirect exceptions
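To make the notion concrete, the sketch below computes exception lists by brute force, taking an exception of u to be a node v with L_v ⊆ L_u but u ↛ v (the false positives discussed earlier). It uses an explicit reachability search per node, so it is only a reference for small graphs and not the direct/indirect procedure of the slides; all names are illustrative.

```python
def label(succ, roots):
    """Single min-post traversal of a DAG: {node: (s, e)}."""
    interval, rank = {}, 0

    def visit(u):
        nonlocal rank
        lo = None
        for c in succ.get(u, []):
            if c not in interval:
                visit(c)
            lo = interval[c][0] if lo is None else min(lo, interval[c][0])
        rank += 1
        interval[u] = (rank if lo is None else lo, rank)

    for r in roots:
        if r not in interval:
            visit(r)
    return interval


def descendants(succ, u):
    """All nodes reachable from u (excluding u itself in a DAG)."""
    seen, stack = set(), [u]
    while stack:
        x = stack.pop()
        for c in succ.get(x, []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def exception_lists(succ, nodes, interval):
    """exc[u] = nodes whose interval is inside L_u but that u cannot reach."""
    exc = {}
    for u in nodes:
        reach = descendants(succ, u)
        s_u, e_u = interval[u]
        exc[u] = {
            v for v in nodes
            if v != u
            and s_u <= interval[v][0] and interval[v][1] <= e_u
            and v not in reach
        }
    return exc


# Hypothetical DAG.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
nodes = ["a", "b", "c", "d", "e"]
interval = label(dag, ["a"])
exc = exception_lists(dag, nodes, interval)
```

With such lists, a query whose containment test succeeds is answered by a lookup: u → v exactly when v is not in `exc[u]`.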

Computing Exception Lists
• Direct exceptions (first dimension)
• Interval query: O(log N + K)
• Cost: O(nP(log n + X))
• P: maximum out-degree; X: maximum number of exceptions

Computing Exception Lists
• Indirect exceptions (first dimension)
• Cost: O(nXP²)
• P: maximum out-degree; X: maximum number of exceptions

Computing Exception Lists
• Subsequent dimensions
• Cost: O(nXP')
• P': maximum in-degree; X: maximum number of exceptions

Computing Exception Lists
• Direct exceptions and indirect exceptions
• Combined cost: O(nXP(log n + X + P))
• P: maximum in/out-degree; X: maximum number of exceptions

Optimizations
• Topological sorting
• Search strategies: DFS, BFS, BBFS
• Positive cut filter
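One way topological information can prune queries is a level filter, sketched below. The level definition used here (length of the longest path from a source) is an assumption for the example and may differ from the one in the talk; the property it relies on is that u → v with u ≠ v implies level(u) < level(v).

```python
from collections import deque


def topo_levels(succ, nodes):
    """Level of a node = length of the longest path from any source to it."""
    indeg = {u: 0 for u in nodes}
    for u in nodes:
        for c in succ.get(u, []):
            indeg[c] += 1

    level = {u: 0 for u in nodes}
    queue = deque(u for u in nodes if indeg[u] == 0)
    while queue:  # Kahn-style topological sweep
        u = queue.popleft()
        for c in succ.get(u, []):
            level[c] = max(level[c], level[u] + 1)
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return level


def level_filter_rejects(level, u, v):
    """True means u certainly cannot reach v; False means 'inconclusive'."""
    return u != v and level[u] >= level[v]


# Hypothetical DAG.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
level = topo_levels(dag, ["a", "b", "c", "d", "e"])
```

The filter rejects (c, b) without touching the intervals, since both nodes sit at level 1. It stays silent on (b, e), which has to be resolved by the interval test or the search.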

Optimizations: Positive cut filter
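A sketch of one possible positive cut: keep, for each node, the range of post-order ranks covered by its DFS-tree subtree. If the rank of v falls in that range, v is a tree descendant of u and the answer is a guaranteed yes, with no search needed. The exact formulation used in the talk is not visible in the transcript, so the names and details below are assumptions.

```python
def tree_intervals(succ, roots):
    """Return {node: (first, e)}: the DFS-tree subtree of u holds ranks first..e."""
    tree = {}
    rank = 0

    def visit(u):
        nonlocal rank
        first = rank + 1  # the next rank handed out belongs to u's subtree
        for c in succ.get(u, []):
            if c not in tree:  # only tree edges lead to a recursive visit
                visit(c)
        rank += 1
        tree[u] = (first, rank)

    for r in roots:
        if r not in tree:
            visit(r)
    return tree


def positive_cut(tree, u, v):
    """True means u certainly reaches v; False means 'inconclusive'."""
    return tree[u][0] <= tree[v][1] <= tree[u][1]


# Hypothetical DAG.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
tree = tree_intervals(dag, ["a"])
```

In this example (c, e) is confirmed directly, while (c, d) is inconclusive because d was first discovered under b; such pairs fall back to the regular containment test and search.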

Datasets

Index Construction Time (ms): Small Sparse

Index Size (#entries): Small Sparse

Query Time (ms): Small Sparse

Index Construction Time (ms): Small Dense

Index Size (#entries): Small Dense

Query Time (ms): Small Dense

Index Construction Time (ms) and Size: Large Real

Query Time (ms): Large Real

Baseline Methods: Random Queries

Baseline Methods: Positive Queries

GRAIL Optimizations

GRAIL Query Times for Positive Queries

Effect of Exceptions

Query Times for different number of traversals

Query Times for different Traversals

Number of exceptions remaining for different traversals
