"Community structure in social and biological networks." Proceedings of the National Academy of Sciences 99.12 (2002): 7821-7826. Metis: Karypis, George, and Vipin Kumar. Multilevel graph partitioning schemes. ICPP (3). 1995. Metis+MQI: Lang, Kevin, and Satish Rao. A flow-based method for improving the expansion or conductance of graph cuts. Integer Programming and Combinatorial Optimization. Springer Berlin Heidelberg, 2004. 325-337. Surveys Leskovec, Jure, Kevin J. Lang, and Michael Mahoney. Empirical comparison of algorithms for network community detection. Proceedings of the 19th international conference on World wide web. ACM, 2010. Fortunato, Santo. Community detection in graphs. Physics Reports 486.3 (2010): 75-174.
Network Communities/Clusters Graph G(V,E). Community = a subset V' ⊂ V such that the quality or number of links amongst members of V' is better or more than that between V' and V − V'.
Network Communities/Clusters Community detection requires that the graph be sparsely connected A densely connected graph cannot intuitively be split into communities
Applications Online Ad Exchanges Cluster advertiser-keyword graphs Recommend keywords for advertisers to bid on Deepayan Chakrabarti. "Clustering Applications at Yahoo!". http://www.slideserve.com/Gabriel/clustering-applications-at-yahoo Retrieved November 3, 2013
Applications The Dolphin Network Doubtful Sound, NZ D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54, 396-405 (2003).
Finding Communities/Clusters Define an objective function for the network Quantifies the quality of a network partitioning Optimize the partitioning with the community detection algorithm Typically NP-Hard to optimize
Girvan & Newman, 2002 Divisive Hierarchical Clustering Detect edges that connect communities and remove them Based on edge betweenness Number of shortest paths through this edge Edge Betweenness Figure 10: Fortunato, Santo. Community detection in graphs. Physics Reports 486.3 (2010): 75-174.
Girvan & Newman, 2002 1. Compute betweenness for all edges 2. Remove edge with the largest betweenness 3. Recalculate betweenness for all edges 4. Go to 2. Hierarchical Graph Figure 7: Fortunato, Santo. Community detection in graphs. Physics Reports 486.3 (2010): 75-174.
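The four steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes an unweighted, undirected graph without self-loops stored as a dict of neighbour sets, and it computes edge betweenness with a Brandes-style BFS accumulation (an assumption; the slides only name Newman's algorithm). The function names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of shortest paths through each edge (split evenly among ties)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest vertices, pushing path credit onto edges
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                credit = sigma[u] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((u, w))] += credit
                delta[u] += credit
    # Every undirected path was counted once from each endpoint
    return {e: b / 2.0 for e, b in bet.items()}

def components(adj):
    """Connected components of a graph given as {vertex: set of neighbours}."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Steps 1-4, stopped as soon as the graph gains one more component."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        bet = edge_betweenness(adj)       # steps 1 and 3: (re)compute
        if not bet:
            break                         # no edges left to remove
        u, v = max(bet, key=bet.get)      # step 2: largest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

Calling `girvan_newman_split` repeatedly on each returned component would produce the levels of the dendrogram; the sketch stops after one split to stay short.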
Girvan & Newman, 2002 Output Dendrogram Girvan, Michelle, and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99.12 (2002): 7821-7826.
Girvan & Newman, 2002 Issues Edge betweenness is slow to compute (O(VE) using Newman's algorithm) Cannot detect overlapping communities Hierarchy may not make sense
Algorithms Graph Partitioning Minimize a function of the cut size of the partitioning Requires number of clusters a priori Requires cluster size a priori Kernighan-Lin, Spectral Partitioning, Multilevel algorithms
Measuring Partition Quality Intuitively Ratio of the number of edges leaving the cluster to the number of edges inside it. Lower is better. φ(A) = 2/6, φ(B) = 1/5. Figure 1: Leskovec, Jure, Kevin J. Lang, and Michael Mahoney. "Empirical comparison of algorithms for network community detection." Proceedings of the 19th international conference on World wide web. ACM, 2010.
Measuring Partition Quality Intuitively But this would encourage large clusters that include most vertices in the graph We need to penalize over-large/small clusters by normalizing with the component size Figure 1: Leskovec, Jure, Kevin J. Lang, and Michael Mahoney. "Empirical comparison of algorithms for network community detection." Proceedings of the 19th international conference on World wide web. ACM, 2010.
Measuring Partition Quality Conductance of a graph cut How community-like is a cluster S? $c_S$ = cut size, number of edges leaving S. $\mathrm{Vol}(S) = \sum_{u \in S} \mathrm{degree}(u)$. Figure 1: Leskovec, Jure, Kevin J. Lang, and Michael Mahoney. "Empirical comparison of algorithms for network community detection." Proceedings of the 19th international conference on World wide web. ACM, 2010.
Measuring Partition Quality Conductance of a graph cut $\phi(S) = \frac{c_S}{\min(\mathrm{Vol}(S), \mathrm{Vol}(V - S))}$. Penalizes over-large clusters: $\mathrm{Vol}(V - S)$ is small. Penalizes extra-small clusters: $\mathrm{Vol}(S)$ is small. Figure 1: Leskovec, Jure, Kevin J. Lang, and Michael Mahoney. "Empirical comparison of algorithms for network community detection." Proceedings of the 19th international conference on World wide web. ACM, 2010.
Measuring Partition Quality Conductance of a graph cut φ(A) = 2/14, φ(B) = 1/11. Note that φ(S) = φ(V − S). Figure 1: Leskovec, Jure, Kevin J. Lang, and Michael Mahoney. "Empirical comparison of algorithms for network community detection." Proceedings of the 19th international conference on World wide web. ACM, 2010.
Measuring Partition Quality Conductance is one of many tradeoff metrics. Expansion: $\phi(S) = \frac{c_S}{\min(|S|, |V - S|)}$. There are also hard-balance constraints: 50-50 vertex bipartition (Metis).
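Conductance and expansion differ only in how the cut size is normalized, which a short Python sketch makes concrete. It assumes an unweighted, undirected graph stored as a dict of neighbour lists and a cluster S that is a non-empty proper subset of the vertices; the function names are hypothetical.

```python
def cut_size(adj, S):
    """Number of edges with exactly one endpoint in S."""
    return sum(1 for u in S for v in adj[u] if v not in S)

def conductance(adj, S):
    """Cut size divided by the smaller of Vol(S) and Vol(V - S)."""
    S = set(S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut_size(adj, S) / min(vol_S, vol_rest)

def expansion(adj, S):
    """Cut size divided by the smaller of |S| and |V - S|."""
    S = set(S)
    return cut_size(adj, S) / min(len(S), len(adj) - len(S))

# Example graph (an assumption for illustration): two triangles and a bridge
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
left = {0, 1, 2}
print(conductance(adj, left), expansion(adj, left))
```

Both scores are symmetric in S and V − S because the cut size and the `min` in the denominator are unchanged when the two sides are swapped.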
Network Community Profile Characterizes the quality of network communities as a function of their size. If $f(S)$ = cluster quality (e.g. conductance), then $\phi(k) = \min_{|S| = k} f(S)$ for $1 \le k \le |V|/2$. $\phi(k)$ is the quality of the best cluster having exactly $k$ vertices. Figure 1: Leskovec, Jure, Kevin J. Lang, and Michael Mahoney. "Empirical comparison of algorithms for network community detection." Proceedings of the 19th international conference on World wide web. ACM, 2010.
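On a toy graph the profile can be evaluated directly from its definition. The sketch below enumerates every vertex subset of each size, so it is exponential and only meant to illustrate the definition; the function names and the choice of conductance as the quality function are assumptions.

```python
from itertools import combinations

def conductance(adj, S):
    """Cut size over the smaller side's volume (unweighted graph)."""
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol_S, vol_rest)

def network_community_profile(adj, quality=conductance):
    """Best (lowest) quality score over all clusters of each size k <= |V|/2."""
    nodes = list(adj)
    profile = {}
    for k in range(1, len(nodes) // 2 + 1):
        profile[k] = min(quality(adj, set(S)) for S in combinations(nodes, k))
    return profile
```

On real networks the minimum cannot be enumerated like this, which is where approximation algorithms for finding low-conductance cuts come in.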
Other Applications of Graph Partitioning Low-level Vision Segmentation Restoration Figure 5: Boykov, Yuri, and Vladimir Kolmogorov. "An experimental comparison of min-cut/ max-flow algorithms for energy minimization in vision." Pattern Analysis and Machine Intelligence, IEEE Transactions on 26.9 (2004): 1124-1137.
Other Applications of Graph Partitioning Distributed Systems Partition data while minimizing communication overhead Khayyat, Zuhair, et al. Mizan: a system for dynamic load balancing in large-scale graph processing. Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013.
METIS G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel Hypergraph Partitioning: Applications in the VLSI Domain. Presentation at the University of Minnesota.
METIS Coarsening Phase. $G_0$ is transformed into a sequence of smaller graphs $G_1, G_2, \ldots, G_m$ such that $|V_0| > |V_1| > \ldots > |V_m|$. Partitioning Phase. A 2-way partition $P_m$ of the graph $G_m$ is computed that partitions $V_m$ into 2 parts, each containing half the vertices of $G_0$. Uncoarsening Phase. $P_m$ is projected back to $G_0$ by going through intermediate partitions $P_{m-1}, P_{m-2}, \ldots, P_0$.
METIS Coarsening Condense multiple nodes $V_i^v$ in $G_i$ to form multinode $v$ of $G_{i+1}$. Weight of $v$ = sum of weights of vertices $V_i^v$. Edges of $v$ are the union of edges of $V_i^v$. Multiedges are combined into one, with weight equal to the sum of the component edge weights.
METIS Matching A set of edges, no two of which are incident on the same vertex. A maximal matching is one to which no further edge can be added without two of its edges being incident on the same vertex. Matching. Retrieved from http://www.cs.indiana.edu/classes/b673/notes/GraphPartitioning.pdf
METIS Matching Randomized Matching Select vertices in random order If a vertex u has not been matched yet, randomly select one of its unmatched adjacent vertices If such a vertex v exists, add edge (u,v) to the matching and mark u and v as matched
METIS Matching Heavy Edge Matching Select a matching that has the maximum sum of edge weights, to minimize the number of coarsening levels Heuristic algorithm (no guarantees, good in practice) Randomly select u as before, but select the unmatched adjacent vertex v such that (u,v) has maximum weight among all such v’s.
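Randomized and heavy-edge matching share the same loop and differ only in how the partner v is chosen, as this Python sketch shows. It assumes a weighted undirected graph stored as `{u: {v: weight}}`; the function name, the `heavy` flag and the `rng` parameter are hypothetical, not part of METIS.

```python
import random

def greedy_matching(adj, heavy=True, rng=None):
    """Visit vertices in random order and match each to an unmatched neighbour.

    heavy=True  -> heavy-edge matching (pick the heaviest incident edge)
    heavy=False -> randomized matching (pick any unmatched neighbour)
    """
    rng = rng or random.Random(0)
    order = list(adj)
    rng.shuffle(order)
    matched = set()
    matching = []
    for u in order:
        if u in matched:
            continue
        candidates = [v for v in adj[u] if v not in matched and v != u]
        if not candidates:
            continue  # u stays unmatched and is copied to the coarse graph
        if heavy:
            v = max(candidates, key=lambda x: adj[u][x])
        else:
            v = rng.choice(candidates)
        matching.append((u, v))
        matched.update((u, v))
    return matching
```

Because every vertex is visited once and matched whenever an unmatched neighbour exists, the result is always a maximal matching, though not necessarily one of maximum total weight.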
METIS Matching Light Edge Matching Results in coarse graphs of higher average degree Such graphs are easier to partition with certain heuristics like Kernighan-Lin
METIS Coarsening Which vertices do we combine? Coarsen using maximal matchings. Matching. Retrieved from http://www.cs.indiana.edu/classes/b673/notes/GraphPartitioning.pdf
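Given a matching, one coarsening level collapses each matched pair into a multinode. The following Python sketch illustrates that contraction under assumed data structures (`adj` as `{u: {v: weight}}`, `vweight` as `{u: weight}`, `matching` as a list of vertex pairs); the names are hypothetical and this is not METIS code.

```python
def contract(adj, vweight, matching):
    """Collapse matched pairs into multinodes and return the coarse graph.

    Returns (coarse_adj, coarse_vweight, coarse_of), where coarse_of maps
    every fine vertex to the id of the multinode that contains it.
    """
    coarse_of = {}
    for i, (u, v) in enumerate(matching):
        coarse_of[u] = coarse_of[v] = i
    next_id = len(matching)
    for u in adj:                      # unmatched vertices are copied over
        if u not in coarse_of:
            coarse_of[u] = next_id
            next_id += 1
    coarse_adj, coarse_vweight = {}, {}
    for u in adj:
        cu = coarse_of[u]
        # Multinode weight = sum of the weights of its fine vertices
        coarse_vweight[cu] = coarse_vweight.get(cu, 0) + vweight[u]
        coarse_adj.setdefault(cu, {})
        for v, w in adj[u].items():
            cv = coarse_of[v]
            if cu != cv:               # edges inside a multinode disappear
                # Parallel edges merge into one whose weight is their sum
                coarse_adj[cu][cv] = coarse_adj[cu].get(cv, 0) + w
    return coarse_adj, coarse_vweight, coarse_of
```

Keeping `coarse_of` around is what later allows a partition of the coarse graph to be projected back onto the finer one.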
METIS Partitioning Phase Compute a minimum edge-cut bisection of the coarse graph, such that each part contains roughly half the weight of the original graph Use any high-quality partitioning algorithm on the coarse graph Since the size of this graph is small, it doesn’t take much time
METIS Partitioning Phase Can also employ graph growing heuristics for partitioning. Randomly select a vertex, grow a region around it using BFS until half the total vertex-weight has been included. Randomly select a vertex, grow a region around it by selecting vertices that lead to a smaller increase in the edge cut. Use multiple trials with different initial vertices.
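The BFS variant of graph growing, with multiple trials, can be sketched in Python as below. The sketch assumes an unweighted edge set stored as a dict of neighbour lists plus a vertex-weight dict; the function names, `trials` and `rng` are hypothetical.

```python
import random
from collections import deque

def grow_bisection(adj, vweight, seed_vertex):
    """Grow a region from seed_vertex by BFS until it holds half the weight."""
    half = sum(vweight.values()) / 2
    part = {seed_vertex}
    grown = vweight[seed_vertex]
    queue = deque([seed_vertex])
    while queue and grown < half:
        u = queue.popleft()
        for v in adj[u]:
            if v not in part and grown < half:
                part.add(v)
                grown += vweight[v]
                queue.append(v)
    return part

def cut_size(adj, part):
    """Number of edges leaving the grown region."""
    return sum(1 for u in part for v in adj[u] if v not in part)

def best_grown_bisection(adj, vweight, trials=4, rng=None):
    """Try several random seed vertices and keep the smallest edge cut."""
    rng = rng or random.Random(0)
    seeds = rng.sample(list(adj), min(trials, len(adj)))
    regions = (grow_bisection(adj, vweight, s) for s in seeds)
    return min(regions, key=lambda p: cut_size(adj, p))
```

Since the coarse graph is small, running several trials is cheap, and the quality of the result depends heavily on the seed vertex.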
METIS Uncoarsening Phase Every multinode in $G_{i+1}$ contains a distinct subset of nodes from $G_i$. Obtain $P_i$ from $P_{i+1}$ by simply assigning the nodes collapsed to $v \in G_{i+1}$ to the partition $P_{i+1}[v]$. Since $G_i$ has more degrees of freedom, we can refine the partition $P_i$.
METIS Uncoarsening Phase Refining Partitions Select 2 subsets of vertices, one from each part Swapping these vertices should result in a partition with smaller cut size
METIS Uncoarsening Phase Refining Partitions: Based on Kernighan-Lin partitioning Computes a gain for every vertex, the decrease/increase in cut size if the vertex is moved to the other partition In each iteration, move out the vertex with largest gain from the larger part, and mark it as used Terminate when x number of vertex moves do not decrease the cut size
METIS Uncoarsening Phase Refining Partitions: Based on Kernighan-Lin partitioning Kernighan-Lin is effective in finding locally optimal partitionings when it starts with a fairly good initial partition Terminates in a few iterations in practice
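The gain-based refinement described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the METIS refinement code: the graph is `{u: {v: weight}}`, `side` maps each vertex to part 0 or 1, the "larger part" is judged by vertex count, and `max_bad_moves` plays the role of x. All names are hypothetical.

```python
def cut_weight(adj, side):
    """Total weight of edges crossing the partition."""
    # Each crossing edge is seen from both endpoints, hence the division by 2
    return sum(w for u in adj for v, w in adj[u].items()
               if side[u] != side[v]) / 2

def gain(adj, side, v):
    """Decrease in cut weight if v moves to the other part."""
    external = sum(w for u, w in adj[v].items() if side[u] != side[v])
    internal = sum(w for u, w in adj[v].items() if side[u] == side[v])
    return external - internal

def refine(adj, side, max_bad_moves=3):
    """Move best-gain vertices out of the larger part; keep the best cut seen."""
    side = dict(side)
    best, best_cut = dict(side), cut_weight(adj, side)
    used, bad_moves = set(), 0
    while bad_moves < max_bad_moves:
        size0 = sum(1 for v in side if side[v] == 0)
        larger = 0 if size0 >= len(side) - size0 else 1
        candidates = [v for v in adj if v not in used and side[v] == larger]
        if not candidates:
            break
        v = max(candidates, key=lambda x: gain(adj, side, x))
        side[v] = 1 - side[v]          # move v across and mark it as used
        used.add(v)
        current = cut_weight(adj, side)
        if current < best_cut:
            best, best_cut, bad_moves = dict(side), current, 0
        else:
            bad_moves += 1             # a move that did not decrease the cut
    return best, best_cut
```

Allowing a few non-improving moves before stopping lets the search step over small local minima, while returning the best partition seen guarantees the result is never worse than the input.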
MQI Max-flow Quotient cut Improvement Optimizes the expansion (a quotient cut metric) of the graph: $\phi(S) = \frac{c_S}{\min(|S|, |V - S|)}$. For a given cut (A, B), finds the best improvement among all cuts (A', B') such that $A' \subset A$.
MQI There is an exact polynomial-time algorithm that solves this Uriel Feige and Robert Krauthgamer, A polylogarithmic approximation of the minimum bisection, FOCS-2000. Chris Harrelson, Kirsten Hildrum, and Satish Rao, A polynomial-time tree decomposition to minimize congestion, SPAA 2003.
MQI A call to MQI returns an improved quotient cut, if it exists We can find the best reachable improved quotient cut by repeatedly feeding the output of MQI back to itself However Finding any cut whose small side (A’) contains the small side of the global best quotient cut is NP-Hard
Metis + MQI MQI always reduces the balance of a partition A maximally balanced partition will be a good initial cut Use Metis to provide this balanced partition
MQI Convert this to an S-T max flow problem. Solve to obtain the total max-flow, and max-flow in each edge, in near-linear time with hi_pr. Boris V. Cherkassky and Andrew V. Goldberg. On implementing the push-relabel method for the maximum flow problem. Algorithmica, 19(4):390–410, 1997. Use the S-T problem and solution to obtain the improved cut.
MQI Convert this to an S-T max flow problem (a = |A|, c = size of the cut (A, B))
1. Discard all B-side nodes
2. Discard every edge that used to connect a pair of B-side nodes
3. Replace every edge that used to connect a pair of A-side nodes with a pair of directed edges, one in each direction, each with capacity a
4. Add a source S and sink T
5. Discard each edge that used to connect a B-side node with an A-side node x, replacing it with a directed edge from S to x, with capacity a
6. Add a single directed edge from every A-side node to T, with capacity c
MQI Convert this to an S-T max flow problem Figure 1: Lang, Kevin, and Satish Rao. A flow-based method for improving the expansion or conductance of graph cuts. Integer Programming and Combinatorial Optimization. Springer Berlin Heidelberg, 2004. 325-337.
MQI Given: the input graph, an initial quotient cut (A, B) (with |A| <= |B|), and the constructed max-flow solution. Theorem: There is an improved quotient cut (A', B') with A' ⊂ A if and only if the maximum flow is < ca.
MQI Proof (Forward) Assume the improved quotient cut (A’, B’) exists. We can show that maximum flow is < ca If |A’| = a’, |B’| = b’, c’ = improved cut size c’/a’ < c/a => c’a < a’c (1)
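MQI Proof (Forward) c'a < a'c (1). Flow into A' through its non-sink edges is at most c'a (2). Flow required to saturate the sink edges of A' = a'c (3). By (1), (2) < (3), so a sink edge of A' is unsaturated. Since the sink edges have total capacity ca, total flow < ca.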
MQI Proof (Backward) Assume max-flow < ca, we can construct a new cut that has an improved quotient cut score Total capacity of all sink edges = ca Total flow < ca So at least one edge is unsaturated
MQI Proof (Backward) Perform a backwards directed DFS from the sink, moving along an edge only if it is unsaturated The vertices reachable this way are A’ Let |A’| = a’, then a’ > 0
MQI Proof (Backward) Also, every edge feeding into A' from outside A' must be saturated. If not, we could reach the node at its tail via our backward DFS and add it to A'. If there are c' such edges, the flow into A' is $F_i = c'a$.
MQI Proof (Backward) Since flow is conserved, all flow out of A' must be carried by the sink edges: $F_i = F_s$. With $F_i = c'a$ and $F_s < a'c$: $c'a < a'c \Rightarrow \frac{c'}{a'} < \frac{c}{a}$.
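The construction, the theorem and the backward search from the proof fit together as in the Python sketch below. It is an illustration under stated assumptions, not the authors' code: the graph is unweighted and stored as a dict of neighbour lists, a plain Edmonds-Karp solver is swapped in for hi_pr, the backward search runs over residual capacities, and the labels "S" and "T" as well as all function names are hypothetical.

```python
from collections import deque

def add_edge(cap, u, v, c):
    """Add capacity c on u -> v and make sure the reverse entry exists."""
    cap.setdefault(u, {})
    cap.setdefault(v, {})
    cap[u][v] = cap[u].get(v, 0) + c
    cap[v].setdefault(u, 0)

def max_flow(cap, s, t):
    """Edmonds-Karp; cap is turned into the residual network in place."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

def mqi_step(adj, A):
    """Return an improved small side A' inside A, or None if none exists."""
    A = set(A)
    a = len(A)
    c = sum(1 for u in A for v in adj[u] if v not in A)
    S, T = "S", "T"
    cap = {S: {}, T: {}}
    for u in A:
        for v in adj[u]:
            if v in A:
                add_edge(cap, u, v, a)   # one direction per visit from its tail
            else:
                add_edge(cap, S, u, a)   # replaces a cut edge (B-side, u)
        add_edge(cap, u, T, c)           # sink edge
    if max_flow(cap, S, T) >= c * a:
        return None                      # theorem: no improvement inside A
    # Backward search from T along edges that still have residual capacity
    reach, stack = {T}, [T]
    while stack:
        v = stack.pop()
        for u in cap[v]:
            if u not in reach and cap[u][v] > 0:
                reach.add(u)
                stack.append(u)
    return reach - {T}

def mqi(adj, A):
    """Feed the output of MQI back to itself until no improvement remains."""
    A = set(A)
    while True:
        better = mqi_step(adj, A)
        if better is None:
            return A
        A = better
```

For example, with A = {0, 1, 2, 3} where {0, 1, 2} is a triangle, vertex 3 is attached to vertex 2 and has three edges into B, the quotient is 3/4, and `mqi` shrinks A to the triangle, whose quotient is 1/3.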
Metis + MQI The empirical complexity of multi-try Metis + MQI will be linear, because: Metis ~ linear in practice. Max-flow solver hi_pr ~ linear in practice. MQI loop ~ sublogarithmic in practice.