Jean-Gabriel Young
July 13, 2016

# Spectral clustering of graphs

2016 CRM Summer School on Spectral Theory and Applications


## Transcript

1. Spectral clustering of graphs
CRM Summer School on Spectral Theory and Applications
J.-G. Young
July 13, 2016
Département de physique, de génie physique, et d’optique

2. Content
1. Motivations
2. Bisection: the spectral method
3. General case: graph clustering
4. Two experiments
5. Conclusion

3. Motivations

4. Graph clustering
Formulation
For the vertex set V(G) of an undirected graph G(V, E):
we want to identify the partition B(V) of V(G) which optimizes an objective function
f : B, G → R
over the set of all partitions B(V).

5. Search space of graph clustering
Major hurdle: the dependency of the number of solutions on N = |V|.
Figure – Count versus N (log scale, reaching ~10^26 at N = 20) for: the number of partitions of N nodes in g = 2 blocks of equal sizes; in g = 2 blocks; the number of partitions of N nodes; and an upper bound.
This is true even if the search space is heavily constrained.
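The counts plotted above can be reproduced with a short script. This is a minimal sketch (the function names are illustrative, not from the slides): the balanced count is C(N, N/2)/2 for even N, the two-block count is the Stirling number S(N, 2), and the total count is the Bell number B_N, computed here with the Bell-triangle recurrence.

```python
from math import comb

def bell(n):
    """Bell number B_n: the number of partitions of n labelled nodes,
    computed with the Bell-triangle recurrence."""
    row = [1]
    for _ in range(n):
        new = [row[-1]]            # each row starts with the previous row's last entry
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[0]

def balanced_bisections(n):
    """Partitions of n nodes into g = 2 unlabelled blocks of equal size (n even)."""
    return comb(n, n // 2) // 2

def bisections(n):
    """Stirling number S(n, 2): partitions into exactly 2 non-empty blocks."""
    return 2 ** (n - 1) - 1

print(balanced_bisections(20))     # 92378
print(bisections(20))              # 524287
print(bell(20))                    # already past 5 * 10^13 at N = 20
```

Even the most constrained curve (equal-size bisection) grows exponentially, which is the slide's point: exhaustive search is hopeless.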

6. Hardness of clustering
(Constrained) clustering is NP-Hard
P: problems solvable in polynomial time (easy)
NP: problems solvable in non-deterministic polynomial time (hard)
NP-Complete: equivalence class of the hardest problems in NP (hard)
NP-Hard: problems which are at least as hard as the hardest problems in NP-Complete (hardest)

7. Bisection: the spectral method

8. Matrix formulation of graph bisection (1 of 3)
We will consider objective functions of the form
f({σ_i}, G) = Σ_ij h^(in)_ij(G) δ_{σ_i σ_j} + Σ_ij h^(out)_ij(G) δ̄_{σ_i σ_j}
            = Σ_ij [h^(in)_ij(G) − h^(out)_ij(G)] δ_{σ_i σ_j} + C,
where C = Σ_ij h^(out)_ij(G) is a constant.
Definitions
δ_ij : Kronecker delta (and δ̄_ij = 1 − δ_ij).
σ_i : index of the block of vertex v_i ∈ V.
h^(in)_ij : cost associated to putting v_i, v_j in the same block.
h^(out)_ij : cost associated to putting v_i, v_j in different blocks.

9. Matrix formulation of graph bisection (2 of 3)
In bisection, either v_i ∈ B_1 or v_i ∈ B_2.
We denote this with indicator variables s_i = ±1. Then
s_i s_j = 1 if σ_i = σ_j, −1 otherwise,
and δ_{σ_i σ_j} ≡ (s_i s_j + 1)/2.

10. Matrix formulation of graph bisection (3 of 3)
In bisection, either v_i ∈ B_1 or v_i ∈ B_2.
We denote this with indicator variables s_i = ±1. Then
s_i s_j = 1 if σ_i = σ_j, −1 otherwise,
and δ_{σ_i σ_j} ≡ (s_i s_j + 1)/2.
Defining the indicator vector s and objective matrix H, we rewrite the objective function as the quadratic form
f({σ_i}, G) ≡ s^T H s + C.
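The change of variables can be checked numerically. This sketch uses an arbitrary symmetric cost matrix standing in for h^(in) − h^(out) (the random instance is an assumption, for illustration only), and confirms that the δ-sum equals a quadratic form in s plus a constant, with a factor 1/2 absorbed into H:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 8
H = rng.normal(size=(N, N))
H = (H + H.T) / 2                        # arbitrary symmetric costs h_ij
sigma = rng.integers(0, 2, size=N)       # block labels sigma_i in {0, 1}
s = 2 * sigma - 1                        # indicator variables s_i = +/-1

# The Kronecker delta on labels agrees with its +/-1 representation:
delta = (sigma[:, None] == sigma[None, :]).astype(float)
assert np.allclose(delta, (np.outer(s, s) + 1) / 2)

# Sum_ij h_ij delta_{sigma_i sigma_j} = (1/2) s^T H s + C, with C = (1/2) Sum_ij h_ij.
lhs = np.sum(H * delta)
rhs = 0.5 * (s @ H @ s) + 0.5 * H.sum()
assert np.allclose(lhs, rhs)
print("quadratic-form identity verified")
```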

11. Example of an objective matrix (1 of 6)
Figure – Barbell graph B(n_1, n_2).

12. Example of an objective matrix (2 of 6)
Combinatorial Laplacian
A is the adjacency matrix of G.
D is the diagonal matrix of the degrees k_i = Σ_j a_ij.
L := D − A.
Figure – Matrix plots of A, D and L for the barbell graph.

13. Example of an objective matrix (3 of 6)
The combinatorial Laplacian counts the number of edges between blocks:
f_Lap = s^T L s = s^T D s − s^T A s.
Define m(B_r, B_s) as the number of edges between blocks B_r, B_s:
s^T D s = Σ_i k_i s_i² = 2m,
s^T A s = 2[m(B_1, B_1) + m(B_2, B_2) − m(B_1, B_2)],
so that f_Lap = 4 m(B_1, B_2).

14. Example of an objective matrix (4 of 6)
Figure – Barbell graph B(n_1, n_2).

15. Example of an objective matrix (5 of 6)
Figure – Barbell graph B(n_1, n_2).

16. Example of an objective matrix (6 of 6)
Other objective matrices
Normalized Laplacians: L_sym = D^(−1/2) L D^(−1/2), L_rw = D^(−1) L.
Modularity: Q_ij = A_ij − k_i k_j / 2m (used as H).

17. Constrained bisection (1 of 6)
Figure – Modified Barbell graph.

18. Constrained bisection (2 of 6)
Unconstrained problem
Optimize f({σ_i}, G) = s^T H s subject to s ∈ {−1, 1}^N.
Balanced partitions are often desirable. Unconstrained bisection does not ask for balance.

19. Constrained bisection (3 of 6)
Unconstrained problem
Optimize f({σ_i}, G) = s^T H s subject to s ∈ {−1, 1}^N.
Balanced partitions are often desirable. Unconstrained bisection does not ask for balance.
∃ two methods to constrain B = {B_1, B_2}:
1. Modify f.

20. Constrained bisection (4 of 6)
Option 1: Modify f
Normalized versions of f_Lap:
f̃_Lap := f_Lap / (|B_1||B_2|),
f̄_Lap := f_Lap / (vol(B_1) vol(B_2)).
Option 2: Explicit constraint
Optimize f({σ_i}, G) = s^T H s
subject to s ∈ {−1, 1}^N and |s^T 1| ≤ ε, with ε ≥ 0.

21. Constrained bisection (5 of 6)
Unconstrained problem
Optimize f({σ_i}, G) = s^T H s subject to s ∈ {−1, 1}^N.

22. Constrained bisection (6 of 6)
Unconstrained problem
Optimize f({σ_i}, G) = s^T H s subject to s ∈ {−1, 1}^N.
Constrained problem
Optimize f({σ_i}, G) = s^T H s
subject to s ∈ {−1, 1}^N and |s^T 1| ≤ ε, with ε ≥ 0.

23. Spectral algorithm for graph bisection (1 of 7)
Constrained problem
Optimize f({σ_i}, G) = s^T H s
subject to s ∈ {−1, 1}^N and |s^T 1| ≤ ε, with ε ≥ 0.
Dropping constraints turns bisection into an easy problem.

24. Spectral algorithm for graph bisection (2 of 7)
Relaxed problem
Optimize f({σ_i}, G) = x^T H x
subject to x ∈ R^N and |x^T 1| ≤ ε, with ε ≥ 0.
Dropping constraints turns bisection into an easy problem.

25. Spectral algorithm for graph bisection (3 of 7)
Justification: suppose that x_i ∈ R^N is a normalized eigenvector of H with eigenvalue λ_i. Then
f = x_i^T H x_i = λ_i x_i^T x_i = λ_i.
If we have ordered eigenvectors (accounting for multiplicities),
λ_1 ≤ λ_2 ≤ ... ≤ λ_N,
⇒ the optima of f correspond to extremal eigenvalues.

26. Spectral algorithm for graph bisection (4 of 7)
The continuous optimization perspective
f = x^T H x = Σ_ij h_ij x_i x_j
Extrema of f are found by setting {∂_{x_i}[f]} to zero.
We avoid trivial solutions x_i = 0 ∀i by asking Σ_i x_i² = ∆, ∆ > 0:
∂_{x_r}[Σ_ij h_ij x_i x_j − λ(Σ_i x_i² − ∆)] = 0   (∆ > 0).
Using ∂_{x_r}[x_i] = δ_ir, we find that
Σ_j H_ij x_j = λ x_i ⇔ Hx = λx.

27. Spectral algorithm for graph bisection (5 of 7)
We have relaxed s → x.
How do we recover s?

28. Spectral algorithm for graph bisection (6 of 7)
We have relaxed s → x.
How do we recover s?
In the case of bisection, we can show that the sign of x_i ∈ x is a good predictor of the nearest s.

29. Spectral algorithm for graph bisection (7 of 7)
We have relaxed s → x.
How do we recover s?
In the case of bisection, we can show that the sign of x_i ∈ x is a good predictor of the nearest s.
In general, we can use K-Means to minimize
argmin_B Σ_{r=1}^{K} Σ_{i ∈ B_r} ||x_i − µ_r||².
Important: reject solutions that do not satisfy |x^T 1| ≤ ε.
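The sign-rounding step can be sketched end to end. The graph below is a hypothetical two-block example (two 5-cliques joined by a bridge, an assumption for illustration); the sign of the second-smallest eigenvector of L recovers the planted bisection:

```python
import numpy as np

# Two 5-cliques linked by one bridge edge: a planted two-block structure.
N = 10
A = np.zeros((N, N))
A[:5, :5] = 1
A[5:, 5:] = 1
np.fill_diagonal(A, 0)
A[4, 5] = A[5, 4] = 1

L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)     # eigenvalues in ascending order
x = eigvecs[:, 1]                        # second-smallest: the relaxed solution
s = np.sign(x)                           # round back to {-1, +1}

# The sign pattern recovers the two cliques (up to a global sign flip).
assert len(set(s[:5])) == 1 and len(set(s[5:])) == 1 and s[0] != s[9]
print("recovered bisection:", s)
```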

30. Concrete examples (1 of 3)
Modified Barbell graph
Figure – (left) Eigenvalue density of L. (middle) Values of the elements of x. (right) Distribution of the elements of x in R.

31. Concrete examples (2 of 3)
P ( )
Figure – (left) Eigenvalue density of L. (middle) Values of the elements of x. (right) Distribution of the elements of x in R.

32. Concrete examples (3 of 3)
P ( )
Figure – (left) Eigenvalue density of L. (middle) Values of the elements of x. (right) Distribution of the elements of x in R.

33. General case: graph clustering

34. Matrix formulation of graph clustering (1 of 2)
Recall that we optimize objective functions of the form
f({σ_i}, G) = Σ_ij [h^(in)_ij(G) − h^(out)_ij(G)] δ_{σ_i σ_j}.
If the partition has g ≥ 2 blocks B, we must use indicator vectors to represent δ_{σ_i σ_j}.
Figure – Corners of regular (g − 1)-simplices.

35. Matrix formulation of graph clustering (2 of 2)
The indicator vectors satisfy
s_i^T s_j = (g − 1)/g if σ_i = σ_j, −1/g otherwise,   (1)
so that
f({σ_i}, G) = Σ_ij [h^(in)_ij(G) − h^(out)_ij(G)] δ_{σ_i σ_j} = Tr(S^T H S) + C.
S is the N × (g − 1) matrix with vector s_i on row i.
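The simplex-corner vectors can be constructed explicitly. This sketch uses centred standard-basis vectors, one standard construction (an assumption; the lecture does not specify one), and checks the inner-product pattern s_r · s_r = (g − 1)/g, s_r · s_t = −1/g:

```python
import numpy as np

def simplex_corners(g):
    """Corners of a regular (g-1)-simplex, as rows of a g x (g-1) array.

    Construction: centre the standard basis of R^g (rows e_r - ones/g live in
    the hyperplane orthogonal to ones), then express them in an orthonormal
    basis of that (g-1)-dimensional hyperplane.
    """
    V = np.eye(g) - 1.0 / g                          # centred basis vectors
    Q, _ = np.linalg.qr(np.ones((g, 1)), mode='complete')
    return V @ Q[:, 1:]                              # drop the direction along ones

g = 4
S = simplex_corners(g)
G = S @ S.T                                          # Gram matrix of the corners
assert np.allclose(np.diag(G), (g - 1) / g)          # same block: (g-1)/g
assert np.allclose(G[~np.eye(g, dtype=bool)], -1 / g)  # different blocks: -1/g
print("simplex-corner inner products verified for g =", g)
```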

36. Constrained problem
Optimize f({σ_i}, G) = Tr(S^T H S)
subject to the rows of S lying on the corners of a (g − 1)-dimensional simplex,
and |S^T 1| ≤ ε, with ε ≥ 0.

37. Optimal solutions
Suppose that X is the matrix of the eigenvectors of H such that
HX = XΛ,
where Λ is the diagonal matrix of eigenvalues.
We see
f = Tr(X^T H X) = Tr(X^T X Λ) = Σ_{i=1}^{g−1} λ_i
⇒ the optima of f are given by sums of extremal eigenvalues.

38. Continuous optimization perspective
f = Tr(X^T H X)
Extrema of f are found by setting {∂_{X_rs}[f]} to zero.
We avoid trivial solutions X_rs = 0 ∀r, s by asking X^T X = ∆I:
∂_X [Tr(X^T H X) − Tr(X(Λ + ∆I)X^T)] = 0   (∆ > 0)
⇒ HX = XΛ,
because we have the identities
∂_X Tr(X^T A X) = (A + A^T) X,
∂_X Tr(X A X^T) = X(A + A^T).

39. Spectral clustering algorithm
Input: number of blocks g, objective matrix H, tolerance ε
1. Compute the g largest (smallest) eigenvalues of H.
2. Construct the N × (g − 1) matrix of eigenvectors X.
3. Verify that |X^T 1| ≤ ε (element-wise). If not, replace the faulty eigenvector.
4. Cluster the rows of X in R^(g−1) with K-Means (K = g).
Return: the clusters in R^(g−1).
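The steps above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: H is taken to be the combinatorial Laplacian, the tolerance check is omitted, and K-means uses a deterministic farthest-point seeding (a simplification of the usual random restarts). The test graph is a hypothetical chain of three 4-cliques:

```python
import numpy as np

def spectral_clustering(A, g, iters=50):
    """Embed with the bottom non-trivial Laplacian eigenvectors, then run a
    tiny K-means (K = g) on the rows of the N x (g-1) embedding."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    X = vecs[:, 1:g]                       # skip the trivial constant eigenvector
    # Deterministic farthest-point initialization of the g centroids.
    idx = [0]
    for _ in range(g - 1):
        d = ((X[:, None] - X[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(d)))
    mu = X[idx].copy()
    for _ in range(iters):                 # Lloyd iterations
        labels = np.argmin(((X[:, None] - mu[None]) ** 2).sum(-1), axis=1)
        for r in range(g):
            if np.any(labels == r):
                mu[r] = X[labels == r].mean(axis=0)
    return labels

# Test graph: three 4-cliques chained by single bridge edges.
g, N = 3, 12
A = np.zeros((N, N))
for b in range(g):
    A[4 * b:4 * b + 4, 4 * b:4 * b + 4] = 1
np.fill_diagonal(A, 0)
A[3, 4] = A[4, 3] = A[7, 8] = A[8, 7] = 1

labels = spectral_clustering(A, g)
assert all(len(set(labels[4 * b:4 * b + 4])) == 1 for b in range(g))
assert len(set(labels)) == g               # the three cliques get distinct labels
print("labels:", labels)
```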

40. Example
P ( )
Figure – (left) Eigenvalue density of L. (right) Elements of x^(1) versus the elements of x^(2) in R².

41. The number of clusters
The optimal nb. of blocks is predicted by the eigengap
∆_i = (λ_{i+1} − λ_i)/N.
Figure – (left) Caveman graph C( , ). (right) Eigengap (λ_{i+1} − λ_i)/N of C(l, 4), for l = 3, 4, 5, 6.
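The eigengap heuristic is easy to reproduce. The graph builder below is an assumption: l cliques of 4 nodes chained by single bridge edges, a stand-in for the C(l, 4) caveman graphs in the figure, not necessarily the exact construction used there:

```python
import numpy as np

def eigengap_estimate(A):
    """Predict the number of blocks as the i maximizing (lambda_{i+1} - lambda_i)/N,
    from the ascending spectrum of the combinatorial Laplacian."""
    N = len(A)
    lam = np.linalg.eigvalsh(np.diag(A.sum(axis=1)) - A)
    gaps = np.diff(lam) / N
    return int(np.argmax(gaps)) + 1        # the gap sits after the g-th eigenvalue

def caveman(l):
    """Caveman-style graph: l cliques of 4 nodes chained by single bridges."""
    N = 4 * l
    A = np.zeros((N, N))
    for b in range(l):
        A[4 * b:4 * b + 4, 4 * b:4 * b + 4] = 1
    np.fill_diagonal(A, 0)
    for b in range(l - 1):
        A[4 * b + 3, 4 * b + 4] = A[4 * b + 4, 4 * b + 3] = 1
    return A

for l in (3, 4, 5, 6):
    assert eigengap_estimate(caveman(l)) == l
print("eigengap recovers l for l = 3, 4, 5, 6")
```

The l small eigenvalues correspond to the inter-clique modes; the jump to the intra-clique modes produces the dominant gap, exactly as in the figure's four panels.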

42. Two experiments

43. Zachary Karate Club
Figure – (left) Graph of interactions. (right) Statistics of the eigengap (λ_{i+1} − λ_i)/N [Laplacian matrix].

44. Political blogs
Figure – Eigenvector element versus vertex index.
Dataset: L. A. Adamic and N. Glance, The political blogosphere (2005).
Figures: M. E. J. Newman, Spectral methods for network community detection and graph partitioning (2013).

45. Conclusion

46. Take home message
Constrained clustering is hard (NP-Hard).
Relaxing the discrete constraint ⇒ spectral algorithm.
The spectral approach arises from the continuous optimization perspective.
The framework is general: it accepts an arbitrary objective matrix H.

47. Supplementary Material
The slides, lecture notes and python notebook are online at
www.jgyoung.ca/crm2016/