Slide 5
1. WORD CLUSTERING
INTRODUCTION
- Word clustering is a technique for partitioning a set of words into subsets of semantically similar words
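- Suppose we have a set of words $W = \{w_1, w_2, \dots, w_n\}$, $n \in \mathbb{N}$. Our goal is to find $C = \{C_1, C_2, \dots, C_k\}$, $k \in \mathbb{N}$, where
- $\bar{w}_i$ is the centroid of cluster $C_i$
- $\mathrm{similarity}(\bar{w}_i, w)$ is a function that measures the similarity score
- and $t$ is a threshold value: $\mathrm{similarity}(\bar{w}_i, w) \ge t$ means that $\bar{w}_i$ and $w$ are semantically similar.
- For the centroids $\bar{w}_a \in C_a$ and $\bar{w}_b \in C_b$ of two different clusters, $\mathrm{similarity}(\bar{w}_a, \bar{w}_b) < t$ holds, so
$C_i = \{\forall w \in W \text{ where } \mathrm{similarity}(\bar{w}_i, w) \ge t\}$
$C_a \cap C_b = \emptyset, \ \forall C_a, C_b \in C, \ a \ne b$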
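The definition above can be illustrated with a minimal sketch. It is not the method from the slides: it assumes cosine similarity as the similarity function, hand-made two-dimensional toy vectors in place of real word embeddings, and a simple greedy rule in which a word becomes a new centroid when it is not similar enough to any existing centroid. The names `cosine`, `cluster_words`, and `toy_vectors` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity function)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, t):
    """Greedy threshold clustering (illustrative assumption).

    Returns a dict mapping each centroid word to the list of words in its cluster.
    """
    clusters = {}
    for word, vec in vectors.items():
        best, best_sim = None, t
        # Look for the most similar existing centroid with similarity >= t.
        for centroid in clusters:
            sim = cosine(vectors[centroid], vec)
            if sim >= best_sim:
                best, best_sim = centroid, sim
        if best is None:
            # No centroid is similar enough, so this word starts a new cluster.
            clusters[word] = [word]
        else:
            # Each word joins exactly one cluster, which keeps clusters disjoint.
            clusters[best].append(word)
    return clusters


# Toy vectors, invented for illustration only.
toy_vectors = {
    "cat": (1.0, 0.1),
    "dog": (0.9, 0.2),
    "car": (0.1, 1.0),
    "truck": (0.2, 0.9),
}

clusters = cluster_words(toy_vectors, t=0.9)
print(clusters)
```

In this sketch a new centroid is only created when its similarity to every existing centroid is below t, and every other word is assigned to a single centroid with similarity at least t, so the resulting clusters satisfy the conditions listed above. The result depends on the order in which words are visited.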