Slide 9
Similarity-based clustering
• Both agglomerative and divisive clustering methods require a document-document
similarity measure, sim(d1, d2)
• In particular, the similarity measure needs to be
◦ symmetric: sim(d1, d2) = sim(d2, d1)
◦ normalized: sim(d1, d2) ∈ [0, 1]
• The choice of similarity measure is closely tied to how documents are
represented
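
As an illustration of the two requirements, here is a minimal sketch of one possible similarity measure, cosine similarity. It assumes a representation the slide does not specify: each document is a bag of lowercase, whitespace-separated tokens with raw counts. The function name cosine_sim and the convention of returning 0 for an empty document are also assumptions made for this example.

```python
import math
from collections import Counter


def cosine_sim(d1: str, d2: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two documents."""
    # Assumed representation: lowercase whitespace tokens with raw counts.
    v1 = Counter(d1.lower().split())
    v2 = Counter(d2.lower().split())

    # Dot product over the terms the two documents share.
    dot = sum(v1[t] * v2[t] for t in v1.keys() & v2.keys())

    # Euclidean norms of the two count vectors.
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))

    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumed convention for empty documents

    # Counts are non-negative, so the ratio is at least 0; clamp at 1
    # to guard against floating-point rounding.
    return min(1.0, dot / (norm1 * norm2))


a = "the cat sat on the mat"
b = "the dog sat on the log"
print(cosine_sim(a, b), cosine_sim(b, a))
```

Swapping the arguments gives the same value (symmetric), and because the count vectors have no negative entries the result stays in [0, 1] (normalized). A different representation, such as weighted term vectors, would call for re-checking both properties.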