Slide 7
Initial classification
New logs processing
New logs (hour to day)
Old clusters do not move.
A user is notified when a new cluster appears.
Initial class structure
100k to 1M records a day
~10k signatures, ~1k dense clusters
Log archive (week)
Hundreds of user clusters
UI shows user clusters
Human classification
Vectorization in the space of 1-, 2- and 3-grams
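A minimal sketch of vectorizing a signature in the space of 1-, 2- and 3-grams, assuming word-level n-grams over whitespace-separated tokens (the slide does not say whether the n-grams are word or character based) and cosine distance as the metric (also an assumption). Each vector is sparse: only the n-grams that actually occur are stored.

```python
import math
from collections import Counter


def ngrams(tokens, sizes=(1, 2, 3)):
    """Yield every contiguous token n-gram for each n in `sizes`."""
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def vectorize(signature):
    """Sparse vector mapping n-gram -> count, over whitespace-separated tokens."""
    return Counter(ngrams(signature.split()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(gram, 0) for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


v1 = vectorize("connection to <IP> refused")
v2 = vectorize("connection to <IP> reset")
print(len(v1), round(cosine_distance(v1, v2), 3))  # prints 9 0.333
```

The two example signatures share 6 of their 9 n-grams each, so the cosine similarity is 6/9 and the distance is about 0.333.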
k-means-facilitated human overview, population of user clusters
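One way k-means can facilitate the human overview is to group dense clusters whose vectors lie close together, so a reviewer sees likely-related clusters side by side and decides which ones to merge into user clusters. The sketch below is plain Lloyd's k-means on dense vectors, with a naive initialization from the first k points; the toy 2-D vectors and cluster names are hypothetical stand-ins for real n-gram vectors. k-means only proposes groupings here; the human still makes the final call.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, centroids


# Hypothetical dense clusters and toy vectors, grouped for a reviewer.
names = ["disk full", "disk nearly full", "login ok", "login failed"]
vectors = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, _ = kmeans(vectors, k=2)
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)  # {0: ['disk full', 'disk nearly full'], 1: ['login ok', 'login failed']}
```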
Greedy clustering into Jaccard-dense clusters
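A minimal sketch of greedy clustering into Jaccard-dense clusters. It assumes (the slide does not define it) that a cluster is "Jaccard-dense" when every pair of its signatures, viewed as token sets, has Jaccard similarity of at least a threshold; the value 0.5 is arbitrary. Each signature joins the first existing cluster it fits, otherwise it starts a new one.

```python
def jaccard(a, b):
    """|A intersection B| / |A union B| for two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_jaccard_clusters(signatures, threshold=0.5):
    """Return clusters as lists of signature indices (greedy, single pass)."""
    token_sets = [frozenset(s.split()) for s in signatures]
    clusters = []
    for i, tokens in enumerate(token_sets):
        for cluster in clusters:
            # Density check: the newcomer must be close to every member.
            if all(jaccard(tokens, token_sets[j]) >= threshold for j in cluster):
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no cluster fits: start a new one
    return clusters


sigs = [
    "disk <NUM> full",
    "disk <NUM> full on <HOST>",
    "user <ID> logged in",
    "user <ID> logged out",
]
print(greedy_jaccard_clusters(sigs))  # [[0, 1], [2, 3]]
```

The result depends on input order, which is the usual trade-off of a greedy single pass: it is cheap enough for ~10k signatures but not order-invariant.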
Signature extraction.
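The slide does not say how signatures are extracted. A common approach, assumed here, is to mask variable fields (IP addresses, hex values, numbers) with placeholders so that log lines differing only in those fields collapse into one signature. The masking rules below are hypothetical and would need tuning to the actual log formats; their order matters, since IPs and hex values must be masked before plain numbers.

```python
import re

# Hypothetical masking rules, applied in order.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def extract_signature(line):
    """Replace variable fields with placeholders and normalize whitespace."""
    for pattern, placeholder in MASKS:
        line = pattern.sub(placeholder, line)
    return " ".join(line.split())


print(extract_signature("Connection from 10.0.0.7 port 5123 failed"))
# Connection from <IP> port <NUM> failed
```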
Vectorization in
space of
1,2,3-grams
Two clusters mean the same thing? A human joins them into a user cluster and names it.
A user cluster contains a single dense cluster or multiple dense clusters.
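The relationship between dense clusters and named user clusters can be kept in a small two-way registry. The sketch below is a hypothetical structure, not the system's actual data model: each human decision ("these dense clusters mean the same thing") is recorded under a name, and if a dense cluster already belongs to another user cluster, that whole user cluster is folded into the named one so each dense cluster always has exactly one owner.

```python
class UserClusters:
    """Hypothetical registry: named user clusters made of dense-cluster ids."""

    def __init__(self):
        self.members = {}  # user cluster name -> set of dense cluster ids
        self.owner = {}    # dense cluster id -> user cluster name

    def join(self, name, *dense_ids):
        """Record a human decision that these dense clusters mean the same thing."""
        group = self.members.setdefault(name, set())
        for dense_id in dense_ids:
            old = self.owner.get(dense_id)
            if old is not None and old != name:
                # Fold the previous user cluster into this one.
                for moved in self.members.pop(old):
                    self.owner[moved] = name
                    group.add(moved)
            self.owner[dense_id] = name
            group.add(dense_id)


uc = UserClusters()
uc.join("disk full", 3, 7)
uc.join("storage", 7, 12)  # 7 already belongs to "disk full", so 3 moves too
print(sorted(uc.members["storage"]))  # [3, 7, 12]
```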
New 1-, 2- or 3-gram? Add a dimension and re-evaluate distances.
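With sparse vectors, adding a dimension for an unseen n-gram is implicit: every existing vector has a zero in the new dimension. Under that assumption (raw counts and Euclidean distance, neither specified on the slide), old-to-old distances are unchanged and only distances from the new vector need evaluating. A weighting such as TF-IDF would break this and force a wider re-evaluation. `NgramSpace` is a hypothetical name.

```python
import math
from collections import Counter


def ngram_vector(signature, sizes=(1, 2, 3)):
    """Sparse word n-gram count vector for a whitespace-tokenized signature."""
    tokens = signature.split()
    return Counter(
        tuple(tokens[i:i + n]) for n in sizes for i in range(len(tokens) - n + 1)
    )


def euclidean(a, b):
    """Euclidean distance over the union of both vectors' dimensions."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))


class NgramSpace:
    """Hypothetical growing vector space over all n-grams seen so far."""

    def __init__(self):
        self.dimensions = set()
        self.vectors = []

    def add(self, signature):
        vec = ngram_vector(signature)
        new_dims = set(vec) - self.dimensions  # unseen n-grams become new dimensions
        self.dimensions |= new_dims
        # Old vectors are 0 on the new dimensions, so only new-to-old distances change.
        distances = [euclidean(vec, old) for old in self.vectors]
        self.vectors.append(vec)
        return new_dims, distances


space = NgramSpace()
space.add("disk full")
new_dims, distances = space.add("disk full again")
print(len(new_dims), round(distances[0], 3))  # 3 1.732
```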
New signature? Calculate its distance to the old signatures.
Would adding it to an existing cluster break the Jaccard compactness criterion? Then it becomes a new cluster.
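The incremental steps for a new signature can be sketched as follows, assuming the compactness criterion means "every pair in a cluster has Jaccard similarity of at least the threshold" (an interpretation, not stated on the slide). Clusters are ranked by distance (1 minus Jaccard) to their nearest old signature; the new signature joins the first one it fits. Existing members are never reassigned, so old clusters do not move, and when nothing fits a new cluster is created and the user is notified. `assign_new_signature` and the `notify` callback are hypothetical; clusters are assumed non-empty.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def assign_new_signature(signature, clusters, threshold=0.5, notify=print):
    """clusters: list of non-empty lists of token frozensets. Returns the cluster index."""
    tokens = frozenset(signature.split())
    # Rank clusters by distance to their nearest old signature.
    ranked = sorted(
        range(len(clusters)),
        key=lambda i: min(1.0 - jaccard(tokens, m) for m in clusters[i]),
    )
    for idx in ranked:
        # Compactness check: joining must keep every pair within the threshold.
        if all(jaccard(tokens, m) >= threshold for m in clusters[idx]):
            clusters[idx].append(tokens)  # old members stay where they are
            return idx
    clusters.append([tokens])
    notify(f"new cluster {len(clusters) - 1}: {signature}")
    return len(clusters) - 1


clusters = [[frozenset({"disk", "<NUM>", "full"})]]
messages = []
print(assign_new_signature("disk <NUM> full on <HOST>", clusters, notify=messages.append))  # 0
print(assign_new_signature("user <ID> logged in", clusters, notify=messages.append))  # 1
print(messages)  # ['new cluster 1: user <ID> logged in']
```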