Slide 1
Slide 1 text
Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights
Boaz Leskes
Elasticsearch
@bleskes
work done for Buzzcapture
Slide 2
Slide 2 text
Trending?
Slide 3
Slide 3 text
© Buzzcapture
Slide 4
Slide 4 text
© Buzzcapture
Slide 5
Slide 5 text
[Figure labels: reference, reference, topic] © Buzzcapture
Slide 6
Slide 6 text
topic ≠ reference
Slide 7
Slide 7 text
topic, reference
$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
topic, reference
$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$
Slide 10
Slide 10 text
$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$
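A minimal sketch of the formula, reading it as "the fraction of topic documents that contain the word w". The function name `term_probability` and the toy documents are hypothetical, and running the same count over the reference set for comparison is an assumption about how the two sets are used.

```python
from typing import Iterable, Set


def term_probability(word: str, docs: Iterable[Set[str]]) -> float:
    """Fraction of documents in `docs` that contain `word`."""
    docs = list(docs)
    if not docs:
        return 0.0
    containing = sum(1 for terms in docs if word in terms)
    return containing / len(docs)


# Hypothetical toy data: each document is the set of its analyzed terms.
topic_docs = [
    {"cover", "toy"},
    {"cover", "floor"},
    {"ground", "toy"},
    {"cover", "bean"},
]
reference_docs = [
    {"floor", "ground"},
    {"cover", "jelly"},
    {"bean", "jelly"},
    {"floor", "toy"},
]

p_topic = term_probability("cover", topic_docs)          # 3 of 4 documents
p_reference = term_probability("cover", reference_docs)  # 1 of 4 documents
print(p_topic, p_reference)
```

A word that is much more likely in the topic set than in the reference set is a candidate for "trending" in that topic.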
Slide 11
Slide 11 text
[Figure: terms (brown, dog, fox, quick) linked by arrows to document IDs (2, 5, 6, 10, 12, 13), shown in both directions]
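The two directions of the mapping on this slide can be sketched as below. The document contents are hypothetical: the slide's actual arrows are not recoverable, so the term lists were chosen only so that two posting lists come out as 2 5 10 12 and 5 6 12 13. The helper names `invert` and `uninvert` are also illustrative.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical documents (doc ID -> analyzed terms).
docs: Dict[int, List[str]] = {
    2: ["brown", "fox"],
    5: ["brown", "dog"],
    6: ["dog"],
    10: ["brown", "quick"],
    12: ["brown", "dog", "quick"],
    13: ["dog", "fox"],
}


def invert(doc_terms: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Term -> sorted doc IDs: the direction used for searching."""
    index = defaultdict(set)
    for doc_id, terms in doc_terms.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def uninvert(index: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Doc ID -> sorted terms: the direction needed for faceting."""
    doc_terms = defaultdict(set)
    for term, ids in index.items():
        for doc_id in ids:
            doc_terms[doc_id].add(term)
    return {doc_id: sorted(terms) for doc_id, terms in doc_terms.items()}


inverted = invert(docs)
print(inverted["brown"])  # [2, 5, 10, 12]
print(inverted["dog"])    # [5, 6, 12, 13]
print(uninvert(inverted)[12])
```

Faceting on a field walks from each matching document to its terms, so the doc-to-terms direction has to be held in memory.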
Slide 12
Slide 12 text
In our index:
• Terms = 12 GB
• “Arrows” = 41 GB
Slide 13
Slide 13 text
{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: "^#.*",
        frequency: { min: 10 }
      }
    }
  }
}
Drop terms which occur too rarely
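The effect of the two filters above can be sketched in plain Python. This is an illustration of the idea only, not Elasticsearch's implementation: the function name `filter_terms` and the toy tweets are hypothetical, and counting document frequency over the whole collection is a simplifying assumption.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def filter_terms(docs: Iterable[List[str]], pattern: str, min_count: int) -> Set[str]:
    """Keep terms that match `pattern` and occur in at least `min_count` docs."""
    regex = re.compile(pattern)
    # Count each term once per document (document frequency).
    doc_freq = Counter(term for terms in docs for term in set(terms))
    return {
        term
        for term, count in doc_freq.items()
        if count >= min_count and regex.match(term)
    }


# Hypothetical whitespace-analyzed tweets.
tweets = [
    ["#es", "rocks"],
    ["#es", "#rare"],
    ["#es", "rocks"],
]

# Only hashtags seen in at least 2 tweets survive.
print(filter_terms(tweets, r"^#.*", 2))  # {'#es'}
```

Fewer surviving terms means fewer doc-to-term "arrows" to keep in memory.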
Slide 14
Slide 14 text
Drop docs with too many terms
Slide 15
Slide 15 text
[Figure labels: reference, reference, topic] © Buzzcapture
Slide 16
Slide 16 text
iculture   10,122
floor       8,998
cover       6,874
toy         4,402
ground      3,841

4.0         7,878
4.1         4,292
rtacties    4,078
jelly       2,905
bean        2,857