Slide 1

Slide 1 text

Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights
Boaz Leskes, Elasticsearch, @bleskes
Work done for Buzzcapture

Slide 2

Slide 2 text

Trending?

Slide 3

Slide 3 text

© Buzzcapture

Slide 4

Slide 4 text

© Buzzcapture

Slide 5

Slide 5 text

reference reference topic © Buzzcapture

Slide 6

Slide 6 text

topic ≠ reference

Slide 7

Slide 7 text

topic reference
$P(w|T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

topic reference
$P(w|T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$

Slide 10

Slide 10 text

$P(w|T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$
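A minimal sketch of this ratio, reading it as the share of documents in the topic set D_t that contain the word w. The function name `term_probability` and the sample documents are illustrative assumptions, not taken from the talk.

```python
from typing import Iterable, Set


def term_probability(term: str, docs: Iterable[Set[str]]) -> float:
    """Fraction of documents in `docs` that contain `term`."""
    docs = list(docs)
    if not docs:
        return 0.0  # assumption: an empty document set gives probability 0
    containing = sum(1 for doc in docs if term in doc)
    return containing / len(docs)


# Hypothetical topic set: each document is the set of its analyzed terms.
topic_docs = [
    {"quick", "brown", "fox"},
    {"brown", "dog"},
    {"quick", "dog"},
    {"fox"},
]

p_brown = term_probability("brown", topic_docs)  # 2 of the 4 documents
```

The same function can be applied to a reference set of documents, which gives the two quantities the slides contrast.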

Slide 11

Slide 11 text

[Diagram: the terms brown, dog, fox, quick connected to the document IDs 2, 5, 6, 10, 12, 13]
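As a rough illustration of the two directions such a diagram relates, the sketch below builds a term-to-documents mapping (an inverted index) and then reverses it into the document-to-terms mapping that faceting reads. The document IDs and contents are made up for the example, and the helper names `build_inverted_index` and `uninvert` are hypothetical; this is plain Python, not how Elasticsearch stores either structure.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id])):
            index[term].append(doc_id)
    return dict(index)


def uninvert(index):
    """Reverse the index: map each document ID to the terms it contains."""
    by_doc = defaultdict(list)
    for term in sorted(index):
        for doc_id in index[term]:
            by_doc[doc_id].append(term)
    return dict(by_doc)


# Hypothetical documents, keyed by ID.
docs = {
    1: ["quick", "brown", "fox"],
    2: ["brown", "dog"],
    3: ["quick", "dog"],
}

index = build_inverted_index(docs)
field_data = uninvert(index)
```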

Slide 12

Slide 12 text

In our index:
• Terms = 12GB
• “Arrows” = 41GB

Slide 13

Slide 13 text

{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: { pattern: "^#.*" },
        frequency: { min: 10 }
      }
    }
  }
}

Drop terms that occur too rarely
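To make the effect of such a filter concrete, here is a plain-Python imitation of it: keep only terms that match a pattern and that appear in at least a minimum number of documents. It is a sketch under assumptions, not Elasticsearch's implementation; the name `filter_terms`, the use of `re.search`, and the sample tweets are all illustrative.

```python
import re
from collections import Counter


def filter_terms(docs, pattern, min_freq):
    """Return terms matching `pattern` that occur in at least `min_freq` documents."""
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term once per document
    regex = re.compile(pattern)
    return {
        term
        for term, count in doc_freq.items()
        if count >= min_freq and regex.search(term)
    }


# Hypothetical whitespace-analyzed tweets.
tweets = [
    ["#es", "rocks"],
    ["#es", "#search"],
    ["search", "#es"],
]

kept = filter_terms(tweets, r"^#.*", min_freq=2)
```

With these inputs only the hashtag that appears in at least two tweets survives; the rarer hashtag and all non-hashtag terms are dropped.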

Slide 14

Slide 14 text

Drop docs with too many terms
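One way to picture this step, assuming "too many terms" means the number of distinct terms in a document exceeds some threshold. The helper `drop_large_docs` and the threshold value are hypothetical; the slide does not say how the cut-off was chosen.

```python
def drop_large_docs(docs, max_terms):
    """Keep only documents with at most `max_terms` distinct terms."""
    return [terms for terms in docs if len(set(terms)) <= max_terms]


# Hypothetical documents as lists of analyzed terms.
docs = [
    ["#es", "rocks"],
    ["a", "b", "c", "d", "e"],
    ["#es"],
]

small_docs = drop_large_docs(docs, max_terms=3)
```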

Slide 15

Slide 15 text

reference reference topic © Buzzcapture

Slide 16

Slide 16 text

iculture   10,122
floor       8,998
cover       6,874
toy         4,402
ground      3,841

4.0         7,878
4.1         4,292
rtacties    4,078
jelly       2,905
bean        2,857