Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
59
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
710
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
360
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
670
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
Whats_new_in_Podman_and_CRI-O_2025-06
orimanabu
3
170
ゆるSRE #11 LT
okaru
1
590
Go Connectへの想い
chiroruxx
0
160
脅威をモデリングしてMCPのセキュリティ対策を考えよう
flatt_security
4
1.5k
“プロダクトを好きになれるか“も QAエンジニア転職の大事な判断基準だと思ったの
tomodakengo
0
120
Text-to-SQLの評価データセットを作って最新LLMモデルの性能評価をしてみた
gotalab555
3
770
Eight Engineering Unit 紹介資料
sansan33
PRO
0
3.4k
型システムを知りたい人のための型検査器作成入門
mame
14
3.6k
マルチテナント+マルチプロダクト SaaS への AI Agent の組み込み方
kworkdev
PRO
2
300
Securing your Lambda 101
chillzprezi
0
240
AWS と定理証明 〜ポリシー言語 Cedar 開発の舞台裏〜 #fp_matsuri / FP Matsuri 2025
ytaka23
9
2.4k
CIでのgolangci-lintの実行を約90%削減した話
kazukihayase
0
140
Featured
See All Featured
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.6k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
10
900
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Gamification - CAS2011
davidbonilla
81
5.3k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
5.8k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
35
2.3k
Documentation Writing (for coders)
carmenintech
71
4.9k
Designing for humans not robots
tammielis
253
25k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
780
Build your cross-platform service in a week with App Engine
jlugia
231
18k
The Cult of Friendly URLs
andyhume
79
6.4k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857