Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
49
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
310
Life of a Document in Elasticsearch
bleskes
3
3.2k
Resiliency in Elasticsearch & Lucene
bleskes
0
500
Resiliency in Elasticsearch & Lucene
bleskes
0
220
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
690
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
360
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
650
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
KubeCon NA 2024 Recap: How to Move from Ingress to Gateway API with Minimal Hassle
ysakotch
0
200
宇宙ベンチャーにおける最近の情シス取り組みについて
axelmizu
0
110
LINEヤフーのフロントエンド組織・体制の紹介【24年12月】
lycorp_recruit_jp
0
530
プロダクト開発を加速させるためのQA文化の築き方 / How to build QA culture to accelerate product development
mii3king
1
260
podman_update_2024-12
orimanabu
1
270
PHP ユーザのための OpenTelemetry 入門 / phpcon2024-opentelemetry
shin1x1
1
190
マルチプロダクト開発の現場でAWS Security Hubを1年以上運用して得た教訓
muziyoshiz
2
2.2k
re:Invent 2024 Innovation Talks(NET201)で語られた大切なこと
shotashiratori
0
310
ガバメントクラウドのセキュリティ対策事例について
fujisawaryohei
0
530
KubeCon NA 2024 Recap / Running WebAssembly (Wasm) Workloads Side-by-Side with Container Workloads
z63d
1
240
[Ruby] Develop a Morse Code Learning Gem & Beep from Strings
oguressive
1
150
C++26 エラー性動作
faithandbrave
2
730
Featured
See All Featured
Optimizing for Happiness
mojombo
376
70k
Let's Do A Bunch of Simple Stuff to Make Websites Faster
chriscoyier
507
140k
Six Lessons from altMBA
skipperchong
27
3.5k
Large-scale JavaScript Application Architecture
addyosmani
510
110k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
95
17k
StorybookのUI Testing Handbookを読んだ
zakiyama
27
5.3k
Fireside Chat
paigeccino
34
3.1k
Building Your Own Lightsaber
phodgson
103
6.1k
Bootstrapping a Software Product
garrettdimon
PRO
305
110k
Fashionably flexible responsive web design (full day workshop)
malarkey
405
66k
Stop Working from a Prison Cell
hatefulcrawdad
267
20k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
0
97
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857