Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
63
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
710
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
680
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
なぜSaaSがMCPサーバーをサービス提供するのか?
sansantech
PRO
8
2.7k
Agile PBL at New Grads Trainings
kawaguti
PRO
1
340
[RSJ25] Feasible RAG: Hierarchical Multimodal Retrieval with Feasibility-Aware Embodied Memory for Mobile Manipulation
keio_smilab
PRO
0
120
なぜテストマネージャの視点が 必要なのか? 〜 一歩先へ進むために 〜
moritamasami
0
200
シークレット管理だけじゃない!HashiCorp Vault でデータ暗号化をしよう / Beyond Secret Management! Let's Encrypt Data with HashiCorp Vault
nnstt1
3
230
職種の壁を溶かして開発サイクルを高速に回す~情報透明性と職種越境から考えるAIフレンドリーな職種間連携~
daitasu
0
100
今!ソフトウェアエンジニアがハードウェアに手を出すには
mackee
11
4.4k
2025年になってもまだMySQLが好き
yoku0825
8
4.3k
バイブスに「型」を!Kent Beckに学ぶ、AI時代のテスト駆動開発
amixedcolor
2
390
Webブラウザ向け動画配信プレイヤーの 大規模リプレイスから得た知見と学び
yud0uhu
0
220
未経験者・初心者に贈る!40分でわかるAndroidアプリ開発の今と大事なポイント
operando
2
160
オブザーバビリティが広げる AIOps の世界 / The World of AIOps Expanded by Observability
aoto
PRO
0
320
Featured
See All Featured
The Language of Interfaces
destraynor
161
25k
Become a Pro
speakerdeck
PRO
29
5.5k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
6k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
840
Making the Leap to Tech Lead
cromwellryan
135
9.5k
Gamification - CAS2011
davidbonilla
81
5.4k
jQuery: Nuts, Bolts and Bling
dougneiner
64
7.9k
Scaling GitHub
holman
463
140k
Producing Creativity
orderedlist
PRO
347
40k
A designer walks into a library…
pauljervisheath
207
24k
Being A Developer After 40
akosma
90
590k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857