Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin Buzzwords 2013
Boaz Leskes
June 04, 2013
Transcript
Faceting analyzed fields with some sprinkles of probability theory
conjures trending topic analysis and other interesting insights
Boaz Leskes, Elasticsearch, @bleskes
Work done for Buzzcapture
Trending?
[Figures © Buzzcapture; the last one is labeled: reference, reference, topic]
topic ≠ reference

$$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$$
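The formula reads as a document-frequency ratio: the number of documents in a set that contain the word w, divided by the size of that set. A minimal Python sketch of that reading follows. The function name `term_probability` and the toy documents are hypothetical, and applying the same ratio to both a topic set and a reference set is an assumption drawn from the "topic ≠ reference" slide, not something the transcript spells out.

```python
from typing import List, Set


def term_probability(word: str, docs: List[Set[str]]) -> float:
    """Fraction of documents in `docs` that contain `word`."""
    if not docs:
        return 0.0  # avoid dividing by zero on an empty document set
    return sum(1 for d in docs if word in d) / len(docs)


# Hypothetical toy data: each document is the set of its analyzed terms.
topic_docs = [{"quick", "brown", "fox"}, {"brown", "dog"}, {"quick", "fox"}, {"fox"}]
reference_docs = [{"brown", "dog"}, {"dog"}, {"quick", "dog"}, {"brown", "fox"}]

p_topic = term_probability("fox", topic_docs)          # 3 of 4 documents
p_reference = term_probability("fox", reference_docs)  # 1 of 4 documents
print(p_topic, p_reference)
```

A word whose probability in the topic set is much higher than in the reference set would be a candidate "trending" term under this reading.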
[Diagram: the terms brown, dog, fox, quick linked by arrows to the document IDs 2, 5, 6, 10, 12, 13]
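One way to read the diagram is as an inverted index (term to document IDs) alongside the links needed to facet on an analyzed field (document ID back to its terms). The sketch below illustrates that reading in Python. The assignment of terms to document IDs is made up for illustration, since the transcript does not preserve which IDs belong to which term, and `facet` is a hypothetical helper, not an Elasticsearch API.

```python
from collections import Counter, defaultdict

# Hypothetical documents: the IDs and terms echo the slide, the pairing is invented.
docs = {
    2: ["brown", "fox"],
    5: ["brown", "dog"],
    6: ["dog"],
    10: ["fox", "quick"],
    12: ["brown", "dog", "quick"],
    13: ["fox"],
}

# Inverted index: term -> sorted list of document IDs containing it.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    for term in docs[doc_id]:
        inverted[term].append(doc_id)


def facet(doc_ids):
    """Count, per term, how many of the given documents contain it."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(set(docs[doc_id]))  # needs the doc -> terms direction
    return counts


# Facet over every document matching the term "brown".
print(facet(inverted["brown"]).most_common())
```

Searching only needs the term-to-document direction; faceting walks the other way for every matching document, which is why those links have to be held in memory.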
In our index:
• Terms = 12GB
• “Arrows” = 41GB
{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: "^#.*",
        frequency: { min: 10 }
      }
    }
  }
}

Drop terms that occur too rarely
Drop docs with too many terms
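The effect of these filters can be illustrated outside Elasticsearch. The following Python sketch is not Elasticsearch's implementation: `filter_terms` and `drop_long_docs` are hypothetical helpers, `min_docs` treats the slide's `min: 10` as an absolute document count (an assumption), and `max_terms` is an invented threshold because the slide gives none.

```python
import re
from collections import Counter


def filter_terms(doc_terms, pattern=r"^#.*", min_docs=10):
    """Keep terms that match `pattern` and appear in at least `min_docs` documents."""
    doc_freq = Counter()
    for terms in doc_terms:
        doc_freq.update(set(terms))  # count each term once per document
    regex = re.compile(pattern)
    return {t for t, n in doc_freq.items() if n >= min_docs and regex.match(t)}


def drop_long_docs(doc_terms, max_terms):
    """Drop documents with more than `max_terms` distinct terms."""
    return [terms for terms in doc_terms if len(set(terms)) <= max_terms]


# Hypothetical whitespace-analyzed tweets.
tweets = [["#bbuzz", "talk"]] * 10 + [["#rare", "talk"]] * 2
kept_terms = filter_terms(tweets)
print(kept_terms)  # only the hashtag seen in at least 10 tweets survives

short_docs = drop_long_docs([["a", "b"], ["a", "b", "c", "d"]], max_terms=3)
print(short_docs)
```

Both filters shrink what has to be loaded for faceting: the first removes rare or non-hashtag terms, the second removes documents that would contribute a disproportionate number of links.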
[Figure repeated, labeled: reference, reference, topic. © Buzzcapture]
iculture   10,122
floor       8,998
cover       6,874
toy         4,402
ground      3,841

4.0         7,878
4.1         4,292
rtacties    4,078
jelly       2,905
bean        2,857