Boaz Leskes
June 04, 2013
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin Buzzwords 2013
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights
Boaz Leskes, Elasticsearch, @bleskes
Work done for Buzzcapture
Trending?
[Two image slides, © Buzzcapture]
[Figure: diagram labelled "reference", "reference" and "topic", © Buzzcapture]
topic ≠ reference
$$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$$
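The estimate on this slide, the share of a topic's documents that contain a given word, can be sketched in a few lines of Python. This is an illustration only, not code from the talk: documents are assumed to be sets of terms, and `term_probability`, `topic_docs` and `reference_docs` are hypothetical names with made-up toy data.

```python
def term_probability(word, docs):
    """P(w|T): fraction of the documents in `docs` that contain `word`."""
    if not docs:
        return 0.0
    containing = sum(1 for doc in docs if word in doc)
    return containing / len(docs)


# Toy data (made up for illustration): each document is a set of terms.
topic_docs = [{"quick", "brown", "fox"}, {"quick", "dog"}, {"fox"}, {"quick"}]
reference_docs = [{"quick"}, {"dog"}, {"brown", "dog"}, {"fox"},
                  {"dog", "fox"}, {"brown"}, {"dog"}, {"fox", "dog"}]

p_topic = term_probability("quick", topic_docs)          # 3 of 4 documents
p_reference = term_probability("quick", reference_docs)  # 1 of 8 documents
print(p_topic, p_reference)
```

Computing the same quantity for the topic set and for a reference set gives two numbers that can be compared per term, which is presumably the point of the "topic ≠ reference" slide.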
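[Figure: the terms brown, dog, fox, quick and the document IDs 2, 5, 6, 10, 12, 13, drawn once as lists of document IDs per term and once as arrows from document IDs back to terms]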
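A minimal sketch of the two structures this slide appears to contrast: an inverted index (term to document IDs) and the un-inverted mapping (document ID to terms) that counting terms over a set of documents relies on. The document IDs reuse the numbers on the slide, but which term occurs in which document is made up here, and `uninvert` and `facet_counts` are hypothetical helpers, not Elasticsearch or Lucene APIs.

```python
from collections import defaultdict

# Inverted index: term -> sorted list of document IDs (assignment is made up).
inverted = {
    "brown": [2, 5, 10],
    "dog": [5, 6, 12],
    "fox": [2, 12, 13],
    "quick": [2, 5, 6, 10, 12, 13],
}


def uninvert(index):
    """Build the document -> terms mapping from a term -> documents index."""
    forward = defaultdict(list)
    for term in sorted(index):
        for doc_id in index[term]:
            forward[doc_id].append(term)
    return dict(forward)


def facet_counts(forward, doc_ids):
    """Count how many of the given documents contain each term."""
    counts = defaultdict(int)
    for doc_id in doc_ids:
        for term in forward.get(doc_id, []):
            counts[term] += 1
    return dict(counts)


forward = uninvert(inverted)
print(forward[2])                     # terms of document 2
print(facet_counts(forward, [2, 5]))  # term counts over documents 2 and 5
```

The forward mapping holds one entry per (document, term) pair, so it grows with the number of terms per document; that is one plausible reading of why the "arrows" take more memory than the terms themselves.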
In our index:
• Terms = 12GB
• “Arrows” = 41GB
{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: "^#.*",
        frequency: { min: 10 }
      }
    }
  }
}
Drop terms that occur too rarely
Drop docs with too many terms
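The two pruning rules can be imitated outside Elasticsearch to get a feel for their effect. The sketch below is an illustration under stated assumptions, not how Elasticsearch implements field data filtering: `prune` is a hypothetical helper, `min_count` stands in for `frequency.min` read as an absolute document count, and `max_terms_per_doc` is a made-up parameter for the second rule.

```python
import re
from collections import Counter


def prune(docs, pattern=r"^#.*", min_count=10, max_terms_per_doc=50):
    """Return term -> document count after applying both pruning rules.

    docs: list of term lists, one list per document.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for terms in docs:
        unique = set(terms)
        # Drop documents with too many distinct terms.
        if len(unique) > max_terms_per_doc:
            continue
        # Keep only terms matching the pattern (hashtags by default).
        counts.update(t for t in unique if regex.match(t))
    # Drop terms that occur in too few documents.
    return {t: n for t, n in counts.items() if n >= min_count}


# Toy data: 12 documents mention #android, 3 more mention #rare and #android.
docs = [["#android", "jelly", "bean"]] * 12 + [["#rare", "#android"]] * 3
print(prune(docs))
```

With the default threshold of 10, only `#android` survives in this toy data; lowering `min_count` lets the rarer hashtag back in at the cost of a larger term set.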
[Figure: diagram labelled "reference", "reference" and "topic", © Buzzcapture]
iculture    10,122
floor        8,998
cover        6,874
toy          4,402
ground       3,841

4.0          7,878
4.1          4,292
rtacties     4,078
jelly        2,905
bean         2,857