Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Boaz Leskes
June 04, 2013
Technology
0
70
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
330
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
530
Resiliency in Elasticsearch & Lucene
bleskes
0
240
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
730
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
690
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
330
Other Decks in Technology
See All in Technology
外部キー制約の知っておいて欲しいこと - RDBMSを正しく使うために必要なこと / FOREIGN KEY Night
soudai
PRO
12
5.6k
[JAWS-UG彩の国埼玉#6]混乱しました。AWS MCP ServersとAWS MCP Serverの違いを5分で解説
sh_fk2
0
100
Cosmos World Foundation Model Platform for Physical AI
takmin
0
970
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
10k
SchooでVue.js/Nuxtを技術選定している理由
yamanoku
3
200
顧客の言葉を、そのまま信じない勇気
yamatai1212
1
360
プロダクト成長を支える開発基盤とスケールに伴う課題
yuu26
4
1.4k
Webhook best practices for rock solid and resilient deployments
glaforge
2
310
登壇駆動学習のすすめ — CfPのネタの見つけ方と書くときに意識していること
bicstone
3
130
こんなところでも(地味に)活躍するImage Modeさんを知ってるかい?- Image Mode for OpenShift -
tsukaman
1
170
AWS Network Firewall Proxyを触ってみた
nagisa53
1
240
2026年、サーバーレスの現在地 -「制約と戦う技術」から「当たり前の実行基盤」へ- /serverless2026
slsops
2
270
Featured
See All Featured
Skip the Path - Find Your Career Trail
mkilby
0
59
Have SEOs Ruined the Internet? - User Awareness of SEO in 2025
akashhashmi
0
270
How to Build an AI Search Optimization Roadmap - Criteria and Steps to Take #SEOIRL
aleyda
1
1.9k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.6k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
1
57
Java REST API Framework Comparison - PWX 2021
mraible
34
9.1k
Thoughts on Productivity
jonyablonski
74
5k
First, design no harm
axbom
PRO
2
1.1k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
196
71k
Agile that works and the tools we love
rasmusluckow
331
21k
Stop Working from a Prison Cell
hatefulcrawdad
273
21k
Prompt Engineering for Job Search
mfonobong
0
160
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857