Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
63
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
710
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
680
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
ファッションコーディネートアプリ「WEAR」における、Vertex AI Vector Searchを利用したレコメンド機能の開発・運用で得られたノウハウの紹介
zozotech
PRO
0
600
Mackerel in さくらのクラウド
cubicdaiya
1
120
ロールが細分化された組織でSREと協働するインフラエンジニアは何をするか? / SRE Lounge #18
kossykinto
0
230
生成AI活用のROI、どう測る? DMM.com 開発責任者から学ぶ「AI効果検証のノウハウ」 / ROI of AI
i35_267
3
120
[OCI Technical Deep Dive] OCIで生成AIを活用するためのソリューション解説(2025年8月5日開催)
oracle4engineer
PRO
0
110
AIは変更差分からユニットテスト_結合テスト_システムテストでテストすべきことが出せるのか?
mineo_matsuya
3
1.4k
「AIと一緒にやる」が当たり前になるまでの奮闘記
kakehashi
PRO
3
170
MCP認可の現在地と自律型エージェント対応に向けた課題 / MCP Authorization Today and Challenges to Support Autonomous Agents
yokawasa
5
2.5k
アカデミーキャンプ 2025 SuuuuuuMMeR「燃えろ!!ロボコン」 / Academy Camp 2025 SuuuuuuMMeR "Burn the Spirit, Robocon!!" DAY 1
ks91
PRO
0
150
LLMで構造化出力の成功率をグンと上げる方法
keisuketakiguchi
0
990
UDDのススメ - 拡張版 -
maguroalternative
1
610
Claude Codeから我々が学ぶべきこと
oikon48
10
2.8k
Featured
See All Featured
Build your cross-platform service in a week with App Engine
jlugia
231
18k
Stop Working from a Prison Cell
hatefulcrawdad
271
21k
The Cost Of JavaScript in 2023
addyosmani
53
8.8k
Optimizing for Happiness
mojombo
379
70k
Visualization
eitanlees
146
16k
Agile that works and the tools we love
rasmusluckow
329
21k
A better future with KSS
kneath
239
17k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.4k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
31
2.2k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Adopting Sorbet at Scale
ufuk
77
9.5k
Why Our Code Smells
bkeepers
PRO
338
57k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857