Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
63
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
710
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
670
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
機械学習を「社会実装」するということ 2025年夏版 / Social Implementation of Machine Learning July 2025 Version
moepy_stats
1
1.5k
The Madness of Multiple Gemini CLIs Developing Simultaneously with Jujutsu
gunta
1
2.8k
ユーザー理解の爆速化とPdMの価値
kakehashi
PRO
1
110
AWS表彰プログラムとキャリアについて
naoki_0531
1
150
東京海上日動におけるセキュアな開発プロセスの取り組み
miyabit
0
200
【CEDEC2025】LLMを活用したゲーム開発支援と、生成AIの利活用を進める組織的な取り組み
cygames
PRO
1
1.8k
ファインディにおける Dataform ブランチ戦略
hiracky16
0
220
2025-07-31: GitHub Copilot Agent mode at Vibe Coding Cafe (15min)
chomado
1
190
メモ整理が苦手な者による頑張らないObsidian活用術
optim
0
150
Power Automate のパフォーマンス改善レシピ / Power Automate Performance Improvement Recipes
karamem0
0
280
CSPヘッダー導入で実現するWebサイトの多層防御:今すぐ試せる設定例と運用知見
llamakko
1
270
20250728 MCP, A2A and Multi-Agents in the future
yoshidashingo
1
130
Featured
See All Featured
Being A Developer After 40
akosma
90
590k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
860
Reflections from 52 weeks, 52 projects
jeffersonlam
351
21k
What's in a price? How to price your products and services
michaelherold
246
12k
Gamification - CAS2011
davidbonilla
81
5.4k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Building an army of robots
kneath
306
45k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
Optimizing for Happiness
mojombo
379
70k
Practical Orchestrator
shlominoach
190
11k
Statistics for Hackers
jakevdp
799
220k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857