$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
68
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
330
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
520
Resiliency in Elasticsearch & Lucene
bleskes
0
240
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
720
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
690
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
320
Other Decks in Technology
See All in Technology
Karate+Database RiderによるAPI自動テスト導入工数をCline+GitLab MCPを使って2割削減を目指す! / 20251206 Kazuki Takahashi
shift_evolve
PRO
1
680
大企業でもできる!ボトムアップで拡大させるプラットフォームの作り方
findy_eventslides
1
700
Overture Maps Foundationの3年を振り返る
moritoru
0
170
AIと二人三脚で育てた、個人開発アプリグロース術
zozotech
PRO
1
710
多様なデジタルアイデンティティを攻撃からどうやって守るのか / 20251212
ayokura
0
420
Playwrightのソースコードに見る、自動テストを自動で書く技術
yusukeiwaki
13
5.2k
エンジニアリングをやめたくないので問い続ける
estie
2
1.1k
Ruby で作る大規模イベントネットワーク構築・運用支援システム TTDB
taketo1113
1
260
AWS re:Invent 2025で見たGrafana最新機能の紹介
hamadakoji
0
320
ブロックテーマとこれからの WordPress サイト制作 / Toyama WordPress Meetup Vol.81
torounit
0
550
計算機科学をRubyと歩む 〜DFA型正規表現エンジンをつくる~
ydah
3
230
AI駆動開発における設計思想 認知負荷を下げるフロントエンドアーキテクチャ/ 20251211 Teppei Hanai
shift_evolve
PRO
2
340
Featured
See All Featured
Typedesign – Prime Four
hannesfritz
42
2.9k
Mobile First: as difficult as doing things right
swwweet
225
10k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
253
22k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
25
1.6k
Into the Great Unknown - MozCon
thekraken
40
2.2k
What's in a price? How to price your products and services
michaelherold
246
12k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
141
34k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.8k
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.5k
Optimizing for Happiness
mojombo
379
70k
jQuery: Nuts, Bolts and Bling
dougneiner
65
8.2k
Done Done
chrislema
186
16k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857