Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
63
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
710
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
670
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
Tech-Verse 2025 Global CTO Session
lycorptech_jp
PRO
0
440
急成長を支える基盤作り〜地道な改善からコツコツと〜 #cre_meetup
stefafafan
0
130
強化されたAmazon Location Serviceによる新機能と開発者体験
dayjournal
3
230
Amazon S3標準/ S3 Tables/S3 Express One Zoneを使ったログ分析
shigeruoda
4
550
TechLION vol.41~MySQLユーザ会のほうから来ました / techlion41_mysql
sakaik
0
190
Leveraging Open-Source Tools for Creating 3D Tiles in the Urban Environment
simboss
PRO
0
100
rubygem開発で鍛える設計力
joker1007
2
220
低レイヤを知りたいPHPerのためのCコンパイラ作成入門 完全版 / Building a C Compiler for PHPers Who Want to Dive into Low-Level Programming - Expanded
tomzoh
4
3.3k
AWS Summit Japan 2025 Community Stage - App workflow automation by AWS Step Functions
matsuihidetoshi
1
290
本が全く読めなかった過去の自分へ
genshun9
0
590
AWS Organizations 新機能!マルチパーティ承認の紹介
yhana
1
130
データプラットフォーム技術におけるメダリオンアーキテクチャという考え方/DataPlatformWithMedallionArchitecture
smdmts
5
650
Featured
See All Featured
A better future with KSS
kneath
239
17k
A Modern Web Designer's Workflow
chriscoyier
694
190k
Side Projects
sachag
455
42k
Into the Great Unknown - MozCon
thekraken
39
1.9k
GraphQLとの向き合い方2022年版
quramy
49
14k
Navigating Team Friction
lara
187
15k
Making the Leap to Tech Lead
cromwellryan
134
9.4k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
233
17k
BBQ
matthewcrist
89
9.7k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
229
22k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
31
1.2k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857