Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
49
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
310
Life of a Document in Elasticsearch
bleskes
3
3.1k
Resiliency in Elasticsearch & Lucene
bleskes
0
500
Resiliency in Elasticsearch & Lucene
bleskes
0
220
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
690
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
360
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
650
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
300
Other Decks in Technology
See All in Technology
最速最小からはじめるデータプロダクト / Data Product MVP
amaotone
5
740
10分でわかるfreee エンジニア向け会社説明資料
freee
18
520k
CAMERA-Suite: 広告文生成のための評価スイート / ai-camera-suite
cyberagentdevelopers
PRO
3
270
ネット広告に未来はあるか?「3rd Party Cookie廃止とPrivacy Sandboxの効果検証の裏側」 / third-party-cookie-privacy
cyberagentdevelopers
PRO
1
130
Jr. Championsになって、強く連携しながらAWSをもっと使いたい!~AWSに対する期待と行動~
amixedcolor
0
190
AWSコンテナ本出版から3年経った今、もし改めて執筆し直すなら / If I revise our container book
iselegant
15
4k
バクラクにおける可観測性向上の取り組み
yuu26
3
420
AWS CDKでデータリストアの運用、どのように設計する?~Aurora・EFSの実践事例を紹介~/aws-cdk-data-restore-aurora-efs
mhrtech
4
660
分布で見る効果検証入門 / ai-distributional-effect
cyberagentdevelopers
PRO
4
700
Oracle Cloud Infrastructureデータベース・クラウド:各バージョンのサポート期間
oracle4engineer
PRO
27
12k
「視座」の上げ方が成人発達理論にわかりやすくまとまってた / think_ perspective_hidden_dimensions
shuzon
2
5.1k
いまならこう作りたい AWSコンテナ[本格]入門ハンズオン 〜2024年版 ハンズオンの構想〜
horsewin
9
2.1k
Featured
See All Featured
GitHub's CSS Performance
jonrohan
1030
460k
How to Think Like a Performance Engineer
csswizardry
19
1.1k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
It's Worth the Effort
3n
183
27k
The Cost Of JavaScript in 2023
addyosmani
45
6.6k
Docker and Python
trallard
40
3.1k
Testing 201, or: Great Expectations
jmmastey
38
7k
The Power of CSS Pseudo Elements
geoffreycrofte
72
5.3k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
6.9k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
126
18k
Become a Pro
speakerdeck
PRO
24
5k
Rails Girls Zürich Keynote
gr2m
93
13k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857