Boaz Leskes
June 04, 2013
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin Buzzwords 2013.
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights
Boaz Leskes, Elasticsearch (@bleskes)
Work done for Buzzcapture
Trending?
© Buzzcapture
[Figure: reference and topic document sets] © Buzzcapture
topic ≠ reference
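topic vs. reference: P(w|T) is the fraction of the topic documents D_t that contain the word w:
$$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$$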
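The topic probability P(w|T) can be sketched in a few lines of Python. This sketch assumes that each document is represented as the set of its terms. It also assumes that the reference probability P(w|R) is computed the same way over the reference set. The ratio `trend_score` is a hypothetical way to compare the two distributions, chosen for illustration. It is not necessarily the scoring used in the talk.

```python
def term_probability(word: str, docs: list[set[str]]) -> float:
    """P(w|T): the fraction of documents in `docs` that contain `word`."""
    if not docs:
        return 0.0
    return sum(1 for doc in docs if word in doc) / len(docs)


def trend_score(word: str, topic: list[set[str]], reference: list[set[str]],
                smoothing: float = 1e-6) -> float:
    """Hypothetical score P(w|T) / P(w|R); `smoothing` keeps the ratio finite
    for words that never occur in the reference set."""
    return term_probability(word, topic) / (term_probability(word, reference) + smoothing)


def trending_terms(topic: list[set[str]], reference: list[set[str]],
                   top_n: int = 3) -> list[str]:
    """Rank every word that appears in the topic set by its trend score."""
    vocabulary = set().union(*topic)
    return sorted(vocabulary,
                  key=lambda w: trend_score(w, topic, reference),
                  reverse=True)[:top_n]


# Toy documents (invented): each document is the set of its terms.
topic = [{"#launch", "android"}, {"#launch", "update"}, {"android", "phone"}]
reference = [{"android", "phone"}, {"coffee"}, {"update", "phone"}, {"rain"}]

print(trending_terms(topic, reference))  # ['#launch', 'android', 'update']
```

Words that are common everywhere, such as "phone" here, score low even when they appear in the topic. Words that are much more frequent in the topic than in the reference rise to the top.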
[Figure: the terms brown, dog, fox and quick linked to the documents containing them (2, 5, 6, 10, 12, 13), first as term → documents, then reversed as document → terms]
In our index:
• Terms = 12GB
• “Arrows” = 41GB
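The "arrows" outweigh the terms because of how the index is un-inverted for faceting. Each unique term is stored once, but every (document, term) pair needs its own link. The Python sketch below illustrates this with a toy postings map that is invented, not read off the slide. In the spirit of Elasticsearch's ordinal-based field data, it stores per-document term ordinals rather than strings. It illustrates the idea only and does not reproduce Elasticsearch's actual data structures.

```python
def uninvert(postings: dict[str, list[int]]) -> tuple[list[str], dict[int, list[int]]]:
    """Turn an inverted index (term -> doc ids) into field data:
    a sorted term dictionary plus, per document, the ordinals of its terms."""
    terms = sorted(postings)  # each unique term is stored once
    doc_ords: dict[int, list[int]] = {}
    for ordinal, term in enumerate(terms):
        for doc_id in postings[term]:
            # one "arrow" per (document, term) pair
            doc_ords.setdefault(doc_id, []).append(ordinal)
    return terms, doc_ords


# Hypothetical postings, invented for illustration.
postings = {
    "brown": [2, 5, 10],
    "dog": [5, 6, 12],
    "fox": [10, 12, 13],
    "quick": [2, 6, 13],
}

terms, doc_ords = uninvert(postings)
arrows = sum(len(ords) for ords in doc_ords.values())
print(len(terms), arrows)  # 4 12
print(sorted(doc_ords))    # [2, 5, 6, 10, 12, 13]
```

The term dictionary grows with the vocabulary. The number of arrows grows with the total number of (document, term) pairs, which for a large tweet index is far larger.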
{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: "^#.*",
        frequency: { min: 10 }
      }
    }
  }
}
• Drop terms which occur too rarely
• Drop docs with too many terms
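To see which terms such a mapping would keep, the filters can be emulated outside Elasticsearch. This Python sketch rests on three assumptions:
• A term must pass both the `^#.*` regex and the frequency filter.
• `min: 10` means an absolute document count.
• `max_terms_per_doc` is a hypothetical parameter for the "drop docs with too many terms" idea.
The sketch mimics the intent of the settings, not how Elasticsearch applies them internally.

```python
import re
from collections import Counter
from typing import Optional

HASHTAG = re.compile(r"^#.*")  # same pattern as the mapping's regex filter


def loaded_terms(docs: list[list[str]], min_freq: int = 10,
                 max_terms_per_doc: Optional[int] = None) -> set[str]:
    """Terms that would survive the filters.

    docs: each document as its whitespace-split tokens.
    min_freq: keep terms found in at least this many documents.
    max_terms_per_doc: hypothetical cut that skips documents with
        more distinct terms than this.
    """
    if max_terms_per_doc is not None:
        docs = [doc for doc in docs if len(set(doc)) <= max_terms_per_doc]
    # Document frequency: each term counted at most once per document.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term for term, n in doc_freq.items()
            if HASHTAG.match(term) and n >= min_freq}


# Invented tweets: a common hashtag, a rare one, and one hashtag-stuffed tweet.
tweets = ["#berlin rocks"] * 12 + ["#rare tag"] * 3 + ["#berlin #a #b #c #d #e"]
docs = [tweet.split() for tweet in tweets]

print(loaded_terms(docs))                                    # {'#berlin'}
print(loaded_terms(docs, min_freq=13))                       # {'#berlin'}
print(loaded_terms(docs, min_freq=13, max_terms_per_doc=2))  # set()
```

The regex alone removes every non-hashtag token, such as "rocks". The frequency cut then drops rare hashtags before they can take up field data memory.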
Term        Count
iculture    10,122
floor        8,998
cover        6,874
toy          4,402
ground       3,841

Term        Count
4.0          7,878
4.1          4,292
rtacties     4,078
jelly        2,905
bean         2,857