Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Faceting analyzed fields with some sprinkles of...
Search
Boaz Leskes
June 04, 2013
Technology
0
64
Faceting analyzed fields with some sprinkles of probability theory
Talk given at Berlin buzzwords 2013
Boaz Leskes
June 04, 2013
Tweet
Share
More Decks by Boaz Leskes
See All by Boaz Leskes
Every Shard Deserves a Home - Shard Allocation in Elasticsearch
bleskes
0
320
Life of a Document in Elasticsearch
bleskes
3
3.3k
Resiliency in Elasticsearch & Lucene
bleskes
0
510
Resiliency in Elasticsearch & Lucene
bleskes
0
230
Designing Concurrent Distributed Sequence Numbers for Elasticsearch
bleskes
2
720
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
1
370
Not all Nodes are Created Equal - Scaling Elasticsearch
bleskes
6
680
The ELK Stack: For Real-Time Enlightenment
bleskes
1
1.7k
Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch
bleskes
1
310
Other Decks in Technology
See All in Technology
Building a cloud native business on open source
lizrice
0
140
旅で応援する✈️ NEWTが目指すコミュニティ支援とあたらしい旅行 / New Travel: Supporting by NEWT on Your Journey
mii3king
0
120
Biz職でもDifyでできる! 「触らないAIワークフロー」を実現する方法
igarashikana
3
1.3k
ローカルLLMとLINE Botの組み合わせ その2(EVO-X2でgpt-oss-120bを利用) / LINE DC Generative AI Meetup #7
you
PRO
0
140
Zephyr(RTOS)にEdge AIを組み込んでみた話
iotengineer22
1
240
Oracle Base Database Service 技術詳細
oracle4engineer
PRO
12
81k
OCIjp_Oracle AI World_Recap
shinpy
1
150
Copilot Studio ハンズオン - 生成オーケストレーションモード
tomoyasasakimskk
0
190
WEBサービスを成り立たせるAWSサービス
takano0131
1
200
初めてのDatabricks Apps開発
taka_aki
1
230
それでも私が品質保証プロセスを作り続ける理由 #テストラジオ / Why I still continue to create QA process
pineapplecandy
0
150
混合雲環境整合異質工作流程工具運行關鍵業務 Job 的經驗分享
yaosiang
0
140
Featured
See All Featured
Gamification - CAS2011
davidbonilla
81
5.5k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.1k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
194
16k
Documentation Writing (for coders)
carmenintech
75
5.1k
Automating Front-end Workflow
addyosmani
1371
200k
Code Reviewing Like a Champion
maltzj
526
40k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.7k
Rebuilding a faster, lazier Slack
samanthasiow
84
9.2k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
31
2.5k
What's in a price? How to price your products and services
michaelherold
246
12k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
53k
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures
trending topic analysis and other interesting insights Boaz Leskes Elasticsearch @bleskes work done for Buzzcapture
Trending?
© Buzzcapture
© Buzzcapture
reference reference topic © Buzzcapture
topic reference ≠
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
None
topic reference P(w|T) = kDt |w 2 Dt k kDt
k
P(w|T) = kDt |w 2 Dt k kDt k
brown dog fox quick 2 5 10 12 5 6
12 13 2 5 6 10 12 13 brown dog fox quick
In our index. • Terms = 12GB • “Arrows” =
41GB
{ tweet: { type: "string", analyzer: "whitespace" fielddata: { filter:
{ regex: "^#.*", frequency: { min: 10 } } } } } Drop terms which occur too little
Drop docs with too many terms
reference reference topic © Buzzcapture
iculture 10,122 floor 8,998 cover 6,874 toy 4,402 ground 3,841
4.0 7,878 4.1 4,292 rtacties 4,078 jelly 2,905 bean 2,857