Faceting analyzed fields with some sprinkles of probability theory
Boaz Leskes
June 04, 2013
Talk given at Berlin buzzwords 2013
Transcript
Faceting analyzed fields with some sprinkles of probability theory conjures trending topic analysis and other interesting insights
Boaz Leskes, Elasticsearch, @bleskes
Work done for Buzzcapture
Trending?
[Figures © Buzzcapture; labels: reference, reference, topic]
topic ≠ reference
$P(w \mid T) = \frac{\lVert D_t \mid w \in D_t \rVert}{\lVert D_t \rVert}$
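The formula for P(w|T) can be read as the fraction of documents in a set that contain the term w. A minimal sketch of that estimate follows. The ratio used in `trending_scores` to compare the topic set against the reference set is an assumption made for illustration (the transcript only states that topic ≠ reference), and the function names, the `floor` parameter and the sample documents are all hypothetical.

```python
from collections import Counter


def term_probabilities(docs):
    """Estimate P(w|T): the fraction of documents in `docs` that contain w."""
    doc_freq = Counter()
    for doc in docs:
        # set() so a term repeated inside one document is counted once
        doc_freq.update(set(doc.split()))
    total = len(docs)
    return {term: count / total for term, count in doc_freq.items()}


def trending_scores(topic_docs, reference_docs, floor=1e-6):
    """Assumed comparison: ratio of topic probability to reference probability.

    `floor` (hypothetical) avoids division by zero for terms that never
    occur in the reference set.
    """
    p_topic = term_probabilities(topic_docs)
    p_ref = term_probabilities(reference_docs)
    return {t: p / max(p_ref.get(t, 0.0), floor) for t, p in p_topic.items()}


# Made-up sample documents, whitespace-tokenized
topic = ["quick brown fox", "quick dog", "quick fox"]
reference = ["brown dog", "quick dog", "brown fox", "dog"]

p_topic = term_probabilities(topic)
scores = trending_scores(topic, reference)
top_term = max(scores, key=scores.get)
```

Terms that are common everywhere score near 1 under this ratio, while terms that are over-represented in the topic set score higher, which is one plausible way to surface trending terms.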
[Figure: terms (brown, dog, fox, quick) linked by arrows to document IDs (2, 5, 6, 10, 12, 13), shown first terms-to-documents, then documents-to-terms]
In our index:
• Terms = 12GB
• “Arrows” = 41GB
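The "terms" and "arrows" here appear to refer to an inverted index (term to document IDs) and the links that must be held in memory to go the other way (document ID to terms) when faceting. The sketch below illustrates both directions under that reading. It is a toy model, not how Lucene or Elasticsearch store the data, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def uninvert(inverted):
    """Turn term -> doc IDs into doc ID -> terms (the direction faceting reads)."""
    per_doc = defaultdict(list)
    for term in sorted(inverted):
        for doc_id in inverted[term]:
            per_doc[doc_id].append(term)
    return dict(per_doc)


# Made-up documents keyed by ID
docs = {2: "quick brown", 5: "brown dog", 6: "dog fox"}

inverted = build_inverted_index(docs)
field_data = uninvert(inverted)
```

Every (term, document) pair becomes one link in either structure, which is why the links can outweigh the term dictionary itself when documents carry many distinct terms.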
{
  tweet: {
    type: "string",
    analyzer: "whitespace",
    fielddata: {
      filter: {
        regex: "^#.*",
        frequency: { min: 10 }
      }
    }
  }
}
Drop terms which occur too little
Drop docs with too many terms
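To make the effect of these filters concrete, here is a rough sketch of the same idea in plain Python: keep only terms matching a pattern, drop terms that occur in too few documents, and optionally skip documents with too many terms. It mimics the intent of the slide, not Elasticsearch's implementation, and `filtered_field_data`, `min_docs` and `max_terms_per_doc` are hypothetical names chosen for this illustration.

```python
import re
from collections import Counter


def filtered_field_data(docs, pattern=r"^#.*", min_docs=10, max_terms_per_doc=None):
    """Return doc ID -> kept terms after applying the three filters."""
    regex = re.compile(pattern)
    # Whitespace tokenization, one set of distinct terms per document
    tokenized = {doc_id: set(text.split()) for doc_id, text in docs.items()}
    if max_terms_per_doc is not None:
        # Drop documents with too many terms
        tokenized = {d: t for d, t in tokenized.items() if len(t) <= max_terms_per_doc}
    # Document frequency of each term across the remaining documents
    doc_freq = Counter(term for terms in tokenized.values() for term in terms)
    # Keep terms that match the pattern and occur in enough documents
    keep = {t for t, n in doc_freq.items() if n >= min_docs and regex.match(t)}
    return {d: sorted(t & keep) for d, t in tokenized.items()}


# Made-up tweets; thresholds lowered so the tiny sample shows an effect
tweets = {1: "#es rocks", 2: "#es #lucene", 3: "#es a b c d e"}
result = filtered_field_data(tweets, min_docs=2, max_terms_per_doc=4)
```

With these sample values only the hashtag that appears in two of the remaining documents survives, so far fewer term-to-document links need to be kept in memory.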
[Figure © Buzzcapture; labels: reference, reference, topic]
iculture   10,122
floor       8,998
cover       6,874
toy         4,402
ground      3,841

4.0         7,878
4.1         4,292
rtacties    4,078
jelly       2,905
bean        2,857