Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sergey Arkhipov
November 11, 2017
Programming
0
210
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
Tweet
Share
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
140
Concurrency Models
9seconds
0
150
Own Mustache
9seconds
0
270
Daemonize
9seconds
0
250
Stuff That Works
9seconds
0
250
Evidence
9seconds
0
71
Redneck Monads
9seconds
1
78
Latency
9seconds
0
89
Oh Blindfold Russia!
9seconds
0
230
Other Decks in Programming
See All in Programming
Rechartsで楽にゴリゴリにカスタマイズする!
10tera
1
160
エンジニア1年目で複雑なコードの改善に取り組んだ話
mtnmr
3
1.9k
Amazon Neptuneで始める初めてのグラフDB ー グラフDBを使う意味を考える ー
satoshi256kbyte
2
250
GraphQL あるいは React における自律的なデータ取得について
quramy
11
2.9k
Swiftコードバトル必勝法
toshi0383
0
150
意外とフォントが大事だった話 / Font Issues on Internationalization
fumi23
0
110
Rustではじめる負荷試験
skanehira
5
1.2k
KSPの導入・移行を前向きに検討しよう!
shxun6934
PRO
0
140
月間4.5億回再生を超える大規模サービス TVer iOSアプリのリアーキテクチャ戦略 - iOSDC2024
techtver
PRO
1
810
Scala におけるコンパイラエラーとの付き合い方
chencmd
2
410
Method Swizzlingを行うライブラリにおけるマルチモジュール設計
yoshikma
0
110
Architecture Decision Record (ADR)
nearme_tech
PRO
1
670
Featured
See All Featured
The Invisible Customer
myddelton
119
13k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
663
120k
Rails Girls Zürich Keynote
gr2m
93
13k
Designing for Performance
lara
604
68k
Code Reviewing Like a Champion
maltzj
518
39k
A Tale of Four Properties
chriscoyier
155
22k
Thoughts on Productivity
jonyablonski
66
4.2k
KATA
mclloyd
27
13k
Imperfection Machines: The Place of Print at Facebook
scottboms
263
13k
Principles of Awesome APIs and How to Build Them.
keavy
125
16k
Designing Experiences People Love
moore
138
23k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
38
9.2k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A