Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sergey Arkhipov
November 11, 2017
Programming
0
220
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
Tweet
Share
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
140
Concurrency Models
9seconds
0
180
Own Mustache
9seconds
0
300
Daemonize
9seconds
0
280
Stuff That Works
9seconds
0
290
Evidence
9seconds
0
79
Redneck Monads
9seconds
1
86
Latency
9seconds
0
99
Oh Blindfold Russia!
9seconds
0
260
Other Decks in Programming
See All in Programming
クックパッド検索システム統合/Cookpad Search System Consolidation
giga811
0
140
はじめての Go * WASM * OCR
sgash708
1
120
dbt Pythonモデルで実現するSnowflake活用術
trsnium
0
270
PRレビューのお供にDanger
stoticdev
1
240
iOSでQRコード生成奮闘記
ktcryomm
2
130
技術を改善し続ける
gumioji
0
180
Serverless Rust: Your Low-Risk Entry Point to Rust in Production (and the benefits are huge)
lmammino
1
160
Honoとフロントエンドの 型安全性について
yodaka
7
1.5k
SwiftUI移行のためのインプレッショントラッキング基盤の構築
kokihirokawa
0
170
DRFを少しずつ オニオンアーキテクチャに寄せていく DjangoCongress JP 2025
nealle
2
290
Ça bouge du côté des animations CSS !
goetter
2
160
データベースのオペレーターであるCloudNativePGがStatefulSetを使わない理由に迫る
nnaka2992
0
250
Featured
See All Featured
Raft: Consensus for Rubyists
vanstee
137
6.8k
Fashionably flexible responsive web design (full day workshop)
malarkey
406
66k
GraphQLとの向き合い方2022年版
quramy
44
14k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
46
2.4k
Visualization
eitanlees
146
15k
How to Ace a Technical Interview
jacobian
276
23k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
How to Think Like a Performance Engineer
csswizardry
22
1.4k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
330
21k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
13
1k
RailsConf 2023
tenderlove
29
1k
Fireside Chat
paigeccino
35
3.2k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A