Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sergey Arkhipov
November 11, 2017
Programming
0
240
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
Tweet
Share
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
160
Concurrency Models
9seconds
0
210
Own Mustache
9seconds
0
320
Daemonize
9seconds
0
310
Stuff That Works
9seconds
0
330
Evidence
9seconds
0
90
Redneck Monads
9seconds
1
95
Latency
9seconds
0
120
Oh Blindfold Russia!
9seconds
0
290
Other Decks in Programming
See All in Programming
AIエージェント開発、DevOps and LLMOps
ymd65536
1
360
AI時代のドメイン駆動設計-DDD実践におけるAI活用のあり方 / ddd-in-ai-era
minodriven
24
9.2k
AIでLINEスタンプを作ってみた
eycjur
1
210
MLH State of the League: 2026 Season
theycallmeswift
0
170
STUNMESH-go: Wireguard NAT穿隧工具的源起與介紹
tjjh89017
0
390
LLMOpsのパフォーマンスを支える技術と現場で実践した改善
po3rin
8
990
A Gopher's Guide to Vibe Coding
danicat
0
180
CSC305 Summer Lecture 12
javiergs
PRO
0
130
ソフトウェアテスト徹底指南書の紹介
goyoki
1
120
デザインシステムが必須の時代に
yosuke_furukawa
PRO
2
130
オープンセミナー2025@広島LT技術ブログを続けるには
satoshi256kbyte
0
150
パッケージ設計の黒魔術/Kyoto.go#63
lufia
2
310
Featured
See All Featured
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
6k
RailsConf 2023
tenderlove
30
1.2k
A designer walks into a library…
pauljervisheath
207
24k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
358
30k
The Cost Of JavaScript in 2023
addyosmani
53
8.8k
Why Our Code Smells
bkeepers
PRO
339
57k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
9
790
Faster Mobile Websites
deanohume
309
31k
KATA
mclloyd
32
14k
Typedesign – Prime Four
hannesfritz
42
2.8k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.8k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A