Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Sergey Arkhipov
November 11, 2017
Programming
270
0
Share
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
170
Concurrency Models
9seconds
0
260
Own Mustache
9seconds
0
350
Daemonize
9seconds
0
350
Stuff That Works
9seconds
0
380
Evidence
9seconds
0
110
Redneck Monads
9seconds
1
130
Latency
9seconds
0
150
Oh Blindfold Russia!
9seconds
0
320
Other Decks in Programming
See All in Programming
Claude Codeをカスタムして自分だけのClaude Codeを作ろう
terisuke
0
140
第3木曜LT会 #28
tinykitten
PRO
0
110
ハーネスエンジニアリングとは?
kinopeee
11
5.6k
瑠璃の宝石に学ぶ技術の声の聴き方 / 【劇場版】アニメから得た学びを発表会2026 #エンジニアニメ
mazrean
0
260
Making the RBS Parser Faster
soutaro
0
420
属人化しないコード品質の作り方_2026.04.07.pdf
muraaano
0
200
ローカルで稼働するAI エージェントを超えて / beyond-local-ai-agents
gawa
3
280
Coding as Prompting Since 2025
ragingwind
0
840
Coding at the Speed of Thought: The New Era of Symfony Docker
dunglas
0
5k
クラウドネイティブなエンジニアに向ける Raycastの魅力と実際の活用事例
nealle
2
200
「話せることがない」を乗り越える 〜日常業務から登壇テーマをつくる思考法〜
shoheimitani
4
830
forteeの改修から振り返るPHPerKaigi 2026
muno92
PRO
3
290
Featured
See All Featured
A Guide to Academic Writing Using Generative AI - A Workshop
ks91
PRO
1
270
How to build a perfect <img>
jonoalderson
1
5.4k
A Modern Web Designer's Workflow
chriscoyier
698
190k
The Invisible Side of Design
smashingmag
303
52k
Organizational Design Perspectives: An Ontology of Organizational Design Elements
kimpetersen
PRO
1
680
AI Search: Implications for SEO and How to Move Forward - #ShenzhenSEOConference
aleyda
1
1.2k
RailsConf 2023
tenderlove
30
1.4k
Ecommerce SEO: The Keys for Success Now & Beyond - #SERPConf2024
aleyda
1
1.9k
Mozcon NYC 2025: Stop Losing SEO Traffic
samtorres
0
210
No one is an island. Learnings from fostering a developers community.
thoeni
21
3.7k
Why Mistakes Are the Best Teachers: Turning Failure into a Pathway for Growth
auna
0
120
So, you think you're a good person
axbom
PRO
2
2k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A