Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Sergey Arkhipov
November 11, 2017
Programming
270
0
Share
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
180
Concurrency Models
9seconds
0
270
Own Mustache
9seconds
0
360
Daemonize
9seconds
0
360
Stuff That Works
9seconds
0
380
Evidence
9seconds
0
110
Redneck Monads
9seconds
1
130
Latency
9seconds
0
150
Oh Blindfold Russia!
9seconds
0
320
Other Decks in Programming
See All in Programming
PicoRuby for IoT: Connecting to the Cloud with MQTT
yuuu
2
770
My daily life on Ruby
a_matsuda
3
350
要はバランスからの卒業 #yumemi_grow
kajitack
0
160
Back to the roots of date
jinroq
0
820
【ディップ|26年新卒研修資料】TDD実装演習
dip_tech
PRO
0
180
KMP × Kotlin 2.3 - How Android Got Slower While iOS Builds Improved by 47%
rio432
0
180
Agentic Elixir
whatyouhide
0
450
UaaL×Androidアプリのメモリ計測 — Memory Profilerの先へ
rio432
0
160
【ディップ|26年新卒研修資料】OpenAPI/Swagger REST API研修
dip_tech
PRO
0
160
t *testing.T は どこからやってくるの?
otakakot
1
930
(Re)make Regexp in Ruby: Democratizing internals for the JIT
makenowjust
3
1.1k
Liberating Ruby's Parser from Lexer Hacks
ydah
2
2.7k
Featured
See All Featured
Information Architects: The Missing Link in Design Systems
soysaucechin
0
920
How to Align SEO within the Product Triangle To Get Buy-In & Support - #RIMC
aleyda
2
1.5k
Claude Code どこまでも/ Claude Code Everywhere
nwiizo
65
55k
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
1
250
Agile that works and the tools we love
rasmusluckow
331
21k
Effective software design: The role of men in debugging patriarchy in IT @ Voxxed Days AMS
baasie
0
350
Building Flexible Design Systems
yeseniaperezcruz
330
40k
Raft: Consensus for Rubyists
vanstee
141
7.4k
Applied NLP in the Age of Generative AI
inesmontani
PRO
4
2.2k
How to make the Groovebox
asonas
2
2.2k
Visual Storytelling: How to be a Superhuman Communicator
reverentgeek
2
530
Agile Actions for Facilitating Distributed Teams - ADO2019
mkilby
0
190
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A