Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Probablistic Data Structures
Search
Sergey Arkhipov
November 11, 2017
Programming
0
240
Probablistic Data Structures
My talk on rannts #18 (11.11.2017)
Sergey Arkhipov
November 11, 2017
Tweet
Share
More Decks by Sergey Arkhipov
See All by Sergey Arkhipov
Fingerprinting
9seconds
0
160
Concurrency Models
9seconds
0
210
Own Mustache
9seconds
0
320
Daemonize
9seconds
0
320
Stuff That Works
9seconds
0
330
Evidence
9seconds
0
91
Redneck Monads
9seconds
1
96
Latency
9seconds
0
120
Oh Blindfold Russia!
9seconds
0
290
Other Decks in Programming
See All in Programming
Cache Me If You Can
ryunen344
2
4k
スケールする組織の実現に向けた インナーソース育成術 - ISGT2025
teamlab
PRO
2
170
複雑なフォームに立ち向かう Next.js の技術選定
macchiitaka
2
240
アルテニア コンサル/ITエンジニア向け 採用ピッチ資料
altenir
0
110
アプリの "かわいい" を支えるアニメーションツールRiveについて
uetyo
0
280
Flutter with Dart MCP: All You Need - 박제창 2025 I/O Extended Busan
itsmedreamwalker
0
150
私の後悔をAWS DMSで解決した話
hiramax
4
210
もうちょっといいRubyプロファイラを作りたい (2025)
osyoyu
1
460
Navigation 2 を 3 に移行する(予定)ためにやったこと
yokomii
0
350
OSS開発者という働き方
andpad
5
1.7k
JSONataを使ってみよう Step Functionsが楽しくなる実践テクニック #devio2025
dafujii
1
640
1から理解するWeb Push
dora1998
7
1.9k
Featured
See All Featured
What's in a price? How to price your products and services
michaelherold
246
12k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.4k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3k
Making the Leap to Tech Lead
cromwellryan
135
9.5k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Improving Core Web Vitals using Speculation Rules API
sergeychernyshev
18
1.1k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
15
1.7k
Code Reviewing Like a Champion
maltzj
525
40k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
530
BBQ
matthewcrist
89
9.8k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
Transcript
Вероятностные структуры данных Сергей Архипов, 2017
None
None
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
None
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
None
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" } { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3, }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned":
12, "errors": 3, } Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0, } Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84, }
INSERT INTO stats ( date, user, hostname, ok, ban, error
) VALUES ( :date, :user, :hostname, :ok, :ban, :error ) ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok), ban = ban + VALUES(ban), error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time":
2861, }
(20 + 10) + 11 = (20 + 11) +
10
F(x)=P{σ<x} { P(x⩽x α )⩾α P(x⩾x α )⩾1−α
Ω(N 1 p )
collector collector collector pworker pworker pworker
None
None
var memCount = 75604275; var memPerSec = 1.38176367782; function updateCount()
{ next = -(1000 / memPerSec) * Math.log(Math.random()); memCountString = ''+memCount; len = memCountString.length; memCountString = memCountString.substr(0, len - 6) + ’ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-6,3)+‘ < span style = ”font - size: 8 px” > < /span>’+memCountString.substr(len-3,3); ge(‘memCount’).innerHTML = memCountString; memCount = memCount + 1; setTimeout(updateCount, next); } addEvent(window, ‘load’, updateCount);
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211
6017
3500 3507 3671 3667 3400 3410 3502 3502 3463 3466
3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Count-Min Sketch 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
Count-Min Sketch 32 11 1 18 200 126 184 78
1 0 91 59 30 24 8 82 76 34 48 72 11 200 129 136 14
MinHash J (A , B)= |A∩B| |A∪B| k=[ 1 ε2
]
HyperLogLog 010010000110010101101100011011000110111100100001 b 26 = 64 1001 b = 9
100001 b = 33 σ= 1.04 √2k E= α(k)4k ∑ j 2−M j
t-digest
t-digest
t-digest
t-digest X=x 1 , x 2 ,…, x n X={s
1 ,s 2 ,…,s m } s i ={x l e f t(i) ,…, x r i ght(i) }
t-digest k(q,δ)≝δ (sin−1 (2q−1) π + 1 2 ) K(i)≝k(
r i ght(i) n ,δ)−k( le f t(i)−1 n ,δ) K (i)⩽1 K(i)+K (i+1)>1
t-digest
t-digest
collector collector collector pworker pworker pworker
None
Q/A