Probabilistic Data Structures
Sergey Arkhipov
November 11, 2017
My talk at rannts #18 (November 11, 2017)
Transcript
Probabilistic Data Structures. Sergey Arkhipov, 2017
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0 }
Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84 }
INSERT INTO stats (date, user, hostname, ok, ban, error)
VALUES (:date, :user, :hostname, :ok, :ban, :error)
ON DUPLICATE KEY UPDATE
    ok = ok + VALUES(ok),
    ban = ban + VALUES(ban),
    error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time": 2861 }
(20 + 10) + 11 = (20 + 11) + 10
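The identity above is why per-consumer counters can be summed in any order or grouping. A minimal Python sketch of that idea follows; the three partial aggregates are the ones shown on the consumer slide, while the `merge` helper is a hypothetical name introduced only for illustration.

```python
from collections import Counter

# Partial aggregates as reported by the three consumers on the slide.
consumer_1 = Counter(ok=234, banned=12, errors=3)
consumer_2 = Counter(ok=250, banned=3, errors=0)
consumer_3 = Counter(ok=0, banned=124, errors=84)


def merge(*parts):
    """Sum partial counters (hypothetical helper, not from the talk)."""
    total = Counter()
    for part in parts:
        total.update(part)  # Counter.update adds counts instead of replacing
    return total


# Grouping does not matter: both orders give the same totals.
left_first = merge(merge(consumer_1, consumer_2), consumer_3)
right_first = merge(consumer_1, merge(consumer_2, consumer_3))
print(dict(left_first))
print(left_first == right_first)
```

This is the same property the `ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok)` statement relies on: each consumer can write its partial sums independently.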
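$F(x) = P\{\sigma < x\}$
$\begin{cases} P(x \leqslant x_\alpha) \geqslant \alpha \\ P(x \geqslant x_\alpha) \geqslant 1 - \alpha \end{cases}$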
$\Omega(N^{1/p})$
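Unlike sums, quantiles computed by separate consumers cannot simply be combined afterwards. The sketch below illustrates this with the median; the response times and the split across consumers are made-up values chosen for the example, not data from the talk.

```python
import statistics

# Hypothetical response times in milliseconds, split across three consumers.
parts = [
    [100, 110, 120, 130, 140, 150, 160],
    [3000],
    [5000],
]

# Each consumer can compute its own median...
per_consumer = [statistics.median(p) for p in parts]

# ...but combining those medians does not give the median of all values.
median_of_medians = statistics.median(per_consumer)
true_median = statistics.median([x for p in parts for x in p])

print(per_consumer)
print(median_of_medians, true_median)
```

An exact answer needs all the raw values in one place, which is the motivation for approximate, mergeable summaries.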
collector collector collector pworker pworker pworker
var memCount = 75604275;
var memPerSec = 1.38176367782;
function updateCount() {
  next = -(1000 / memPerSec) * Math.log(Math.random());
  memCountString = '' + memCount;
  len = memCountString.length;
  memCountString = memCountString.substr(0, len - 6) + '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 6, 3) + '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 3, 3);
  ge('memCount').innerHTML = memCountString;
  memCount = memCount + 1;
  setTimeout(updateCount, next);
}
addEvent(window, 'load', updateCount);
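The delay computed in `updateCount` is the standard inverse-transform sample of an exponential distribution, so the counter ticks like a Poisson process with `memPerSec` events per second on average. A rough Python equivalent is sketched below; `next_delay_ms` is a hypothetical name, and using `1 - random()` to avoid `log(0)` is a small change of mine, not something the original snippet does.

```python
import math
import random


def next_delay_ms(rate_per_sec, rng=random):
    """Sample the wait until the next event, in milliseconds.

    rng.random() is in [0, 1), so 1 - rng.random() is in (0, 1]
    and math.log never receives zero.
    """
    return -(1000.0 / rate_per_sec) * math.log(1.0 - rng.random())


rate = 1.38176367782  # events per second, the value from the snippet above
delays = [next_delay_ms(rate) for _ in range(5)]
print(delays)
print("mean delay should approach", 1000.0 / rate, "ms")
```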
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211 6017
3500 3507
3671 3667
3400 3410
3502 3502
3463 3466
3371 3330
3607 3599
6012 6009
6168 6152
6211 6215
6017 6016
Count-Min Sketch
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
Count-Min Sketch
0 0 1 0 0
0 0 0 1 0
0 0 0 1 0
0 1 0 0 0
0 0 0 0 1
Count-Min Sketch
32 11 1 18 200
126 184 78 1 0
91 59 30 24 8
82 76 34 48 72
11 200 129 136 14
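The 25 counters on the Count-Min Sketch slides read naturally as 5 rows of 5 columns, with one counter per row incremented on every insert. A minimal Python sketch under that reading follows. The class name, the method names, and the use of SHA-256 salted with the row number as the per-row hash are my assumptions; the slides do not say which hash functions were used.

```python
import hashlib


class CountMinSketch:
    """depth rows of width counters; every row uses its own hash function."""

    def __init__(self, depth=5, width=5):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # Assumption: derive one hash per row by prefixing the row number.
        data = f"{row}:{item}".encode("utf-8")
        value = int.from_bytes(hashlib.sha256(data).digest()[:8], "big")
        return value % self.width

    def add(self, item, count=1):
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.rows[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the best guess
        # and never falls below the true count.
        return min(
            self.rows[row][self._column(row, item)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
sketch.add("rannts.ru")
for row in sketch.rows:
    print(row)  # one counter set to 1 in each row
print(sketch.estimate("rannts.ru"))
```

With such a narrow table collisions are frequent, so estimates can overshoot; a wider table trades memory for accuracy.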
MinHash
$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
$k = \left[ \frac{1}{\varepsilon^2} \right]$
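A small Python sketch of MinHash as summarized on the slide: the Jaccard similarity is estimated as the fraction of positions where two signatures of k minimum hash values agree, with k chosen from the target error ε. The function names, the SHA-256-based hash family, the example sets, and reading the brackets in the formula for k as rounding up are all assumptions made for illustration.

```python
import hashlib
import math


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def num_hashes(eps):
    # Assumption: the brackets on the slide are read as rounding up.
    return math.ceil(1 / eps ** 2)


def _hash(seed, item):
    # Assumption: a family of hash functions built by salting SHA-256.
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def signature(items, k):
    """Keep the minimum hash value under each of k hash functions."""
    return [min(_hash(seed, item) for item in items) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where both minima coincide."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Hypothetical example sets with a true similarity of 50 / 150.
a = {f"user{i}" for i in range(0, 100)}
b = {f"user{i}" for i in range(50, 150)}
k = num_hashes(0.1)
approx = estimate_jaccard(signature(a, k), signature(b, k))
exact = jaccard(a, b)
print(k, exact, approx)
```

The signatures have a fixed size of k values regardless of how large the sets are, which is what makes the comparison cheap.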
HyperLogLog
$010010000110010101101100011011000110111100100001_b$
$2^6 = 64$
$1001_b = 9$
$100001_b = 33$
$\sigma = \frac{1.04}{\sqrt{2^k}}$
$E = \frac{\alpha(k)\, 4^k}{\sum_j 2^{-M_j}}$
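The estimator on the HyperLogLog slide can be tried out with a short Python sketch: k bits of each hash select one of 2^k registers, each register keeps the largest rank seen, and the estimate is α(k)·4^k divided by the sum of 2^(-M_j). Several details here are my assumptions rather than the talk's: SHA-256 truncated to 64 bits as the hash, taking the register index from the top bits, the commonly used α constants, and the omission of the small-range and large-range corrections.

```python
import hashlib
import math


def std_error(k):
    """Relative standard error 1.04 / sqrt(2^k) from the slide."""
    return 1.04 / math.sqrt(2 ** k)


class HyperLogLog:
    def __init__(self, k=10):
        if k < 4:
            raise ValueError("this sketch assumes k >= 4")
        self.k = k
        self.m = 1 << k              # number of registers, 2^k
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")   # 64-bit hash (assumption)
        tail_bits = 64 - self.k
        index = h >> tail_bits                  # top k bits pick a register
        tail = h & ((1 << tail_bits) - 1)
        # Rank: 1-based position of the leftmost 1-bit in the tail.
        rank = tail_bits - tail.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        m = self.m
        # Commonly used bias-correction constants (assumption).
        alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(
            m, 0.7213 / (1 + 1.079 / m)
        )
        # alpha * m^2 / sum(2^-M_j), with m^2 = 4^k.
        return alpha * m * m / sum(2.0 ** -r for r in self.registers)


hll = HyperLogLog(k=10)
for i in range(10000):
    hll.add(f"user{i}")
print(round(hll.estimate()), "estimated distinct items, 10000 inserted")
print("expected relative error about", std_error(10))
```

Adding the same item again never changes a register, so duplicates do not affect the estimate, and two sketches with the same k can be merged by taking the element-wise maximum of their registers.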
t-digest
t-digest
$X = x_1, x_2, \ldots, x_n$
$X = \{s_1, s_2, \ldots, s_m\}$
$s_i = \{x_{left(i)}, \ldots, x_{right(i)}\}$
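t-digest
$k(q, \delta) \overset{\text{def}}{=} \delta \left( \frac{\sin^{-1}(2q - 1)}{\pi} + \frac{1}{2} \right)$
$K(i) \overset{\text{def}}{=} k\left(\frac{right(i)}{n}, \delta\right) - k\left(\frac{left(i) - 1}{n}, \delta\right)$
$K(i) \leqslant 1$
$K(i) + K(i+1) > 1$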
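To see what the condition K(i) ≤ 1 implies, the scale function can be evaluated directly. In the Python sketch below, `k_scale` and `cluster_k` are hypothetical names for k(q, δ) and K(i), and n = 1000 and δ = 100 are example values I picked, not parameters from the talk.

```python
import math


def k_scale(q, delta):
    """k(q, delta) = delta * (asin(2q - 1) / pi + 1/2), for q in [0, 1]."""
    return delta * (math.asin(2 * q - 1) / math.pi + 0.5)


def cluster_k(left, right, n, delta):
    """K(i) for a cluster covering sorted points left..right (1-based)."""
    return k_scale(right / n, delta) - k_scale((left - 1) / n, delta)


n, delta = 1000, 100  # example values chosen for illustration

# Ten points at the extreme tail versus ten points around the median.
tail = cluster_k(1, 10, n, delta)
middle = cluster_k(496, 505, n, delta)
print(tail, middle)
print("tail cluster allowed:", tail <= 1)
print("middle cluster allowed:", middle <= 1)
```

The same ten points are too many for one cluster at the tail but acceptable near the median, so clusters stay small at the extremes, where quantile accuracy matters most, and grow larger in the middle.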
collector collector collector pworker pworker pworker
Q/A