Probabilistic Data Structures
Sergey Arkhipov
November 11, 2017
Programming
My talk on rannts #18 (11.11.2017)
Transcript
Probabilistic Data Structures. Sergey Arkhipov, 2017
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
evt evt evt evt evt
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
collector collector collector
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
{ "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
Consumer 1 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
Consumer 2 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0 }
Consumer 3 { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84 }
INSERT INTO stats (date, user, hostname, ok, ban, error)
VALUES (:date, :user, :hostname, :ok, :ban, :error)
ON DUPLICATE KEY UPDATE
    ok = ok + VALUES(ok),
    ban = ban + VALUES(ban),
    error = error + VALUES(error);
{ "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time": 2861 }
(20 + 10) + 11 = (20 + 11) + 10
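The identity above is why per-consumer counters can be summed in any order. A minimal sketch of that merge, using the three consumer payloads from the earlier slide; the `merge` helper is a hypothetical name for illustration, not something from the talk:

```python
from itertools import permutations

# Partial counts reported by the three consumers on the earlier slide.
partials = [
    {"ok": 234, "banned": 12, "errors": 3},
    {"ok": 250, "banned": 3, "errors": 0},
    {"ok": 0, "banned": 124, "errors": 84},
]


def merge(parts):
    """Sum partial counters key by key (hypothetical helper)."""
    total = {"ok": 0, "banned": 0, "errors": 0}
    for part in parts:
        for key, value in part.items():
            total[key] += value
    return total


# Every arrival order of the partial results yields the same totals.
results = [merge(order) for order in permutations(partials)]
assert all(result == results[0] for result in results)
print(results[0])  # {'ok': 484, 'banned': 139, 'errors': 87}
```

This is the same property the `ON DUPLICATE KEY UPDATE ok = ok + VALUES(ok)` statement relies on: addition is associative and commutative, so no consumer needs to see the others' data.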
$$F(x) = P\{\sigma < x\}$$
$$\begin{cases} P(x \le x_\alpha) \ge \alpha \\ P(x \ge x_\alpha) \ge 1 - \alpha \end{cases}$$
$$\Omega\left(N^{1/p}\right)$$
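Unlike counters, quantiles computed independently by each consumer do not combine. A minimal sketch, assuming two hypothetical consumers with made-up response times; `quantile` here is an illustrative helper that follows the definition above (the smallest sample with at least a fraction alpha of the data at or below it):

```python
import math


def quantile(samples, alpha):
    """Smallest sample x such that at least alpha of the samples are <= x."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(alpha * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times (ms) seen by two consumers.
consumer_a = [100, 110, 120]
consumer_b = [130, 900, 950, 1000, 1100]

exact = quantile(consumer_a + consumer_b, 0.5)
merged = (quantile(consumer_a, 0.5) + quantile(consumer_b, 0.5)) / 2

print(exact, merged)  # 130 530.0
```

Averaging the two local medians gives 530 ms, while the true median of the combined stream is 130 ms. The exact answer needs all samples in one place, which is what motivates the approximate structures in the rest of the deck.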
collector collector collector pworker pworker pworker
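var memCount = 75604275;
var memPerSec = 1.38176367782;

// ge() and addEvent() are not defined on the slide; presumably they are the
// page's own helpers for element lookup and event binding.
function updateCount() {
  // Exponentially distributed delay (ms) until the next increment.
  var next = -(1000 / memPerSec) * Math.log(Math.random());
  var memCountString = '' + memCount;
  var len = memCountString.length;
  // Split the number into groups of three digits with small spacer spans.
  memCountString = memCountString.substr(0, len - 6) +
    '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 6, 3) +
    '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 3, 3);
  ge('memCount').innerHTML = memCountString;
  memCount = memCount + 1;
  setTimeout(updateCount, next);
}

addEvent(window, 'load', updateCount);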
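The counter script above never queries a server: it draws the delay to the next increment as `-(1000 / rate) * ln(U)`, which is the inter-arrival time of a Poisson process. A minimal Python sketch of the same sampling; `next_delay_ms` is a hypothetical name, and the seed is only there to make the run repeatable:

```python
import math
import random


def next_delay_ms(rate_per_sec, rng=random):
    """Delay to the next event of a Poisson process, in milliseconds."""
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    return -(1000.0 / rate_per_sec) * math.log(1.0 - rng.random())


rng = random.Random(42)
rate = 1.38176367782  # events per second, the value used on the slide
delays = [next_delay_ms(rate, rng) for _ in range(100_000)]
mean_delay = sum(delays) / len(delays)

# The average delay approaches 1000 / rate milliseconds.
print(round(1000.0 / rate, 1))
```

Individual delays vary widely, but their mean converges to `1000 / rate`, so the displayed number grows at the configured average speed while looking like live traffic.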
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211 6017
3500 3507
3671 3667
3400 3410
3502 3502
3463 3466
3371 3330
3607 3599
6012 6009
6168 6152
6211 6215
6017 6016
Count-Min Sketch
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
Count-Min Sketch
0 0 1 0 0
0 0 0 1 0
0 0 0 1 0
0 1 0 0 0
0 0 0 0 1
Count-Min Sketch
 32  11   1  18 200
126 184  78   1   0
 91  59  30  24   8
 82  76  34  48  72
 11 200 129 136  14
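The tables above can be read as a 5 x 5 Count-Min Sketch: every insert increments one counter per row, and a lookup takes the minimum over the rows. A minimal sketch under stated assumptions: the class name, the default sizes, and the use of salted BLAKE2b as the per-row hash are illustrative choices, not taken from the talk.

```python
import hashlib


class CountMinSketch:
    """Fixed-size table of counters: `depth` rows, `width` columns."""

    def __init__(self, depth=5, width=5):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment exactly one counter in every row.
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions only ever inflate counters, so the minimum is the best guess.
        return min(
            self.table[row][self._column(row, item)] for row in range(self.depth)
        )


small = CountMinSketch(depth=5, width=5)
small.add("rannts.ru")
for row in small.table:
    print(row)  # each row now holds a single 1

wide = CountMinSketch(depth=5, width=1024)
for _ in range(200):
    wide.add("rannts.ru")
wide.add("site.com", 32)
print(wide.estimate("rannts.ru"))  # never below the true count of 200
```

The estimate can overshoot when items collide in every row, but it never undershoots. Two sketches of the same shape built with the same hash functions can also be merged by adding their tables cell by cell, which fits the collector pipeline from the earlier slides.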
MinHash
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
$$k = \left[ \frac{1}{\varepsilon^2} \right]$$
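A minimal MinHash sketch for the formulas above: the probability that two sets share the same minimum under a random hash equals their Jaccard index, so the fraction of matching minima over k hash functions estimates J(A, B). The function names, the salted BLAKE2b hash family, and the `host-N` items are assumptions made for illustration.

```python
import hashlib


def _hash(seed, item):
    # Hash function number `seed`, derived by salting BLAKE2b (illustrative choice).
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def signature(items, k):
    """k minima, one per hash function; `items` must be non-empty."""
    return [min(_hash(seed, item) for item in items) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions whose minima coincide."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard index, for comparison."""
    return len(a & b) / len(a | b)


k = 100  # [1 / eps^2] for an expected error eps of about 0.1
a = {f"host-{i}" for i in range(0, 150)}
b = {f"host-{i}" for i in range(50, 200)}

print(jaccard(a, b))  # 0.5
print(estimate_jaccard(signature(a, k), signature(b, k)))  # close to 0.5
```

Each signature takes k integers regardless of the set size, so two collectors can compare large sets by exchanging signatures only.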
HyperLogLog
$010010000110010101101100011011000110111100100001_b$
$2^6 = 64$
$1001_b = 9$
$100001_b = 33$
$$\sigma = \frac{1.04}{\sqrt{2^k}}$$
$$E = \frac{\alpha(k)\, 4^k}{\sum_j 2^{-M_j}}$$
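A minimal sketch of the raw HyperLogLog estimator E from the formula above, with 2^k registers M_j. Several details are assumptions for illustration rather than the talk's exact scheme: the low k bits of a 64-bit BLAKE2b hash select the register, the rank is the position of the leftmost 1-bit in the remaining bits, and the small-range and large-range corrections of the full algorithm are left out.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, k=6):
        self.k = k
        self.m = 1 << k  # number of registers, 2^k
        self.registers = [0] * self.m
        # Bias-correction constant alpha for m registers (HyperLogLog paper values).
        known = {16: 0.673, 32: 0.697, 64: 0.709}
        self.alpha = known.get(self.m, 0.7213 / (1.0 + 1.079 / self.m))

    def add(self, item):
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        index = h & (self.m - 1)  # low k bits pick the register
        rest = h >> self.k        # remaining 64 - k bits
        width = 64 - self.k
        # 1-based position of the leftmost 1-bit; width + 1 if all bits are zero.
        rank = width - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        # E = alpha * m^2 / sum_j 2^(-M_j), where m^2 = 4^k.
        z = sum(2.0 ** -r for r in self.registers)
        return self.alpha * self.m * self.m / z


hll = HyperLogLog(k=10)
for i in range(50_000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")  # duplicates leave the registers unchanged

sigma = 1.04 / math.sqrt(hll.m)  # relative standard error for 2^k registers
print(round(hll.estimate()), sigma)
```

With k = 10 the structure keeps 1024 small registers and the expected relative error is about 3%. Two sketches of the same size merge by taking the register-wise maximum.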
t-digest
t-digest
$$X = x_1, x_2, \ldots, x_n$$
$$X = \{s_1, s_2, \ldots, s_m\}$$
$$s_i = \{x_{\mathrm{left}(i)}, \ldots, x_{\mathrm{right}(i)}\}$$
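t-digest
$$k(q, \delta) \stackrel{\mathrm{def}}{=} \delta \left( \frac{\sin^{-1}(2q - 1)}{\pi} + \frac{1}{2} \right)$$
$$K(i) \stackrel{\mathrm{def}}{=} k\left(\frac{\mathrm{right}(i)}{n}, \delta\right) - k\left(\frac{\mathrm{left}(i) - 1}{n}, \delta\right)$$
$$K(i) \le 1$$
$$K(i) + K(i+1) > 1$$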
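The two constraints above pin down how large each cluster s_i may be. A minimal sketch that evaluates the scale function and greedily partitions the ranks 1..n of already sorted data into clusters satisfying them; this offline greedy pass and the names `k_size` and `cluster` are illustrative assumptions, not the streaming t-digest algorithm itself.

```python
import math


def k(q, delta):
    """Scale function k(q, delta) = delta * (asin(2q - 1) / pi + 1/2)."""
    return delta * (math.asin(2.0 * q - 1.0) / math.pi + 0.5)


def k_size(left, right, n, delta):
    """K(i) for a cluster covering sorted ranks left..right (1-based, inclusive)."""
    return k(right / n, delta) - k((left - 1) / n, delta)


def cluster(n, delta):
    """Greedily split ranks 1..n into maximal clusters with K(i) <= 1."""
    clusters = []
    left = 1
    while left <= n:
        right = left
        # Extend the cluster while the next rank still keeps K(i) <= 1.
        while right < n and k_size(left, right + 1, n, delta) <= 1.0:
            right += 1
        clusters.append((left, right))
        left = right + 1
    return clusters


clusters = cluster(n=1000, delta=10)
sizes = [right - left + 1 for left, right in clusters]
print(len(clusters), sizes[0], max(sizes))
```

Because the arcsine is steep near q = 0 and q = 1, clusters at the tails hold few samples and clusters near the median hold many, which keeps extreme percentiles accurate. Since each cluster is maximal, a neighbouring pair always has K(i) + K(i+1) > 1, so the number of clusters stays between delta and 2 * delta.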
collector collector collector pworker pworker pworker
Q/A