My talk at rannts #18 (11.11.2017)
Probabilistic Data Structures
Sergey Arkhipov, 2017
curl http://site.com
curl -x myproxy.ru:3128 http://site.com
curl -x proxy.crawlera.com:8010 http://site.com
evt evt evt evt evt
{"user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": ""}
collector collector collector
{"user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": ""}
{"user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": ""}
{"user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": ""}
{"user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3}
Consumer 1
{"user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3}
Consumer 2
{"user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0}
Consumer 3
{"user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84}
INSERT INTO stats (date, user, hostname, ok, ban, error)
VALUES (:date, :user, :hostname, :ok, :ban, :error)
ON DUPLICATE KEY UPDATE
    ok = ok + VALUES(ok),
    ban = ban + VALUES(ban),
    error = error + VALUES(error);
{"user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time": 2861}
(20 + 10) + 11 = (20 + 11) + 10
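The identity above is why each consumer can aggregate on its own: partial counts can be summed in any order and any grouping. A minimal Python sketch, using the per-consumer totals from the slides; `merge` is a hypothetical helper name, not part of the original pipeline.

```python
from collections import Counter
from itertools import permutations

# Partial aggregates as reported by the three consumers (values from the slides).
consumer_1 = Counter(ok=234, banned=12, errors=3)
consumer_2 = Counter(ok=250, banned=3, errors=0)
consumer_3 = Counter(ok=0, banned=124, errors=84)


def merge(*parts):
    """Sum partial counters key by key (hypothetical helper)."""
    total = Counter()
    for part in parts:
        total.update(part)  # Counter.update adds counts instead of replacing them
    return total


# Merge the partial results in every possible order.
results = [
    dict(merge(*order))
    for order in permutations([consumer_1, consumer_2, consumer_3])
]
total = results[0]
print(total)
```

Every ordering yields the same totals, which is the property the `ON DUPLICATE KEY UPDATE` statement relies on.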
$F(x) = P(X \le x)$
$x_\alpha:\quad P(X \le x_\alpha) \ge \alpha,\quad P(X \ge x_\alpha) \ge 1 - \alpha$
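Unlike the counters, a quantile cannot be obtained by adding partial results. As a rough illustration of the two inequalities that define the α-quantile, here is an exact computation in Python that needs the whole sorted sample. The nearest-rank rule, the function names, and the sample values are assumptions chosen for illustration.

```python
import math


def quantile(sample, alpha):
    """Exact alpha-quantile by the nearest-rank rule (keeps the full sample)."""
    ordered = sorted(sample)
    rank = max(1, math.ceil(alpha * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def satisfies_definition(sample, alpha, x):
    """Check P(X <= x) >= alpha and P(X >= x) >= 1 - alpha on the sample."""
    n = len(sample)
    below = sum(v <= x for v in sample) / n
    above = sum(v >= x for v in sample) / n
    return below >= alpha and above >= 1 - alpha


# Illustrative values only.
sample = [3500, 3671, 3400, 3502, 3463, 3371, 3607, 6012, 6168, 6211, 6017]
median = quantile(sample, 0.5)
print(median, satisfies_definition(sample, 0.5, median))
```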
$\Omega(N^{1/p})$
collector collector collector
pworker pworker pworker
var memCount = 75604275;
var memPerSec = 1.38176367782;

function updateCount() {
  // Exponentially distributed delay (ms) until the next simulated signup.
  var next = -(1000 / memPerSec) * Math.log(Math.random());
  var memCountString = '' + memCount;
  var len = memCountString.length;
  // Split the number into groups of three digits separated by narrow spacer spans.
  memCountString = memCountString.substr(0, len - 6) +
    '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 6, 3) +
    '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 3, 3);
  // ge() and addEvent() are helpers that are not shown on the slide.
  ge('memCount').innerHTML = memCountString;
  memCount = memCount + 1;
  setTimeout(updateCount, next);
}
addEvent(window, 'load', updateCount);
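The delay formula in `updateCount` is inverse-transform sampling of an exponential distribution, so the simulated signups form a Poisson process with rate `memPerSec`. A small Python sketch of the same idea; the function name, the fixed seed, and the sample size are assumptions for illustration.

```python
import math
import random


def next_delay_ms(rate_per_sec, rng):
    """Delay in milliseconds until the next event of a Poisson process."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(0) cannot occur
    return -(1000.0 / rate_per_sec) * math.log(u)


rng = random.Random(42)          # fixed seed, only to make the run repeatable
rate = 1.38176367782             # events per second, as in the snippet above
delays = [next_delay_ms(rate, rng) for _ in range(100_000)]
mean_delay = sum(delays) / len(delays)

# The mean delay should be close to 1000 / rate milliseconds.
print(mean_delay, 1000.0 / rate)
```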
3500 3671 3400 3502 3463 3371 3607 6012 6168 6211 6017
3500 3507
3671 3667
3400 3410
3502 3502
3463 3466
3371 3330
3607 3599
6012 6009
6168 6152
6211 6215
6017 6016
Count-Min Sketch
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
Count-Min Sketch
0 0 1 0 0
0 0 0 1 0
0 0 0 1 0
0 1 0 0 0
0 0 0 0 1
Count-Min Sketch
32 11 1 18 200
126 184 78 1 0
91 59 30 24 88
2 76 34 48 72
11 200 129 136 14
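The three grids can be read as an empty sketch, the sketch after one insertion (one cell per row incremented), and the sketch after many insertions. A minimal Python sketch of that structure follows; the class name, the 5x5 default size, and the use of salted BLAKE2b as the per-row hash family are assumptions, not details from the talk.

```python
import hashlib


class CountMinSketch:
    def __init__(self, depth=5, width=5):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # One hash function per row, obtained by salting BLAKE2b with the row index.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment exactly one cell in every row.
        for row in range(self.depth):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate cells, so the minimum is the tightest estimate.
        return min(
            self.table[row][self._column(row, item)]
            for row in range(self.depth)
        )


sketch = CountMinSketch()
sketch.add("rannts.ru")
for row in sketch.table:
    print(row)  # each row now contains a single 1
```

The estimate can overcount when items collide in every row, but it never undercounts.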
MinHash
$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
$k = \left[\frac{1}{\varepsilon^2}\right]$
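One way to read these two formulas: keep the minimum of each of k hash functions over a set, and estimate J(A, B) as the fraction of positions where two signatures agree, with an error on the order of ε. A Python sketch under stated assumptions: salted BLAKE2b stands in for the hash family, and the function names and test sets are made up for illustration.

```python
import hashlib
import math


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison."""
    return len(a & b) / len(a | b)


def _hash(seed, item):
    # Hash function number `seed`, derived by salting BLAKE2b (an assumption).
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def signature(items, k):
    """MinHash signature: the minimum hash value under each of k functions."""
    return [min(_hash(seed, item) for item in items) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)


epsilon = 0.05
k = math.ceil(1 / epsilon ** 2)

a = {f"user{i}" for i in range(0, 800)}
b = {f"user{i}" for i in range(400, 1200)}
sig_a, sig_b = signature(a, k), signature(b, k)
print(jaccard(a, b), estimate_jaccard(sig_a, sig_b))
```

Only the k-element signatures need to be stored or exchanged, not the sets themselves.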
HyperLogLog
010010000110010101101100011011000110111100100001b
$2^6 = 64$
1001b = 9
100001b = 33
$\sigma = \frac{1.04}{\sqrt{2^k}}$
$E = \frac{\alpha(k)\, 4^k}{\sum_j 2^{-M_j}}$
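A minimal Python sketch of the raw estimator E = α·4^k / Σ 2^(−M_j) with 2^k registers. The class name, the 64-bit BLAKE2b hash, the choice of the leading k bits as register index, and the default k = 10 are assumptions for illustration; the small-range and large-range corrections used by production implementations are left out.

```python
import hashlib


class HyperLogLog:
    def __init__(self, k=10):
        self.k = k
        self.m = 1 << k                  # number of registers, 2^k
        self.registers = [0] * self.m    # M_j in the formula

    def add(self, item):
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")            # 64-bit hash value
        tail_bits = 64 - self.k
        index = h >> tail_bits                       # leading k bits pick the register
        rest = h & ((1 << tail_bits) - 1)            # remaining bits
        rank = tail_bits - rest.bit_length() + 1     # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        # Bias-correction constants; the closed form applies for m >= 128.
        small = {16: 0.673, 32: 0.697, 64: 0.709}
        alpha = small.get(self.m, 0.7213 / (1 + 1.079 / self.m))
        return alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)


hll = HyperLogLog(k=10)
for i in range(100_000):
    hll.add(f"user-{i}")
estimate = hll.estimate()

# Expected relative error is about 1.04 / sqrt(2**10), roughly 3%.
print(round(estimate))
```

Adding an element that was already seen leaves the registers unchanged, so duplicates do not affect the estimate.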
t-digest
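t-digest
$X = x_1, x_2, \ldots, x_n$
$X = \{s_1, s_2, \ldots, s_m\}$
$s_i = \{x_{\mathrm{left}(i)}, \ldots, x_{\mathrm{right}(i)}\}$
t-digest
$k(q, \delta) \overset{\mathrm{def}}{=} \delta \left( \frac{\sin^{-1}(2q - 1)}{\pi} + \frac{1}{2} \right)$
$K(i) \overset{\mathrm{def}}{=} k\left(\frac{\mathrm{right}(i)}{n}, \delta\right) - k\left(\frac{\mathrm{left}(i) - 1}{n}, \delta\right)$
$K(i) \le 1$
$K(i) + K(i+1) > 1$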
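To see what the two size conditions imply, the following Python sketch greedily partitions n sorted points into clusters that each satisfy K(i) ≤ 1. It is not a full t-digest (no centroids, no streaming or merging); the function names and the values n = 10000 and δ = 100 are assumptions for illustration.

```python
import math


def k_scale(q, delta):
    """Scale function k(q, delta) for q in [0, 1]."""
    return delta * (math.asin(2 * q - 1) / math.pi + 0.5)


def cluster_k(left, right, n, delta):
    """K for a cluster covering sorted positions left..right (1-based, inclusive)."""
    return k_scale(right / n, delta) - k_scale((left - 1) / n, delta)


def build_clusters(n, delta):
    """Greedy partition of positions 1..n into maximal clusters with K <= 1."""
    clusters = []
    left = 1
    for right in range(1, n + 1):
        # Close the cluster if taking one more point would push K above 1.
        if right == n or cluster_k(left, right + 1, n, delta) > 1:
            clusters.append((left, right))
            left = right + 1
    return clusters


n, delta = 10_000, 100
clusters = build_clusters(n, delta)
sizes = [right - left + 1 for left, right in clusters]

# Number of clusters, size of the first (tail) cluster, size of the largest.
print(len(clusters), sizes[0], max(sizes))
```

Because k(q, δ) is steepest near q = 0 and q = 1, clusters at the tails hold only a few points while clusters near the median hold many, which is where the accuracy for extreme quantiles comes from.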
Q/A