
Probabilistic Data Structures

My talk at rannts #18 (11.11.2017)

Sergey Arkhipov

November 11, 2017

Transcript

  1. Probabilistic Data Structures. Sergey Arkhipov, 2017

  2. None
  3. None
  4. curl http://site.com

  5. curl -x myproxy.ru:3128 http://site.com

  6. curl -x proxy.crawlera.com:8010 http://site.com

  7. None
  8. evt evt evt evt evt

  9. { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }

  10. { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }

  11. None
  12. collector collector collector

  13. { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
    { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
    { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
    { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
  14. Consumer 1: { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
    Consumer 2: { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0 }
    Consumer 3: { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84 }
  15. INSERT INTO stats (date, user, hostname, ok, ban, error)
    VALUES (:date, :user, :hostname, :ok, :ban, :error)
    ON DUPLICATE KEY UPDATE
      ok = ok + VALUES(ok),
      ban = ban + VALUES(ban),
      error = error + VALUES(error);
  16. { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "", "response_time": 2861 }
  17. (20 + 10) + 11 = (20 + 11) + 10
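The identity on slide 17 is what makes the per-consumer counters from slide 14 safe to merge in any order. A minimal Python sketch of that idea follows; the `merge` helper and the sample response times are hypothetical and not taken from the talk. The second half illustrates, under the same caveat, why a median of `response_time` cannot be combined the same way.

```python
from collections import Counter
from statistics import median


def merge(*partials):
    """Sum per-consumer counters; addition is associative and commutative."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


# Counter values taken from slide 14.
c1 = Counter(ok=234, banned=12, errors=3)
c2 = Counter(ok=250, banned=3, errors=0)
c3 = Counter(ok=0, banned=124, errors=84)

# Any merge order gives the same totals.
totals = merge(c1, c2, c3)
same_totals = merge(c3, c1, c2)

# Hypothetical response times seen by two consumers (illustrative data only).
times_a = [1, 2, 3]
times_b = [4, 5, 100, 200, 300]

median_of_medians = median([median(times_a), median(times_b)])
true_median = median(times_a + times_b)
```

Here the counters merge exactly, while `median_of_medians` and `true_median` disagree, which is the motivation for the sketches on the later slides.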
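  18. $F(x) = P\{\sigma < x\}$, $\quad \begin{cases} P(x \leqslant x_\alpha) \geqslant \alpha \\ P(x \geqslant x_\alpha) \geqslant 1 - \alpha \end{cases}$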

  19. $\Omega(N^{1/p})$

  20. collector collector collector pworker pworker pworker

  21. None
  22. None
  23. var memCount = 75604275;
    var memPerSec = 1.38176367782;
    function updateCount() {
      next = -(1000 / memPerSec) * Math.log(Math.random());
      memCountString = '' + memCount;
      len = memCountString.length;
      memCountString = memCountString.substr(0, len - 6) + '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 6, 3) + '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 3, 3);
      ge('memCount').innerHTML = memCountString;
      memCount = memCount + 1;
      setTimeout(updateCount, next);
    }
    addEvent(window, 'load', updateCount);
  24. 3500 3671 3400 3502 3463 3371 3607 6012 6168 6211 6017
  25. 3500 3507
    3671 3667
    3400 3410
    3502 3502
    3463 3466
    3371 3330
    3607 3599
    6012 6009
    6168 6152
    6211 6215
    6017 6016
  26. Count-Min Sketch
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0

  27. Count-Min Sketch
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0

  28. Count-Min Sketch
    0 0 1 0 0
    0 0 0 1 0
    0 0 0 1 0
    0 1 0 0 0
    0 0 0 0 1

  29. Count-Min Sketch
     32  11   1  18 200
    126 184  78   1   0
     91  59  30  24   8
     82  76  34  48  72
     11 200 129 136  14

  30. Count-Min Sketch
     32  11   1  18 200
    126 184  78   1   0
     91  59  30  24   8
     82  76  34  48  72
     11 200 129 136  14
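The grids on slides 26 to 30 can be read as a Count-Min Sketch with 5 rows (one per hash function) and 5 columns: adding an item increments one cell per row, and the estimate is the minimum over those cells. Below is a minimal Python sketch under assumptions of my own: the class name, the 5x5 default size, and the use of salted BLAKE2b as the family of hash functions are illustrative choices, not details confirmed by the talk.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min Sketch: depth rows, width counters per row."""

    def __init__(self, depth=5, width=5):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment exactly one counter in every row.
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(item))


cms = CountMinSketch()
for hostname in ["rannts.ru", "rannts.ru", "rannts.ru", "site.com"]:
    cms.add(hostname)
```

The estimate never undercounts: `cms.estimate("rannts.ru")` is at least 3, and it can exceed 3 only when other items collide with it in every row.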
  31. MinHash
    $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
    $k = \left[ \frac{1}{\varepsilon^2} \right]$
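To make the MinHash slide concrete: the Jaccard index J(A, B) can be estimated as the fraction of k hash functions whose minimum over A equals the minimum over B, with k on the order of 1/ε² for an expected error around ε. The following Python sketch is an illustration under assumptions: the function names are hypothetical, salted BLAKE2b stands in for the family of hash functions, the bracket on the slide is read as rounding up, and both sets are assumed to be non-empty.

```python
import hashlib
import math


def _hash(seed, item):
    # Seeded 64-bit hash; salting BLAKE2b is an illustrative choice.
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def signature(items, k):
    """MinHash signature: the minimum hash value under each of k hash functions."""
    return [min(_hash(seed, item) for item in items) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard index, for comparison."""
    return len(a & b) / len(a | b)


epsilon = 0.1
k = math.ceil(1 / epsilon**2)  # 100 hash functions for a target error near 0.1

set_a = {f"w{i}" for i in range(0, 100)}
set_b = {f"w{i}" for i in range(50, 150)}

exact = jaccard(set_a, set_b)  # 50 shared items out of 150 distinct ones
approx = estimate_jaccard(signature(set_a, k), signature(set_b, k))
```

The signatures have a fixed size of k values regardless of how large the sets are, which is the point of the technique.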
  32. HyperLogLog
    $010010000110010101101100011011000110111100100001_b$
    $2^6 = 64$
    $1001_b = 9$
    $100001_b = 33$
    $\sigma = \frac{1.04}{\sqrt{2^k}}$
    $E = \frac{\alpha(k) \, 4^k}{\sum_j 2^{-M_j}}$
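The HyperLogLog slide combines three ideas: k bits of the hash select one of 2^k registers, each register keeps the largest "rank" seen, and the estimate is E = α·4^k / Σ 2^(−M_j) with relative error about 1.04/√(2^k). Below is a minimal Python sketch under assumptions of my own: the class name, the BLAKE2b hash, taking the low k bits as the register index, and defining rank as the position of the lowest set bit are illustrative choices (the slide does not say which bits it uses). The α constants and the small-range linear-counting correction follow the commonly published HyperLogLog description.

```python
import hashlib
import math


def _alpha(m):
    # Bias-correction constants for m registers.
    if m == 16:
        return 0.673
    if m == 32:
        return 0.697
    if m == 64:
        return 0.709
    return 0.7213 / (1 + 1.079 / m)


class HyperLogLog:
    """Illustrative HyperLogLog with 2**k registers over a 64-bit hash."""

    def __init__(self, k=6):
        if k < 4:
            raise ValueError("k must be at least 4")
        self.k = k
        self.m = 1 << k  # k = 6 gives the 64 registers from the slide
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        index = h & (self.m - 1)  # low k bits pick the register (assumption)
        rest = h >> self.k        # remaining 64 - k bits
        if rest == 0:
            rank = 64 - self.k + 1
        else:
            rank = (rest & -rest).bit_length()  # 1-based position of lowest set bit
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        m = self.m
        raw = _alpha(m) * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # small-range correction
        return raw


hll = HyperLogLog(k=10)  # expected relative error about 1.04 / sqrt(1024), roughly 3%
for i in range(10000):
    hll.add(f"user-{i}")
distinct = hll.estimate()
```

Adding the same items again leaves every register unchanged, so duplicates do not affect the estimate.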
  33. t-digest

  34. t-digest

  35. t-digest

  36. t-digest
    $X = x_1, x_2, \ldots, x_n$
    $X = \{s_1, s_2, \ldots, s_m\}$
    $s_i = \{x_{left(i)}, \ldots, x_{right(i)}\}$
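  37. t-digest
    $k(q, \delta) \overset{\text{def}}{=} \delta \left( \frac{\sin^{-1}(2q - 1)}{\pi} + \frac{1}{2} \right)$
    $K(i) \overset{\text{def}}{=} k\left(\frac{right(i)}{n}, \delta\right) - k\left(\frac{left(i) - 1}{n}, \delta\right)$
    $K(i) \leqslant 1$
    $K(i) + K(i + 1) > 1$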
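The scale function k(q, δ) on slide 37 stretches the quantile axis near 0 and 1, so the constraint K(i) ⩽ 1 forces clusters at the tails to cover far fewer points than clusters near the median. The Python sketch below only evaluates these formulas; it is not a full t-digest. The helper names `k_inverse`, `cluster_weight`, and `max_cluster_width`, and the value δ = 100, are assumptions made for illustration.

```python
import math


def k(q, delta):
    """Scale function from the slide: maps a quantile q in [0, 1] to [0, delta]."""
    return delta * (math.asin(2 * q - 1) / math.pi + 0.5)


def k_inverse(k_value, delta):
    """Inverse of k: the quantile at which the scale function equals k_value."""
    return (math.sin(math.pi * (k_value / delta - 0.5)) + 1) / 2


def cluster_weight(left, right, n, delta):
    """K(i) for a cluster holding sorted points left..right (1-based) out of n."""
    return k(right / n, delta) - k((left - 1) / n, delta)


def max_cluster_width(q, delta):
    """Widest quantile span a cluster starting at q may cover while K(i) <= 1."""
    return k_inverse(min(k(q, delta) + 1, delta), delta) - q


delta = 100
width_at_tail = max_cluster_width(0.0, delta)    # cluster starting at the minimum
width_at_median = max_cluster_width(0.5, delta)  # cluster starting at the median
```

With δ = 100, a cluster starting at the median may span dozens of times more of the data than one starting at the extreme tail, which is why tail quantiles stay accurate.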
  38. t-digest

  39. t-digest

  40. collector collector collector pworker pworker pworker

  41. None
  42. Q/A