Probabilistic Data Structures


My talk at rannts #18 (11.11.2017)


Sergey Arkhipov

November 11, 2017

Transcript

  1. 2.
  2. 3.
  3. 7.
  4. 11.
  5. 13.

    { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }

    { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }
    { "user": "sarkhipov", "hostname": "rannts.ru", "status": "ok", "status_description": "" }

    { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
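Turning the stream of per-request events into a single counter record per user and host is a group-by with counting. Below is a minimal Python sketch of that step. Only the "ok" status appears on the slide, so the status strings "banned" and "error" and the hypothetical `STATUS_TO_FIELD` mapping are assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from an event's "status" value to the aggregate field name
STATUS_TO_FIELD = {"ok": "ok", "banned": "banned", "error": "errors"}


def aggregate(events):
    """Collapse raw status events into one counter record per (user, hostname)."""
    counters = defaultdict(Counter)
    for event in events:
        key = (event["user"], event["hostname"])
        counters[key][STATUS_TO_FIELD[event["status"]]] += 1
    # A Counter returns 0 for missing keys, so every field is present in the output
    return [
        {"user": user, "hostname": host,
         **{field: c[field] for field in STATUS_TO_FIELD.values()}}
        for (user, host), c in counters.items()
    ]
```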
  6. 14.

    Consumer 1: { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3 }
    Consumer 2: { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0 }
    Consumer 3: { "user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84 }
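Because plain counters are combined by addition, which is associative and commutative, the partial records from independent consumers can be merged in any order by summing them field by field. A minimal Python sketch using the three consumer records from the slide; the `merge` helper and `COUNT_FIELDS` name are illustrative, not from the talk.

```python
from collections import Counter

COUNT_FIELDS = ("ok", "banned", "errors")


def merge(records):
    """Sum partial aggregates that share the same (user, hostname)."""
    totals = {}
    for record in records:
        key = (record["user"], record["hostname"])
        # Counter.update adds the given counts to the existing ones
        totals.setdefault(key, Counter()).update({f: record[f] for f in COUNT_FIELDS})
    return [
        {"user": user, "hostname": host, **{f: c[f] for f in COUNT_FIELDS}}
        for (user, host), c in totals.items()
    ]


consumers = [
    {"user": "sarkhipov", "hostname": "rannts.ru", "ok": 234, "banned": 12, "errors": 3},
    {"user": "sarkhipov", "hostname": "rannts.ru", "ok": 250, "banned": 3, "errors": 0},
    {"user": "sarkhipov", "hostname": "rannts.ru", "ok": 0, "banned": 124, "errors": 84},
]
print(merge(consumers))
# [{'user': 'sarkhipov', 'hostname': 'rannts.ru', 'ok': 484, 'banned': 139, 'errors': 87}]
```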
  7. 15.

    INSERT INTO stats (date, user, hostname, ok, ban, error)
    VALUES (:date, :user, :hostname, :ok, :ban, :error)
    ON DUPLICATE KEY UPDATE
        ok = ok + VALUES(ok),
        ban = ban + VALUES(ban),
        error = error + VALUES(error);
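The upsert above is MySQL syntax. To try the same "add to the existing row" idea without a MySQL server, here is a sketch that swaps in SQLite's equivalent `INSERT ... ON CONFLICT ... DO UPDATE` (available in SQLite 3.24 and later), where `excluded.<column>` plays the role of MySQL's `VALUES(<column>)`. The table definition, its `UNIQUE (date, user, hostname)` key, and the date value are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for the `stats` table; the UNIQUE key is an assumption
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE stats (
        date TEXT, user TEXT, hostname TEXT,
        ok INTEGER, ban INTEGER, error INTEGER,
        UNIQUE (date, user, hostname)
    )
""")

# SQLite counterpart of ON DUPLICATE KEY UPDATE: unqualified columns refer to the
# existing row, excluded.<col> to the row that failed to insert
UPSERT = """
    INSERT INTO stats (date, user, hostname, ok, ban, error)
    VALUES (:date, :user, :hostname, :ok, :ban, :error)
    ON CONFLICT (date, user, hostname) DO UPDATE SET
        ok = ok + excluded.ok,
        ban = ban + excluded.ban,
        error = error + excluded.error
"""

# One upsert per consumer's partial counts (hypothetical date value)
for ok, ban, error in [(234, 12, 3), (250, 3, 0), (0, 124, 84)]:
    conn.execute(UPSERT, {"date": "2017-11-11", "user": "sarkhipov",
                          "hostname": "rannts.ru", "ok": ok, "ban": ban, "error": error})

row = conn.execute("SELECT ok, ban, error FROM stats").fetchone()
```

After the three upserts the table still holds a single row, with the counts summed to `(484, 139, 87)`.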
  8. 21.
  9. 22.
  10. 23.

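    var memCount = 75604275;       // initial value shown on the page
    var memPerSec = 1.38176367782; // average increments per second
    function updateCount() {
      // Exponentially distributed delay in ms, so increments form a Poisson process;
      // 1 - Math.random() lies in (0, 1], which keeps Math.log finite
      var next = -(1000 / memPerSec) * Math.log(1 - Math.random());
      var memCountString = '' + memCount;
      var len = memCountString.length;
      // Split the digits into groups of three separated by small spacer spans
      memCountString = memCountString.substr(0, len - 6) +
        '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 6, 3) +
        '<span style="font-size: 8px"> </span>' + memCountString.substr(len - 3, 3);
      // Standard DOM equivalents of the page's ge() and addEvent() helpers
      document.getElementById('memCount').innerHTML = memCountString;
      memCount = memCount + 1;
      setTimeout(updateCount, next);
    }
    window.addEventListener('load', updateCount);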
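The delay formula `-(1000 / memPerSec) * Math.log(U)` is inverse-transform sampling of an exponential distribution, so the counter ticks as a Poisson process averaging `memPerSec` increments per second. A small Python sketch that reuses the rate from the snippet and checks that the mean delay comes out near 1000 / rate milliseconds; the seed and sample count are arbitrary choices.

```python
import math
import random

MEM_PER_SEC = 1.38176367782  # average increments per second, taken from the snippet


def next_delay_ms(rng: random.Random) -> float:
    # delay = -(1000 / rate) * ln(U): exponential with mean 1000 / rate milliseconds
    u = 1.0 - rng.random()  # random() is in [0, 1), so u is in (0, 1] and log(u) is finite
    return -(1000 / MEM_PER_SEC) * math.log(u)


rng = random.Random(42)
delays = [next_delay_ms(rng) for _ in range(100_000)]
mean_delay = sum(delays) / len(delays)
# The theoretical mean is 1000 / 1.3818, roughly 723.7 ms
```

Python's `random.expovariate(MEM_PER_SEC)` draws from the same distribution (in seconds); the explicit formula is kept here to mirror the JavaScript.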
  11. 25.

    3500 3507 3671 3667 3400 3410 3502 3502 3463 3466 3371 3330 3607 3599 6012 6009 6168 6152 6211 6215 6017 6016
  12. 26.

    Count-Min Sketch
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
  13. 27.

    Count-Min Sketch
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
  14. 28.

    Count-Min Sketch
    0 0 1 0 0
    0 0 0 1 0
    0 0 0 1 0
    0 1 0 0 0
    0 0 0 0 1
  15. 29.

    Count-Min Sketch
    32 11 1 18 200
    126 184 78 1 0
    91 59 30 24 8
    82 76 34 48 72
    11 200 129 136 14
  16. 30.

    Count-Min Sketch
    32 11 1 18 200
    126 184 78 1 0
    91 59 30 24 8
    82 76 34 48 72
    11 200 129 136 14
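The Count-Min Sketch slides appear to show 5 rows of 5 counters, where adding one item increments exactly one counter in each row and a query takes the smallest of the item's counters. A minimal Python sketch of that structure; deriving the per-row hash functions by salting SHA-1 with the row number is an assumption, not necessarily what the talk used.

```python
import hashlib
import math


class CountMinSketch:
    """Minimal Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, width: int = 5, depth: int = 5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    @classmethod
    def from_error(cls, eps: float, delta: float) -> "CountMinSketch":
        # Usual sizing: overestimate <= eps * N with probability >= 1 - delta
        return cls(width=math.ceil(math.e / eps), depth=math.ceil(math.log(1 / delta)))

    def _index(self, item: str, row: int) -> int:
        # One hash function per row, derived by salting SHA-1 with the row number
        digest = hashlib.sha1(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Every row receives exactly one increment
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        # Collisions only ever add, so the smallest counter is the tightest upper bound
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))
```

With width and depth of 5, one `add` touches one cell per row. `from_error` uses the standard sizing w = ⌈e/ε⌉ and d = ⌈ln(1/δ)⌉, under which an estimate exceeds the true count by at most ε·N (N being the total of all added counts) with probability at least 1 − δ; estimates never fall below the true count.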
  17. 32.

    HyperLogLog
    010010000110010101101100011011000110111100100001b
    $2^6 = 64$
    1001b = 9
    100001b = 33
    $\sigma = \frac{1.04}{\sqrt{2^k}}$
    $E = \alpha(k) \, \frac{4^k}{\sum_j 2^{-M_j}}$
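The HyperLogLog slide seems to use the last 6 bits of a hash (100001b = 33) to choose one of 2^6 = 64 registers, and to combine the registers M_j with the estimator E = α(k)·4^k / Σ_j 2^(−M_j), where 4^k = m² for m = 2^k registers. A minimal Python sketch under stated assumptions: SHA-1 truncated to 64 bits stands in for the hash, the rank stored in a register is the position of the leftmost 1-bit in the remaining bits, and the α constants and small-range (linear counting) correction are the commonly used ones.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog with m = 2^k registers."""

    def __init__(self, k: int = 6):
        if k < 4:
            raise ValueError("k must be at least 4")
        self.k = k
        self.m = 1 << k  # number of registers, m = 2^k
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (an assumption)
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        index = h & (self.m - 1)  # low k bits choose the register
        rest = h >> self.k        # remaining 64 - k bits
        bits = 64 - self.k
        # Position of the leftmost 1-bit in `rest` (1-based), or bits + 1 if rest == 0
        rank = bits - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def _alpha(self) -> float:
        if self.m == 16:
            return 0.673
        if self.m == 32:
            return 0.697
        if self.m == 64:
            return 0.709
        return 0.7213 / (1 + 1.079 / self.m)

    def estimate(self) -> float:
        # E = alpha(k) * 4^k / sum_j 2^(-M_j), using m * m == 4^k
        raw = self._alpha() * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros > 0:
            # Small-range correction: linear counting on the empty registers
            return self.m * math.log(self.m / zeros)
        return raw
```

With k = 6 the slide's formula gives σ = 1.04/8 = 0.13, about 13% standard error; k = 10 (1024 registers) brings it down to roughly 3.3%. Adding an item that was already seen never changes any register, so duplicates do not affect the estimate.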
  18. 33.
  19. 34.
  20. 35.
  21. 36.

    t-digest
    $X = x_1, x_2, \ldots, x_n$
    $X = \{s_1, s_2, \ldots, s_m\}$
    $s_i = \{x_{\mathrm{left}(i)}, \ldots, x_{\mathrm{right}(i)}\}$
  22. 37.

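    t-digest
    $k(q, \delta) \stackrel{\mathrm{def}}{=} \delta \left( \frac{\sin^{-1}(2q - 1)}{\pi} + \frac{1}{2} \right)$
    $K(i) \stackrel{\mathrm{def}}{=} k\left( \frac{\mathrm{right}(i)}{n}, \delta \right) - k\left( \frac{\mathrm{left}(i) - 1}{n}, \delta \right)$
    $K(i) \leqslant 1$
    $K(i) + K(i + 1) > 1$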
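The t-digest slides split the sorted data X into contiguous clusters s_i and bound each cluster's size in the k-scale: K(i) ⩽ 1 keeps clusters small, and K(i) + K(i+1) > 1 means neighbouring clusters cannot be merged further. The sketch below is a simplified offline version: it sorts the whole input and greedily grows each cluster while K(i) stays within 1, then estimates quantiles by interpolating between cluster centroids. A real t-digest merges incoming points into existing centroids instead of sorting everything; the function names and δ = 100 are illustrative choices.

```python
import math


def k_scale(q: float, delta: float) -> float:
    # k(q, delta) = delta * (asin(2q - 1) / pi + 1/2), maps q in [0, 1] onto [0, delta]
    return delta * (math.asin(2 * q - 1) / math.pi + 0.5)


def build_clusters(xs, delta=100.0):
    """Greedily split sorted data into clusters s_i, returned as (mean, count) pairs."""
    data = sorted(xs)
    n = len(data)
    clusters = []
    left = 1  # 1-based index of the first point in the current cluster
    while left <= n:
        right = left  # a cluster always holds at least one point
        # Grow while K(i) = k(right / n) - k((left - 1) / n) would stay <= 1
        while right < n and k_scale((right + 1) / n, delta) - k_scale((left - 1) / n, delta) <= 1:
            right += 1
        chunk = data[left - 1:right]
        clusters.append((sum(chunk) / len(chunk), len(chunk)))
        left = right + 1
    return clusters


def quantile(clusters, q):
    """Linear interpolation between centroids placed at the middle of their rank ranges."""
    target = q * sum(count for _, count in clusters)
    cumulative = 0.0
    for i, (mean, count) in enumerate(clusters):
        center = cumulative + count / 2
        if target <= center:
            if i == 0:
                return mean
            prev_mean, prev_count = clusters[i - 1]
            prev_center = cumulative - prev_count / 2
            t = (target - prev_center) / (center - prev_center)
            return prev_mean + t * (mean - prev_mean)
        cumulative += count
    return clusters[-1][0]
```

Because the greedy loop stops only when the next point would push K(i) above 1, adjacent clusters satisfy K(i) + K(i+1) > 1. The K(i) values sum to k(1, δ) − k(0, δ) = δ, so the number of clusters stays below about 2δ regardless of n, with the smallest clusters at the tails where the scale function is steepest.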
  23. 38.
  24. 39.
  25. 41.
  26. 42.

    Q/A