Probabilistic Data Structures

My talk at rannts #18 (11.11.2017)

Sergey Arkhipov

November 11, 2017

Transcript

  1. Probabilistic
    data structures
    Sergey Arkhipov, 2017

  4. curl http://site.com

  5. curl -x myproxy.ru:3128 http://site.com

  6. curl -x proxy.crawlera.com:8010
    http://site.com

  8. evt evt evt evt evt

  9. {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "status": "ok",
    "status_description": ""
    }

  12. collector
    collector
    collector

  13. {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "status": "ok",
    "status_description": ""
    }
    {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "status": "ok",
    "status_description": ""
    }
    {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "status": "ok",
    "status_description": ""
    }
    {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "ok": 234,
    "banned": 12,
    "errors": 3
    }

  14. Consumer 1
    {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "ok": 234,
    "banned": 12,
    "errors": 3
    }
    Consumer 2
    {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "ok": 250,
    "banned": 3,
    "errors": 0
    }
    Consumer 3
    {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "ok": 0,
    "banned": 124,
    "errors": 84
    }

  15. INSERT INTO stats (
    date,
    user,
    hostname,
    ok,
    ban,
    error
    )
    VALUES (
    :date,
    :user,
    :hostname,
    :ok,
    :ban,
    :error
    )
    ON DUPLICATE KEY UPDATE
    ok = ok + VALUES(ok),
    ban = ban + VALUES(ban),
    error = error + VALUES(error);

  16. {
    "user": "sarkhipov",
    "hostname": "rannts.ru",
    "status": "ok",
    "status_description": "",
    "response_time": 2861
    }

  17. (20 + 10) + 11 = (20 + 11) + 10
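
The identity on this slide is what makes per-consumer counters safe to combine in any order. A minimal Python sketch of that idea, reusing the three consumer payloads shown earlier; the `merge` helper is hypothetical and not part of the talk:

```python
# Partial counters as reported by three consumers (numbers from the slides).
consumer_1 = {"ok": 234, "banned": 12, "errors": 3}
consumer_2 = {"ok": 250, "banned": 3, "errors": 0}
consumer_3 = {"ok": 0, "banned": 124, "errors": 84}


def merge(a, b):
    """Add two counter dicts key by key (hypothetical helper)."""
    return {key: a.get(key, 0) + b.get(key, 0) for key in a.keys() | b.keys()}


# Addition is associative and commutative, so the merge order does not matter.
one_order = merge(merge(consumer_1, consumer_2), consumer_3)
other_order = merge(merge(consumer_1, consumer_3), consumer_2)
print(one_order == other_order)
```

The same property is what lets the `ON DUPLICATE KEY UPDATE` statement add partial sums row by row without coordinating the consumers.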

  18. $F(x) = P\{X \le x\}$
    $P(X \le x_\alpha) \ge \alpha$
    $P(X \ge x_\alpha) \ge 1 - \alpha$
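
For a finite sample, the two inequalities that define the quantile can be checked directly. The sketch below is one possible reading, assuming the empirical distribution of the sample stands in for P; the function name is hypothetical, and the eleven values are borrowed from a later slide purely as sample input:

```python
import math


def quantile(sample, alpha):
    """Smallest sample value x with empirical P(X <= x) >= alpha."""
    if not sample or not 0 < alpha <= 1:
        raise ValueError("need a non-empty sample and 0 < alpha <= 1")
    ordered = sorted(sample)
    # At least ceil(alpha * n) observations must lie at or below the answer.
    rank = math.ceil(alpha * len(ordered))
    return ordered[rank - 1]


values = [3500, 3671, 3400, 3502, 3463, 3371, 3607, 6012, 6168, 6211, 6017]
median = quantile(values, 0.5)

# Check both defining inequalities against the sample itself.
at_or_below = sum(v <= median for v in values) / len(values)
at_or_above = sum(v >= median for v in values) / len(values)
print(median, at_or_below >= 0.5, at_or_above >= 0.5)
```

Note that this needs the whole sorted sample in memory, which is exactly the cost the approximate structures later in the talk try to avoid.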

  19. $\Omega(N^{1/p})$

  20. collector
    collector
    collector
    pworker
    pworker
    pworker

  23. var memCount = 75604275;
    var memPerSec = 1.38176367782;
    // ge() and addEvent() are helpers defined elsewhere on the page.
    function updateCount() {
      // Exponentially distributed delay (ms) until the next increment.
      var next = -(1000 / memPerSec) * Math.log(Math.random());
      var memCountString = '' + memCount;
      var len = memCountString.length;
      // Split the number into groups of three digits.
      memCountString = memCountString.substr(0, len - 6) +
        '<span style="font-size: 8px"> </span>' +
        memCountString.substr(len - 6, 3) +
        '<span style="font-size: 8px"> </span>' +
        memCountString.substr(len - 3, 3);
      ge('memCount').innerHTML = memCountString;
      memCount = memCount + 1;
      setTimeout(updateCount, next);
    }
    addEvent(window, 'load', updateCount);

  24. 3500
    3671
    3400
    3502
    3463
    3371
    3607
    6012
    6168
    6211
    6017

  25. 3500 3507
    3671 3667
    3400 3410
    3502 3502
    3463 3466
    3371 3330
    3607 3599
    6012 6009
    6168 6152
    6211 6215
    6017 6016

  26. Count-Min Sketch
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0
    0 0 0 0 0

  28. Count-Min Sketch
    0 0 1 0 0
    0 0 0 1 0
    0 0 0 1 0
    0 1 0 0 0
    0 0 0 0 1

  30. Count-Min Sketch
    32 11 1 18 200
    126 184 78 1 0
    91 59 30 24 8
    82 76 34 48 72
    11 200 129 136 14
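
The grids on the Count-Min Sketch slides can be read as a small table of counters, one row per hash function. A minimal Python sketch under that reading; the class name, the default 5x5 size (chosen to match the slides) and the use of salted BLAKE2b as the hash family are all assumptions, not details from the talk:

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch: `depth` rows of `width` counters."""

    def __init__(self, depth=5, width=5):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One cell per row; the row index is used as the hash salt.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in self._cells(item))


sketch = CountMinSketch(depth=5, width=64)
for host, hits in [("rannts.ru", 3), ("site.com", 7)]:
    sketch.add(host, hits)
print(sketch.estimate("rannts.ru") >= 3)
```

The estimate never falls below the true count; how far it can overshoot depends on the width, and a wider table trades memory for accuracy.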

  31. MinHash
    $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
    $k = \left[ \frac{1}{\varepsilon^2} \right]$
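
A small Python sketch of how MinHash can approximate the Jaccard index J(A, B) with k hash functions. It assumes salted BLAKE2b as the family of hash functions and picks k = 400, which corresponds to ε = 0.05 in the formula above; the function names are hypothetical:

```python
import hashlib


def _hash(seed, item):
    """64-bit hash of `item` under the hash function number `seed`."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big")


def signature(items, k):
    """MinHash signature: the minimum hash value under each of k functions."""
    return [min(_hash(seed, item) for item in items) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    """Share of positions where the two signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard index, for comparison."""
    return len(a & b) / len(a | b)


K = 400  # about 1 / 0.05**2
set_a = {str(i) for i in range(0, 100)}
set_b = {str(i) for i in range(50, 150)}
print(jaccard(set_a, set_b), estimate_jaccard(signature(set_a, K), signature(set_b, K)))
```

Only the two fixed-size signatures need to be stored or shipped around; the sets themselves are not needed for the comparison.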

  32. HyperLogLog
    $010010000110010101101100011011000110111100100001_b$
    $2^6 = 64$
    $1001_b = 9$
    $100001_b = 33$
    $\sigma = \frac{1.04}{\sqrt{2^k}}$
    $E = \frac{\alpha(k)\, 4^k}{\sum_j 2^{-M_j}}$
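
A rough Python sketch of the raw HyperLogLog estimator E from the slide, with 2^k registers M_j. The 64-bit BLAKE2b hash, the class layout and the α constants (taken from the original HyperLogLog paper) are assumptions, not details from the talk. The small-range and large-range corrections of the full algorithm are left out, so the result is only meaningful when the number of distinct items is well above the number of registers:

```python
import hashlib

HASH_BITS = 64


class HyperLogLog:
    def __init__(self, k=10):
        self.k = k
        self.m = 1 << k               # number of registers, 2^k
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        rest_bits = HASH_BITS - self.k
        index = h >> rest_bits                # first k bits pick the register
        rest = h & ((1 << rest_bits) - 1)     # remaining bits
        # Position of the leftmost 1-bit in `rest`, counted from 1.
        rank = rest_bits - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def _alpha(self):
        # Bias-correction constants from the HyperLogLog paper.
        fixed = {16: 0.673, 32: 0.697, 64: 0.709}
        return fixed.get(self.m, 0.7213 / (1 + 1.079 / self.m))

    def estimate(self):
        # E = alpha * m^2 / sum_j 2^(-M_j), where m^2 = 4^k.
        return self._alpha() * self.m ** 2 / sum(2.0 ** -r for r in self.registers)


hll = HyperLogLog(k=10)
for i in range(50000):
    hll.add("user-%d" % i)
print(round(hll.estimate()))
```

With k = 10 the expected relative error is about 1.04 / √1024, roughly 3%, while the state is just 1024 small integers regardless of how many items were added.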

  36. t-digest
    $X = x_1, x_2, \dots, x_n$
    $X = \{s_1, s_2, \dots, s_m\}$
    $s_i = \{x_{\mathrm{left}(i)}, \dots, x_{\mathrm{right}(i)}\}$

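  37. t-digest
    $k(q, \delta) \stackrel{\text{def}}{=} \delta \left( \frac{\sin^{-1}(2q - 1)}{\pi} + \frac{1}{2} \right)$
    $K(i) \stackrel{\text{def}}{=} k\left(\frac{\mathrm{right}(i)}{n}, \delta\right) - k\left(\frac{\mathrm{left}(i) - 1}{n}, \delta\right)$
    $K(i) \le 1$
    $K(i) + K(i+1) > 1$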
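
To see what the two constraints on K(i) imply, the sketch below evaluates the scale function k(q, δ) and greedily groups the ranks 1..n of an already sorted sample into clusters. This is only an illustration of the size limits, not the t-digest algorithm itself: the greedy pass, the function names, and the choice n = 1000, δ = 10 are assumptions.

```python
import math


def k_scale(q, delta):
    """Scale function k(q, delta) from the slide."""
    return delta * (math.asin(2 * q - 1) / math.pi + 0.5)


def cluster_k_size(left, right, n, delta):
    """K(i) for a cluster covering sorted ranks left..right (1-based, inclusive)."""
    return k_scale(right / n, delta) - k_scale((left - 1) / n, delta)


def greedy_clusters(n, delta):
    """Split ranks 1..n into consecutive (left, right) runs, each as large as K(i) <= 1 allows."""
    clusters = []
    left = 1
    for right in range(1, n + 1):
        # Close the cluster when the next rank would push K(i) above 1.
        if right == n or cluster_k_size(left, right + 1, n, delta) > 1:
            clusters.append((left, right))
            left = right + 1
    return clusters


clusters = greedy_clusters(n=1000, delta=10)
sizes = [right - left + 1 for left, right in clusters]
print(len(clusters), sizes[0], max(sizes))
```

Because k(q, δ) is steep near q = 0 and q = 1, clusters at the tails hold far fewer points than those in the middle, which is where the accuracy for extreme quantiles comes from.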

  40. collector
    collector
    collector
    pworker
    pworker
    pworker

  42. Q/A