Data Structure Adventures for Fun, Profit, and Performance

Erlang comes with numerous data structures, ranging from built-in lists, tuples, records, and (recently) maps to standard library offerings such as orddict, dict, gb_trees, sets, ordsets, queue, and array. Likewise, use cases that need a concurrent data structure can use ETS in various ways: as a concurrent hash map, a concurrent ordered tree, or even a set of concurrent counters.

While these built-in offerings are often adequate, anyone who has tried to build a high-performance Erlang system has likely run into their performance limits. Often, such systems move to external systems or custom native code (via NIFs) to meet their performance demands.

This talk presents ongoing work to build a new set of high-performance data structures for Erlang, including both single-process and concurrent data structures.

The primary goal of this work is to provide data structures that perform well, are memory-efficient, and play well with the Erlang scheduler, so that more code can be written in Erlang rather than resorting to native code or external systems/libraries.

Joseph Blomstedt

March 27, 2015

Transcript

  1. Erlang Factory 2015
    Joe Blomstedt
    Basho Technologies
    Data Structure Adventures
    Monday, March 30, 15

  2. 2
    Erlang is a highly
    productive language

  3. 3
    Scalable

  4. 4
    Distributed

  5. 5
    Fault Tolerant

  6. 6
    Performance envy

  7. 7
    Solution
    Write a NIF!

  8. 8

  9. 9
    Data structures
    Dispatch/protection
    Statistics

  10. Data Structures
    10

  11. 11
    orddict
    dict
    gb_trees

  12. 12
                   1000    10k   100k    1mm   10mm
    orddict          25   2770     --     --     --
    dict              2     21    315  16485     --
    gb_trees          3     44    577   8095     --

  13. 13
                   1000    10k   100k    1mm   10mm
    bt                4     59    708   8894     --
    dict              2     21    315  16485     --
    gb_trees          3     44    577   8095     --

  14. 14
    Immutable
    Shared structure

  15. 15
    D1 = dict:new(),
    D2 = dict:store(1, 10, D1),
    D3 = dict:store(1, 15, D2),
    10 = dict:fetch(1, D2),
    15 = dict:fetch(1, D3).
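
The dict example above keeps two versions alive at once (D2 still maps 1 to 10 after D3 maps it to 15). One way this can be made cheap is path copying: an update copies only the nodes between the root and the changed key and shares everything else. The sketch below is a minimal illustration in C++, simplified to an unbalanced binary search tree; `Node`, `Tree`, `make_node`, `store`, and `fetch` are hypothetical names, and this is not how `dict` itself is implemented.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Immutable node: once built it is never modified, so old roots stay valid.
struct Node {
    int key;
    int value;
    std::shared_ptr<const Node> left;
    std::shared_ptr<const Node> right;
};
using Tree = std::shared_ptr<const Node>;

// Builds a fresh immutable node (hypothetical helper).
Tree make_node(int key, int value, Tree left, Tree right) {
    return std::make_shared<const Node>(
        Node{key, value, std::move(left), std::move(right)});
}

// Returns the root of a new version; `t` is left unchanged. Only the nodes on
// the path to `key` are copied; every other subtree is shared by both versions.
Tree store(const Tree &t, int key, int value) {
    if (!t)
        return make_node(key, value, nullptr, nullptr);
    if (key < t->key)
        return make_node(t->key, t->value, store(t->left, key, value), t->right);
    if (key > t->key)
        return make_node(t->key, t->value, t->left, store(t->right, key, value));
    return make_node(key, value, t->left, t->right);  // same key: new value
}

// Returns a pointer to the stored value, or nullptr if the key is absent.
const int *fetch(const Tree &t, int key) {
    const Node *n = t.get();
    while (n && n->key != key)
        n = (key < n->key) ? n->left.get() : n->right.get();
    return n ? &n->value : nullptr;
}
```

Storing key 1 twice, as on the slide, yields two roots that answer 10 and 15 respectively, while untouched subtrees are the same objects in both versions.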

  16. 16
                   1000    10k   100k    1mm   10mm
    ets               8      8     49    497   5296
    dict              2     21    315  16485     --
    gb_trees          3     44    577   8095     --

  17. 17
    Concurrent
    Fast
    Off Heap

  18. 18
    Immutable
    Shared Structure
    Concurrent
    Fast
    Off Heap

  19. CoW B-Tree
    19

  20. 20
    (1,10),(5,50),(9,90)
    root

  21. 21
    (1,10),(2,20)
    (1,A),(5,B)
    (5,50),(9,90)
    root
    A
    B

  22. 22
    (1,10),(2,20),(3,30)
    (1,A),(5,B)
    (5,50),(9,90)
    root
    A
    B

  23. 23
    (1,10),(2,20),(3,30)
    (1,A),(5,B)
    (5,50),(6,60),(9,90)
    root
    A
    B

  24. 24
    (1,10),(2,20)
    (1,A),(3,C),(5,B)
    (3,30),(4,40)
    (5,50),(6,60),(9,90)
    root
    A
    C
    B

  25. 25
    (1,10),(2,20)
    (1,A),(3,C)
    (3,30),(4,40)
    (5,50),(6,60)
    (7,70),(9,90)
    (1,E),(5,F)
    (5,B),(7,D)
    root
    E
    F
    A
    C
    B
    D

  26. 26
    [{1,
      [{1,[{1,10},{2,20}]},
       {3,[{3,30},{4,40}]}]},
     {5,
      [{5,[{5,50},{6,60}]},
       {7,[{7,70},{9,90}]}]}]
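
In the insertion sequence on the preceding slides, a node holds at most three entries and splits into two halves when a fourth arrives. A minimal sketch of that copy-on-write step for a single leaf follows. The capacity of 3 is inferred from the slides' example, and `Entry`, `Leaf`, `kMaxEntries`, and `cow_insert` are hypothetical names rather than the talk's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;  // (key, value)
using Leaf = std::vector<Entry>;    // entries kept sorted by key

const std::size_t kMaxEntries = 3;  // capacity inferred from the slides' example

// Copy-on-write insert: `leaf` itself is never modified. The result holds one
// new leaf, or two new leaves if the copy overflowed and had to be split.
std::vector<Leaf> cow_insert(const Leaf &leaf, int key, int value) {
    Leaf copy = leaf;
    auto pos = std::lower_bound(
        copy.begin(), copy.end(), key,
        [](const Entry &e, int k) { return e.first < k; });
    if (pos != copy.end() && pos->first == key)
        pos->second = value;                 // overwrite an existing key
    else
        copy.insert(pos, Entry(key, value)); // insert in sorted position
    if (copy.size() <= kMaxEntries)
        return {copy};
    auto mid = copy.begin() + copy.size() / 2;
    return {Leaf(copy.begin(), mid), Leaf(mid, copy.end())};
}
```

Inserting (2,20) into the leaf (1,10),(5,50),(9,90) returns the two leaves shown on the slide, (1,10),(2,20) and (5,50),(9,90). A parent would then be copied the same way with the new child pointers, which is what makes the old root a usable snapshot.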

  27. 27
    find(Key, #tree{height=Height, root=Root}) ->
        find(1, Height, Key, Root).
    find(Depth, Height, Key, Node)
      when Depth == Height ->
        %% leaf node
        orddict:find(Key, Node);
    find(Depth, Height, Key, Node) ->
        %% inner node
        {_, Child} = search(Node, Key),
        find(Depth + 1, Height, Key, Child).

  28. 28
                   1000    10k   100k    1mm   10mm
    bt                4     59    708   8894     --
    dict              2     21    315  16485     --
    gb_trees          3     44    577   8095     --

  29. 29
    Immutable
    Shared Structure
    Concurrent
    Fast
    Off Heap

  30. 30
    Rewrite as a NIF

  31. 31
    Allocation
    Snapshots
    Reclamation (SMR)
    Atomics / Ordering

  32. 32
    Epoch Reclamation

  33. 33
    Grace Period Detection

  34. 34
    [Timeline diagram: threads T0, T1, T2; events: read, read, logical delete, synchronize, delete]

  35. 35
    [Timeline diagram: threads T0, T1, T2 with the steps logical delete, synchronize, delete, annotated with epoch values (0; 1 2 3; 0 1 2 3; 1 1 2 3)]

  36. 36
    std::atomic<Root*> root;
    void reader() {
        while(true) {
            epoch_begin();
            Root *r = root;
            do_something(r);
            epoch_end();
        }
    }
    void writer() {
        while(true) {
            Root *old_root = root;
            Root *new_root = update(old_root);
            root = new_root;
            // wait until no reader can still be using old_root
            epoch_synchronize();
            delete old_root;
        }
    }
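
The slide uses `epoch_begin`, `epoch_end`, and `epoch_synchronize` without showing them. One simple way they could work is sketched below, assuming a fixed table of reader slots and sequentially consistent atomics. The `slot` parameter, `kMaxReaders`, `kIdle`, and `quiescent` are hypothetical additions for illustration; the talk's actual implementation may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

const int kMaxReaders = 8;   // hypothetical fixed number of reader slots
const uint64_t kIdle = 0;    // slot value meaning "not inside a read section"

std::atomic<uint64_t> global_epoch{1};
std::atomic<uint64_t> reader_epoch[kMaxReaders];  // static storage: all start idle

// Reader side: publish the epoch observed on entry, clear it on exit.
void epoch_begin(int slot) {
    assert(slot >= 0 && slot < kMaxReaders);
    reader_epoch[slot].store(global_epoch.load());
}

void epoch_end(int slot) {
    assert(slot >= 0 && slot < kMaxReaders);
    reader_epoch[slot].store(kIdle);
}

// True once no reader is still inside a section that began before `target`.
bool quiescent(uint64_t target) {
    for (int i = 0; i < kMaxReaders; ++i) {
        uint64_t e = reader_epoch[i].load();
        if (e != kIdle && e < target)
            return false;
    }
    return true;
}

// Writer side: open a new epoch, then wait out the grace period. Readers that
// begin after the increment observe the new epoch and so are not waited for.
void epoch_synchronize() {
    uint64_t target = global_epoch.fetch_add(1) + 1;
    while (!quiescent(target))
        std::this_thread::yield();
}
```

Because the writer swaps the root before calling `epoch_synchronize`, any reader that is not waited for must have started after the swap and can only see the new root, which is why deleting the old one afterwards is safe in this scheme.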

  37. 37
    std::atomic<Root*> root;
    uint64_t epoch;
    void reader() {
        while(true) {
            epoch_begin();
            Root *r = root;
            snapshot(r->birth);   // register the epoch this root belongs to
            epoch_end();
            do_something(r);      // r is kept alive by the snapshot, not the epoch
        }
    }
    void writer() {
        while(true) {
            epoch++;
            Root *old_root = root;
            Root *new_root = update(old_root);
            new_root->birth = epoch;
            old_root->death = epoch;
            root = new_root;
            garbage.push_back(old_root);
            collect();
        }
    }

  38. 38
    void collect() {
        epoch_synchronize();
        decltype(garbage) keep;   // items some snapshot can still reach
        for(auto *item : garbage) {
            bool live = false;
            for(auto epoch : snapshots) {
                // a snapshot sees every item alive in [birth, death)
                if((epoch >= item->birth) &&
                   (epoch < item->death)) {
                    live = true;
                    break;
                }
            }
            if(live)
                keep.push_back(item);
            else
                delete item;
        }
        garbage.swap(keep);
    }

  39. 39
    Flat Combining

  40. 40
    (request)
    ready?
    (request)
    ready?
    (request)
    ready?
    locked?
    requests[]
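
The diagram's labels (a `requests[]` array, a per-request "ready?" flag, and a single "locked?" flag) match the usual shape of flat combining: each thread publishes its operation in its own slot, and whichever thread wins the lock applies every pending operation on behalf of the others. A minimal sketch under those assumptions follows, using a plain counter as the protected structure. `kSlots`, `Request`, `counter`, and `combining_add` are hypothetical names, not the talk's code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

const int kSlots = 4;  // hypothetical: one publication slot per thread

struct Request {
    std::atomic<bool> pending{false};  // set by the owner, cleared when served
    int arg = 0;                       // input: amount to add
    int result = 0;                    // output: counter value after the add
};

Request requests[kSlots];
std::atomic<bool> locked{false};       // combiner lock
int counter = 0;                       // sequential state, touched only under the lock

// Adds `amount` to the shared counter and returns the value after the add.
int combining_add(int slot, int amount) {
    Request &mine = requests[slot];
    mine.arg = amount;
    mine.pending.store(true);                  // publish the request
    while (mine.pending.load()) {              // not served yet
        if (!locked.exchange(true)) {          // we became the combiner
            for (Request &r : requests) {
                if (r.pending.load()) {
                    counter += r.arg;
                    r.result = counter;
                    r.pending.store(false);    // mark this request as served
                }
            }
            locked.store(false);
        } else {
            std::this_thread::yield();         // someone else is combining
        }
    }
    return mine.result;
}
```

The point of the design is that the shared structure is only ever touched by one thread at a time, so it can stay a simple sequential structure while waiting threads spin on their own slot rather than on the lock.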

  47. 47
    B1 = btn:new(),
    B2 = btn:store(1, 10, B1),
    B3 = btn:store(1, 15, B2),
    10 = btn:fetch(1, B2),
    15 = btn:fetch(1, B3).

  48. 48
                   1000    10k   100k    1mm   10mm
    bt                4     59    708   8894     --
    ets               8      8     49    497   5296
    bt_nif           11     23    267   2393  20908

  49. 49
    btn:m_new(test),
    btn:m_store(test, 1, 10),
    btn:m_store(test, 1, 15),
    spawn(fun() ->
        15 = btn:m_fetch(test, 1)
    end).

  50. 50
                   1000    10k   100k    1mm   10mm
    bt_nif2           3      6     68    744   8513
    ets               8      8     49    497   5296
    bt_nif           11     23    267   2393  20908

  51. Worker Dispatch
    51

  52. 52

  53. 53
    Load balancing
    Overload protection

  54. 54
    sidejob

  55. 55
    1 5 3 0 8 8
    message counters (ETS)

  58. 58
    1 6 3 0 8 8
    message counters (ETS)

  63. 63
    2 6 3 0 8 8
    message counters (ETS)
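
The counter frames above can be read as one message count per worker that a sender bumps before sending. A minimal sketch of how such counters can give both load balancing and overload protection follows, assuming a fixed per-worker limit and a "try the next worker" fallback. The limit of 8, the fallback policy, and the names `kWorkers`, `kLimit`, `pending`, `dispatch`, and `ack` are assumptions for illustration and are not taken from sidejob or the dispatch NIF.

```cpp
#include <atomic>
#include <cassert>

const int kWorkers = 6;
const int kLimit = 8;                 // hypothetical per-worker mailbox bound

std::atomic<int> pending[kWorkers];   // static storage: all counters start at 0

// Tries the preferred worker first, then the others in order. Returns the
// worker that accepted the message, or -1 if every worker is at its limit.
int dispatch(int preferred) {
    for (int i = 0; i < kWorkers; ++i) {
        int w = (preferred + i) % kWorkers;
        int seen = pending[w].load();
        while (seen < kLimit) {
            // On failure `seen` is refreshed and the limit is re-checked.
            if (pending[w].compare_exchange_weak(seen, seen + 1))
                return w;
        }
    }
    return -1;  // overload: the caller should shed or reject the request
}

// Called by a worker once it has finished handling a message.
void ack(int worker) {
    pending[worker].fetch_sub(1);
}
```

With the counters set to 1 5 3 0 8 8, a sender preferring worker 1 raises its count to 6, and a sender preferring worker 4 skips the two full workers and lands on worker 0.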

  65. 65
    dispatch NIF

  66. dispatch:new(test).
    sender(Msg) ->
        Pid = dispatch:find(test),
        Pid ! Msg,
        ok.
    worker() ->
        Name = dispatch:listen(test, self()),
        worker(Name).
    worker(Name) ->
        receive Msg ->
            do_something(Msg),
            dispatch:ack(test, Name),
            worker(Name)
        end.

  67. 67
    Faster under contention
    (About 1.5x)

  68. Statistics
    68

  69. 69
    folsom
    exometer
    (others?)

  70. 70
    Challenge
    min/max
    mean
    latency percentiles
    metrics++

  71. 71
    Riak issue for years

  72. 72
    Optimized counters
    in Riak 1.3

  73. 73
    “Scheduler” Partitioned
    counters in ETS
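
The idea of partitioning a counter by scheduler can be sketched outside ETS as well: each scheduler increments its own slot, and a reader sums all slots. The version below is a minimal C++ illustration. `kPartitions`, `Slot`, and `PartitionedCounter` are hypothetical names, and the 64-byte alignment assumes a typical cache-line size.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

const int kPartitions = 8;  // hypothetical: one slot per scheduler thread

// Each slot gets its own (assumed 64-byte) cache line so that writers on
// different schedulers do not contend with each other.
struct alignas(64) Slot {
    std::atomic<uint64_t> count{0};
};

struct PartitionedCounter {
    Slot slots[kPartitions];

    // Hot path: touch only the caller's partition.
    void add(int scheduler_id, uint64_t delta) {
        slots[scheduler_id % kPartitions].count.fetch_add(
            delta, std::memory_order_relaxed);
    }

    // Cold path: sum every partition. Adds racing with the read may or may
    // not be included, which is usually acceptable for statistics.
    uint64_t read() const {
        uint64_t total = 0;
        for (const Slot &s : slots)
            total += s.count.load(std::memory_order_relaxed);
        return total;
    }
};
```

The trade-off is that reads become more expensive and slightly stale in exchange for writes that almost never contend.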

  74. 74
    Histograms still a
    challenge

  75. 75
    Reservoir sampling
    (ETS)

  76. 76
    Let’s write a NIF!

  77. 77
    min/max via atomics

  78. 78
    template <typename TA, typename T>
    void atomic_max(TA &atomic, T val) {
        T current = atomic.load(std::memory_order_relaxed);
        while(val > current) {
            // publish val; the displaced value is re-checked on the next pass
            current = val;
            val = atomic.exchange(val);
        }
    }
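
The `atomic_max` above is built on `exchange`, so a smaller value can briefly be visible before the loop restores the larger one. A common alternative is a compare-and-swap loop, which never lowers the stored maximum. The sketch below shows that variant together with the matching minimum. `atomic_max_cas` and `atomic_min_cas` are hypothetical names, and this is an alternative formulation rather than the talk's version.

```cpp
#include <atomic>
#include <cassert>

// Raises `target` to `val` if `val` is larger. compare_exchange_weak reloads
// `current` on failure, so the comparison is always against a fresh value.
template <typename T>
void atomic_max_cas(std::atomic<T> &target, T val) {
    T current = target.load(std::memory_order_relaxed);
    while (val > current && !target.compare_exchange_weak(current, val)) {
        // retry with the refreshed `current`
    }
}

// Lowers `target` to `val` if `val` is smaller.
template <typename T>
void atomic_min_cas(std::atomic<T> &target, T val) {
    T current = target.load(std::memory_order_relaxed);
    while (val < current && !target.compare_exchange_weak(current, val)) {
        // retry with the refreshed `current`
    }
}
```

Both loops exit without writing when the new sample does not improve on the stored value, which is the common case for a long-running min/max metric.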

  79. 79
    Normal reservoir
    sampling with atomic
    increments
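
A minimal sketch of what "normal reservoir sampling with atomic increments" could look like: the classic Algorithm R, with the running event count taken from an atomic fetch-and-add so that several threads can feed one reservoir. `kSize`, `Reservoir`, and `update` are hypothetical names, the reservoir is kept tiny for illustration, and each caller is assumed to own its random number generator.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <random>

const uint64_t kSize = 4;  // hypothetical reservoir size, tiny for illustration

struct Reservoir {
    std::atomic<uint64_t> seen{0};          // total events observed so far
    std::atomic<uint64_t> samples[kSize];   // the current sample set

    Reservoir() {
        for (auto &s : samples)
            s.store(0);
    }

    // Algorithm R: the first kSize events fill the reservoir; event i (0-based)
    // afterwards replaces a random slot with probability kSize / (i + 1).
    void update(uint64_t value, std::mt19937_64 &rng) {
        uint64_t i = seen.fetch_add(1);     // atomic increment yields our index
        if (i < kSize) {
            samples[i].store(value);
            return;
        }
        std::uniform_int_distribution<uint64_t> pick(0, i);
        uint64_t j = pick(rng);
        if (j < kSize)
            samples[j].store(value);
    }
};
```

Percentiles can then be estimated by copying and sorting `samples`. Under heavy concurrency every update still hits the shared `seen` counter, which is the contention a partitioned reservoir would aim to reduce.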

  80. 80
    200 metrics
    10 million events each
    8-16 workers
    30-40s runtime

  81. 81
    50 million events/s
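
As a rough consistency check on this figure, assume that each of the 200 metrics receives its full 10 million events and that the rate is the aggregate across all workers (both are readings of the previous slide, not statements from the talk). The 30-40 s runtime then gives:

```latex
\[
\frac{200 \times 10^{7}\ \text{events}}{40\ \text{s}} = 5 \times 10^{7}\ \text{events/s},
\qquad
\frac{200 \times 10^{7}\ \text{events}}{30\ \text{s}} \approx 6.7 \times 10^{7}\ \text{events/s}.
\]
```

Under those assumptions the quoted 50 million events/s corresponds to the slower end of the runtime range.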

  82. 82
    Future Work
    Partitioned
    Reservoir sampling

  83. 83
    http://gregable.com/2007/10/reservoir-sampling.html

  84. Conclusion
    84

  85. 85
    NIFs can help

  86. 86
    Optimized reusable
    components

  87. 87
    NIFs are hard

  88. 88
    Garbage collection/SMR
    Allocation
    Copying

  89. 89
    Erlang + NIFs are hard

  90. 90
    But, I think it’s worth it

  91. 91
    Much more work to do

  92. 92
    github.com/jtuple/ef2015

  93. Questions?
    93