
code::dive 2019 - What do you mean by "Cache Friendly"?


Data structures, and sometimes the algorithms that operate on them, can be described as "cache friendly" or "cache hostile", but what is meant by that, and does it really matter?

Cache memory in modern CPUs can be a hundred times faster than main memory, but caches are small and have some properties that can be counter-intuitive. Getting good performance requires thinking about how your data structures are laid out in memory and how they are accessed.

This presentation will explain why some constructions are problematic and show better alternatives. I will show tools for analyzing cache efficiency, and things to think about when making changes to gain performance. You will develop an intuition for writing fast software by default, and learn techniques to improve it.

Björn Fahller

November 21, 2019

Transcript

  1. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 1/205
    What Do You Mean by “Cache Friendly”?
    Björn Fahller


    typedef uint32_t (*timer_cb)(void*);
    struct timer {
        uint32_t deadline;
        timer_cb callback;
        void* userp;
        struct timer* next;
        struct timer* prev;
    };
    static timer timeouts = { 0, NULL, NULL, &timeouts, &timeouts };

    timer* schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
        timer* iter = timeouts.prev;
        while (iter != &timeouts && is_after(iter->deadline, deadline))
            iter = iter->prev;
        return add_behind(iter, deadline, cb, userp); // add_behind allocates the
                                                      // node, links it in after
                                                      // iter, and returns it
    }
    void cancel_timer(timer* t) {
        t->next->prev = t->prev; t->prev->next = t->next; free(t);
    }


    Simplistic model of cache behaviour

    Includes:

    The cache is small,
    and consists of fixed-size lines,
    and a data access hit is very fast,
    and a data access miss is very slow.

    Excludes:

    Multiple levels of caches
    Associativity
    Threading

    All models are wrong, but some are useful.

    Simplistic model of cache behaviour

    const int* hot = 0x4001;
    const int* cold = 0x4042;
    int* also_cold = 0x4080;
    int a = *hot;
    int c = *cold;
    *also_cold = a;
    also_cold[1] = c;

    [Animated diagram: a four-line cache, initially holding lines 0x3A10,
    0x4010, 0x4000, and 0x4FF0, beside a memory map of lines 0x4000–0x40F0.
    Stepping through the accesses: *hot hits the already-cached line 0x4000;
    *cold misses, and line 0x4040 is loaded, evicting 0x3A10; the store
    through also_cold misses, and line 0x4080 is loaded, evicting 0x4010;
    also_cold[1] then hits the freshly loaded line 0x4080.]

    Analysis of implementation

    int main() {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<uint32_t> dist;
        for (int k = 0; k < 10; ++k) {
            timer* prev = nullptr;
            for (int i = 0; i < 20'000; ++i) {
                timer* t = schedule_timer(
                    dist(gen),
                    [](void*){return 0U;}, nullptr);
                if (i & 1) cancel_timer(prev);
                prev = t;
            }
            while (shoot_first())
                ;
        }
    }

    bool shoot_first() {
        if (timeouts.next == &timeouts)
            return false;
        timer* t = timeouts.next;
        t->callback(t->userp);
        cancel_timer(t);
        return true;
    }

    Analysis of implementation

    valgrind --tool=callgrind --cache-sim=yes --dump-instr=yes --branch-sim=yes

    Essentially a profiler that collects info about call hierarchies, number
    of calls, and time spent. The CPU simulator is not cycle accurate, so
    treat the timing results as a broad picture.

    Simulates a CPU cache, flattened to 2 levels, L1 and LL. It shows you
    where you get cache misses. L1 is by default a model of your host CPU's
    L1, but you can change the size, line size, and associativity.

    Collects statistics per instruction instead of per source line, which
    can help pinpoint bottlenecks.

    Simulates a branch predictor.

    Very slow!

    Live demo

    typedef uint32_t (*timer_cb)(void*);
    typedef struct timer {
        uint32_t deadline;    // 4 bytes + 4 bytes padding for alignment
        timer_cb callback;    // 8 bytes
        void* userp;          // 8 bytes
        struct timer* next;   // 8 bytes
        struct timer* prev;   // 8 bytes
    } timer;                  // sum = 40 bytes

    66% of all L1d cache misses

    Rule of thumb:
    Follow pointer => cache miss

    33% of all L1d cache misses

    Chasing pointers is expensive.
    Let's get rid of the pointers.

    typedef uint32_t (*timer_cb)(void*);
    typedef uint32_t timer;
    struct timer_data {
        uint32_t deadline;
        timer id;
        void* userp;
        timer_cb callback;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;

    24 bytes per entry.
    No pointer chasing.
    Linear structure.

  65. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 65/205
    typedef uint32_t (*timer_cb)(void*);
    typedef uint32_t timer;
    struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
    };
    std::vector timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
    timeouts[idx] = std::move(timeouts[idx-1]);
    --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb };
    return next_id++;
    }


  66. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 66/205
    typedef uint32_t (*timer_cb)(void*);
    typedef uint32_t timer;
    struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
    timeouts[idx] = std::move(timeouts[idx-1]);
    --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb };
    return next_id++;
    }
    Linear insertion sort


  67. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 67/205
    typedef uint32_t (*timer_cb)(void*);
    typedef uint32_t timer;
    struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
    timeouts[idx] = std::move(timeouts[idx-1]);
    --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb };
    return next_id++;
    }


  68. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 68/205
    typedef uint32_t (*timer_cb)(void*);
    typedef uint32_t timer;
    struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
    timeouts[idx] = std::move(timeouts[idx-1]);
    --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb };
    return next_id++;
    }
    void cancel_timer(timer t)
    {
    auto i = std::find_if(timeouts.begin(), timeouts.end(),
    [t](const auto& e) { return e.id == t; });
    timeouts.erase(i);
    }


  69. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 69/205
    typedef uint32_t (*timer_cb)(void*);
    typedef uint32_t timer;
    struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) {
    timeouts[idx] = std::move(timeouts[idx-1]);
    --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb };
    return next_id++;
    }
    void cancel_timer(timer t)
    {
    auto i = std::find_if(timeouts.begin(), timeouts.end(),
    [t](const auto& e) { return e.id == t; });
    timeouts.erase(i);
    }
    Linear search


  70. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 70/205
    Analysis of implementation
    perf stat -e cycles,instructions,l1d-loads,l1d-load-misses


  71. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 71/205
    Analysis of implementation
    perf stat -e cycles,instructions,l1d-loads,l1d-load-misses
    Presents statistics from
    whole run of program,
    using counters from HW
    and linux kernel.


  72. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 72/205
    Analysis of implementation
    perf stat -e cycles,instructions,l1d-loads,l1d-load-misses
    Presents statistics from
    whole run of program,
    using counters from HW
    and linux kernel.
    Number of cycles per
    instruction is a proxy for
    how much the CPU is
    working or waiting.


  73. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 73/205
    Analysis of implementation
    perf stat -e cycles,instructions,l1d-loads,l1d-load-misses
    Presents statistics from
    whole run of program,
    using counters from HW
    and linux kernel.
    Number of cycles per
    instruction is a proxy for
    how much the CPU is
    working or waiting.
    Number of reads from
    L1d cache, and number
    of misses. Speculative
    execution can make these
    numbers confusing.


  74. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 74/205
    Analysis of implementation
    perf stat -e cycles,instructions,l1d-loads,l1d-load-misses
    Presents statistics from
    whole run of program,
    using counters from HW
    and linux kernel.
    Number of cycles per
    instruction is a proxy for
    how much the CPU is
    working or waiting.
    Number of reads from
    L1d cache, and number
    of misses. Speculative
    execution can make these
    numbers confusing.
    Very fast!


  75. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 75/205
    Analysis of implementation
    perf record -e cycles,instructions,l1d-loads,l1d-load-misses --call-graph=lbr


  76. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 76/205
    Analysis of implementation
    perf record -e cycles,instructions,l1d-loads,l1d-load-misses --call-graph=lbr
    Records where in your
    program the counters are
    gathered.


  77. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 77/205
    Analysis of implementation
    perf record -e cycles,instructions,l1d-loads,l1d-load-misses --call-graph=lbr
    Records where in your
    program the counters are
    gathered.
    Records call graph info,
    instead of just location.
    LBR requires no special
    compilation flags.


  78. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 78/205
    Analysis of implementation
    perf record -e cycles,instructions,l1d-loads,l1d-load-misses --call-graph=lbr
    Records where in your
    program the counters are
    gathered.
    Records call graph info,
    instead of just location.
    LBR requires no special
    compilation flags.
    Very fast!


  79. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 79/205
    Live demo


  80. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 80/205


  81. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 81/205
    Linear search is expensive.
    Maybe try binary search?


  82. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 82/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;


  83. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 83/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
    element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
    }


  84. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 84/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
    element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
    }
    Binary search for
    insertion point


  85. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 85/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
    element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
    }
    Linear insertion


  86. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 86/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
    element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
    }
    Linear insertion
    void cancel_timer(timer t) {
    timer_data element{t.deadline, t.id, nullptr, nullptr};
    auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(),
    element, is_after);
    auto i = std::find_if(lo, hi,
    [t](const auto& e) { return e.id == t.id; });
    if (i != hi) {
    timeouts.erase(i);
    }
    }


  87. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 87/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
    element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
    }
    Linear insertion
    void cancel_timer(timer t) {
    timer_data element{t.deadline, t.id, nullptr, nullptr};
    auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(),
    element, is_after);
    auto i = std::find_if(lo, hi,
    [t](const auto& e) { return e.id == t.id; });
    if (i != hi) {
    timeouts.erase(i);
    }
    }
    Binary search for
    timers with the
    same deadline


  88. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 88/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
    element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
    }
    Linear insertion
    void cancel_timer(timer t) {
    timer_data element{t.deadline, t.id, nullptr, nullptr};
    auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(),
    element, is_after);
    auto i = std::find_if(lo, hi,
    [t](const auto& e) { return e.id == t.id; });
    if (i != hi) {
    timeouts.erase(i);
    }
    }
    Linear search for
    matching id


  89. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 89/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    uint32_t deadline;
    uint32_t id;
    void* userp;
    timer_cb callback;
    };
    struct timer {
    uint32_t deadline;
    uint32_t id;
    };
    std::vector<timer_data> timeouts;
    uint32_t next_id = 0;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
    {
    timer_data element{deadline, next_id, userp, cb};
    auto i = std::lower_bound(timeouts.begin(), timeouts.end(),
    element, is_after);
    timeouts.insert(i, element);
    return {deadline, next_id++};
    }
    Linear insertion
    void cancel_timer(timer t) {
    timer_data element{t.deadline, t.id, nullptr, nullptr};
    auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(),
    element, is_after);
    auto i = std::find_if(lo, hi,
    [t](const auto& e) { return e.id == t.id; });
    if (i != hi) {
    timeouts.erase(i);
    }
    }
    Linear removal


  90. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 90/205
    Live demo


  91. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 91/205
    Searches not visible in profiling.
    Number of reads reduced.
    Number of cache misses high.
    memmove() dominates.


  92. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 92/205
    Searches not visible in profiling.
    Number of reads reduced.
    Number of cache misses high.
    memmove() dominates.
    Failed branch predictions
    can lead to cache entry eviction!


  93. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 93/205
    Searches not visible in profiling.
    Number of reads reduced.
    Number of cache misses high.
    memmove() dominates.
    Failed branch predictions
    can lead to cache entry eviction!
    Maybe try a map<>?


  94. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 94/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    void* userp;
    timer_cb callback;
    };
    struct is_after {
    bool operator()(uint32_t lh, uint32_t rh) const {
    return lh < rh;
    }
    };
    using timer_map = std::multimap<uint32_t, timer_data, is_after>;
    using timer = timer_map::iterator;
    static timer_map timeouts;


  95. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 95/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    void* userp;
    timer_cb callback;
    };
    struct is_after {
    bool operator()(uint32_t lh, uint32_t rh) const {
    return lh < rh;
    }
    };
    using timer_map = std::multimap<uint32_t, timer_data, is_after>;
    using timer = timer_map::iterator;
    static timer_map timeouts;


  96. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 96/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    void* userp;
    timer_cb callback;
    };
    struct is_after {
    bool operator()(uint32_t lh, uint32_t rh) const {
    return lh < rh;
    }
    };
    using timer_map = std::multimap<uint32_t, timer_data, is_after>;
    using timer = timer_map::iterator;
    static timer_map timeouts;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) {
    return timeouts.insert(std::make_pair(deadline,
    timer_data{userp, cb}));
    }
    void cancel_timer(timer t) {
    timeouts.erase(t);
    }


  97. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 97/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    void* userp;
    timer_cb callback;
    };
    struct is_after {
    bool operator()(uint32_t lh, uint32_t rh) const {
    return lh < rh;
    }
    };
    using timer_map = std::multimap<uint32_t, timer_data, is_after>;
    using timer = timer_map::iterator;
    static timer_map timeouts;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) {
    return timeouts.insert(std::make_pair(deadline,
    timer_data{userp, cb}));
    }
    void cancel_timer(timer t) {
    timeouts.erase(t);
    }


  98. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 98/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    void* userp;
    timer_cb callback;
    };
    struct is_after {
    bool operator()(uint32_t lh, uint32_t rh) const {
    return lh < rh;
    }
    };
    using timer_map = std::multimap<uint32_t, timer_data, is_after>;
    using timer = timer_map::iterator;
    static timer_map timeouts;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) {
    return timeouts.insert(std::make_pair(deadline,
    timer_data{userp, cb}));
    }
    void cancel_timer(timer t) {
    timeouts.erase(t);
    }


  99. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 99/205
    typedef uint32_t (*timer_cb)(void*);
    struct timer_data {
    void* userp;
    timer_cb callback;
    };
    struct is_after {
    bool operator()(uint32_t lh, uint32_t rh) const {
    return lh < rh;
    }
    };
    using timer_map = std::multimap<uint32_t, timer_data, is_after>;
    using timer = timer_map::iterator;
    static timer_map timeouts;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) {
    return timeouts.insert(std::make_pair(deadline,
    timer_data{userp, cb}));
    }
    void cancel_timer(timer t) {
    timeouts.erase(t);
    }
    bool shoot_first() {
    if (timeouts.empty()) return false;
    auto i = timeouts.begin();
    i->second.callback(i->second.userp);
    timeouts.erase(i);
    return true;
    }


  100. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 100/205
    Live demo


  101. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 101/205
    Faster, but lots of
    cache misses when
    comparing keys
    and rebalancing
    the tree.


  102. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 102/205
    Faster, but lots of
    cache misses when
    comparing keys
    and rebalancing
    the tree.
    What did I say about
    chasing pointers?


  103. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 103/205
    Faster, but lots of
    cache misses when
    comparing keys
    and rebalancing
    the tree.
    What did I say about
    chasing pointers?
    [Chart: execution time in seconds (log scale) vs number of elements, for linear, bsearch, and map]


  104. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 104/205
    Faster, but lots of
    cache misses when
    comparing keys
    and rebalancing
    the tree.
    What did I say about
    chasing pointers?
    [Chart: execution time in seconds (log scale) vs number of elements, for linear, bsearch, and map]
    [Chart: execution time relative to linear vs number of elements — bsearch/linear and map/linear]


  105. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 105/205
    Faster, but lots of
    cache misses when
    comparing keys
    and rebalancing
    the tree.
    What did I say about
    chasing pointers?


  106. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 106/205
    Faster, but lots of
    cache misses when
    comparing keys
    and rebalancing
    the tree.
    What did I say about
    chasing pointers?
    Can we get log(n)
    lookup without
    chasing pointers?


  107. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 107/205
    Enter the HEAP


  108. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 108/205
    [Diagram: heap tree — 3; 5 8; 6 10 10 14; 9 15 13 12 11]
    Enter the HEAP


  109. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 109/205
    [Diagram: heap tree — 3; 5 8; 6 10 10 14; 9 15 13 12 11]
    Enter the HEAP

    Perfectly balanced partially sorted tree


  110. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 110/205
    [Diagram: heap tree — 3; 5 8; 6 10 10 14; 9 15 13 12 11]
    Enter the HEAP

    Perfectly balanced partially sorted tree

    Every node is sorted after or same as its parent


  111. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 111/205
    [Diagram: heap tree — 3; 5 8; 6 10 10 14; 9 15 13 12 11]
    Enter the HEAP

    Perfectly balanced partially sorted tree

    Every node is sorted after or same as its parent

    No relation between siblings


  112. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 112/205
    [Diagram: heap tree — 3; 5 8; 6 10 10 14; 9 15 13 12 11]
    Enter the HEAP

    Perfectly balanced partially sorted tree

    Every node is sorted after or same as its parent

    No relation between siblings

    At most one node with only one child,
    and that child is the last node


  113. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 113/205
    [Diagram: heap tree]


  114. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 114/205
    Insertion:
    [Diagram: heap tree]


  115. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 115/205
    Insertion:

    Create space
    [Diagram: heap tree]


  116. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 116/205
    Insertion:

    Create space

    Trickle down greater nodes
    [Diagram: heap tree]


  117. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 117/205
    Insertion:

    Create space

    Trickle down greater nodes

    Insert into space
    [Diagram: heap tree]


  118. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 118/205
    Insertion:

    Create space

    Trickle down greater nodes

    Insert into space
    [Diagram: heap tree — inserting 7]


  119. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 119/205
    Insertion:

    Create space

    Trickle down greater nodes

    Insert into space
    [Diagram: heap tree — inserting 7]


  120. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 120/205
    Insertion:

    Create space

    Trickle down greater nodes

    Insert into space
    [Diagram: heap tree — inserting 7]


  121. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 121/205
    Insertion:

    Create space

    Trickle down greater nodes

    Insert into space
    [Diagram: heap tree — inserting 7]


  122. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 122/205
    [Diagram: heap tree after inserting 7]


  123. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 123/205
    Pop top:
    [Diagram: heap tree]


  124. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 124/205
    Pop top:

    Remove top
    [Diagram: heap tree]


  125. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 125/205
    Pop top:

    Remove top

    Trickle up lesser child
    [Diagram: heap tree]


  126. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 126/205
    Pop top:

    Remove top

    Trickle up lesser child

    move-insert last into hole
    [Diagram: heap tree]


  127. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 127/205
    Pop top:

    Remove top

    Trickle up lesser child

    move-insert last into hole
    [Diagram: heap tree]


  128. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 128/205
    Pop top:

    Remove top

    Trickle up lesser child

    move-insert last into hole
    [Diagram: heap tree]


  129. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 129/205
    Pop top:

    Remove top

    Trickle up lesser child

    move-insert last into hole
    [Diagram: heap tree]


  130. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 130/205
    Pop top:

    Remove top

    Trickle up lesser child

    move-insert last into hole
    [Diagram: heap tree]


  131. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 131/205
    Pop top:

    Remove top

    Trickle up lesser child

    move-insert last into hole
    [Diagram: heap tree]


  132. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 132/205
    Pop top:

    Remove top

    Trickle up lesser child

    move-insert last into hole
    [Diagram: heap tree]


  133. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 133/205
    [Diagram: heap stored in an array — node values with their 1-based indexes]


  134. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 134/205
    [Diagram: heap stored in an array — node values with their 1-based indexes]
    Addressing: the index of a parent node is half (rounded down) of that of a child.


  135. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 135/205
    [Diagram: heap stored in an array — node values with their 1-based indexes]
    Addressing: the index of a parent node is half (rounded down) of that of a child.


  136. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 136/205
    [Diagram: heap stored in an array — node values with their 1-based indexes]
    Addressing: the index of a parent node is half (rounded down) of that of a child.
    Array indexes!
    No pointer chasing!


  137. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 137/205
    The heap is not searchable,
    so how do we handle cancellation?


  138. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 138/205
    The heap is not searchable,
    so how do we handle cancellation?
    struct timer_action {
    uint32_t (*callback)(void*);
    void* userp;
    };


  139. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 139/205
    The heap is not searchable,
    so how do we handle cancellation?
    [Diagram: actions array]
    struct timer_action {
    uint32_t (*callback)(void*);
    void* userp;
    };


  140. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 140/205
    The heap is not searchable,
    so how do we handle cancellation?
    [Diagram: actions array]
    struct timer_action {
    uint32_t (*callback)(void*);
    void* userp;
    };


  141. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 141/205
    The heap is not searchable,
    so how do we handle cancellation?
    [Diagram: actions array]
    struct timer_action {
    uint32_t (*callback)(void*);
    void* userp;
    };
    struct timeout {
    uint32_t deadline;
    uint32_t action_index;
    };


  142. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 142/205
    The heap is not searchable,
    so how do we handle cancellation?
    [Diagram: actions array]
    struct timer_action {
    uint32_t (*callback)(void*);
    void* userp;
    };
    struct timeout {
    uint32_t deadline;
    uint32_t action_index;
    };


  143. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 143/205
    The heap is not searchable,
    so how do we handle cancellation?
    [Diagram: actions array]
    struct timer_action {
    uint32_t (*callback)(void*);
    void* userp;
    };
    struct timeout {
    uint32_t deadline;
    uint32_t action_index;
    };
    Only 8 bytes
    per element of
    working data
    in the heap.


  144. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 144/205
    The heap is not searchable,
    so how do we handle cancellation?
    [Diagram: actions array]
    struct timer_action {
    uint32_t (*callback)(void*);
    void* userp;
    };
    struct timeout {
    uint32_t deadline;
    uint32_t action_index;
    };
    Cancel by
    setting callback
    to nullptr
    Only 8 bytes
    per element of
    working data
    in the heap.

    View Slide

  145. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 145/205
    struct timer_data {
        uint32_t deadline;
        uint32_t action_index;
    };
    struct is_after {
        bool operator()(const timer_data& lh, const timer_data& rh) const {
            return lh.deadline > rh.deadline;
        }
    };
    std::priority_queue<timer_data, std::vector<timer_data>, is_after>
        timeouts;
    timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) {
        auto action_index = actions.push(cb, userp);
        timeouts.push(timer_data{deadline, action_index});
        return action_index;
    }
    Container adapter that implements a heap.

  149. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 149/205
    bool shoot_first() {
        // Pop off any cancelled items first.
        while (!timeouts.empty()) {
            auto& t = timeouts.top();
            auto& action = actions[t.action_index];
            if (action.callback) break;
            actions.remove(t.action_index);
            timeouts.pop();
        }
        if (timeouts.empty()) return false;
        auto& t = timeouts.top();
        auto& action = actions[t.action_index];
        action.callback(action.userp);
        actions.remove(t.action_index);
        timeouts.pop();
        return true;
    }

  150. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 150/205
    Live demo


  151. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 151/205
    A lot fewer everything!
    and nearly twice as fast too
    [Charts: execution time (seconds) vs. number of elements for linear, bsearch, map, and heap; and relative execution time, heap/linear and heap/map.]
    But there are many cache misses in the adjust-heap functions.
    Can we do better?

  157. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 157/205
    How do the entries fit in cache lines?
    Every generation is on a new cache line.

  165. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 165/205
    Can we do better?
    Three generations per cache line!

  173. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 173/205
    [Diagram: the heap stored in 8-element blocks. Indices 1–7 form one block holding three generations (root at 1, children 2–3, leaves 4–7; index 0 unused), and each leaf's children are the roots of new blocks at indices 8–15, 16–23, and so on.]

  179. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 179/205
    class timeout_store {
        static constexpr size_t block_size = 8;
        static constexpr size_t block_mask = block_size - 1U;
        static size_t block_offset(size_t idx) {
            return idx & block_mask;
        }
        static size_t block_base(size_t idx) {
            return idx & ~block_mask;
        }
        static bool is_block_root(size_t idx) {
            return block_offset(idx) == 1;
        }
        static bool is_block_leaf(size_t idx) {
            return (idx & (block_size >> 1)) != 0U;
        }
        ...
    };
    [Diagram: one 8-entry block laid out as a small tree — root at offset 1, children at 2–3, leaves at 4–7; offset 0 is unused.]

  184. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 184/205
    class timeout_store {
        static constexpr size_t block_size = 8;
        static constexpr size_t block_mask = block_size - 1U;
        static size_t block_offset(size_t idx);
        static size_t block_base(size_t idx);
        static bool is_block_root(size_t idx);
        static bool is_block_leaf(size_t idx);
        static size_t left_child_of(size_t idx) {
            if (!is_block_leaf(idx)) return idx + block_offset(idx);
            auto base = block_base(idx) + 1;
            return base * block_size + child_no(idx) * block_size * 2 + 1;
        }
        ...
    };

  186. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 186/205
    class timeout_store {
        ...
        static size_t parent_of(size_t idx) {
            auto const node_root = block_base(idx);
            if (!is_block_root(idx)) return node_root + block_offset(idx) / 2;
            auto parent_base = block_base(node_root / block_size - 1);
            auto child = ((idx - block_size) / block_size - parent_base) / 2;
            return parent_base + block_size / 2 + child;
        }
        ...
    };

  188. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 188/205
    class timeout_store {
        ...
        using allocator = align_allocator<64>::type<timeout>;
        std::vector<timeout, allocator> bheap_store;
    };

  190. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 190/205
    template <size_t N>
    struct align_allocator {
        template <typename T>
        struct type {
            using value_type = T;
            static constexpr std::align_val_t alignment{N};
            T* allocate(size_t n) {
                return static_cast<T*>(operator new(n * sizeof(T), alignment));
            }
            void deallocate(T* p, size_t) {
                operator delete(p, alignment);
            }
        };
    };
    Aligned operator new and delete came with C++17.

  193. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 193/205
    Live demo


  194. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 194/205
    [Charts: execution time (seconds) vs. number of elements for linear, bsearch, map, heap, and bheap; and execution time relative to map, heap/map and bheap/map.]

  196. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 196/205
    Rules of thumb

    Following a pointer is a cache miss, unless you have information to the contrary

    A smaller working data set is better

    Use as much of a cache entry as you can

    Sequential memory accesses can be very fast due to prefetching

    Fewer evicted cache lines means more data in hot cache for the rest of the program

    Mispredicted branches can evict cache entries

    Measure, measure, measure

  204. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 204/205
    Resources
    Ulrich Drepper - “What every programmer should know about memory”
    http://www.akkadia.org/drepper/cpumemory.pdf
    Milian Wolff - “Linux perf for Qt Developers”
    https://www.youtube.com/watch?v=L4NClVxqdMw
    Travis Downs - “Cache counters rant”
    https://gist.github.com/travisdowns/90a588deaaa1b93559fe2b8510f2a739
    Emery Berger - “Performance Matters”
    https://www.youtube.com/watch?v=r-TLSBdHe1A


  205. What Do You Mean by “Cache Friendly”? – code::dive 2019 © Björn Fahller @bjorn_fahller 205/205
    [email protected]
    @bjorn_fahller
    @rollbear
    Björn Fahller
    What Do You Mean by “Cache Friendly”?

    View Slide