
C++OnSea - What Do You Mean by "Cache Friendly"?


Data structures, and sometimes the algorithms that operate on them, can be described as "cache friendly" or "cache hostile", but what is meant by that, and does it really matter?

Cache memory in modern CPUs can be a hundred times faster than main memory, but caches are very small and have some interesting properties that can sometimes be counter-intuitive. Getting good performance requires thinking about how your data structures are laid out in memory and how they are accessed.

This presentation will explain why some constructions are problematic and show better alternatives. I will show tools for analyzing cache efficiency, and things to think about when making changes to gain performance. You will develop an intuition for writing fast software by default, and learn techniques to improve it.

Björn Fahller

July 06, 2022

Transcript

  1. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 1/205
  2. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 2/205 What Do You Mean by “Cache Friendly”? Björn Fahller
  3. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 3/205 typedef uint32_t (*timer_cb)(void*); struct timer { uint32_t deadline; timer_cb callback; void* userp; struct timer* next; struct timer* prev; }; static timer timeouts = { 0, NULL, NULL, &timeouts, &timeouts }; timer* schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer* iter = timeouts.prev; while (iter != &timeouts && is_after(iter->deadline, deadline)) iter = iter->prev; add_behind(iter, deadline, cb, userp); }
  6. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 6/205 typedef uint32_t (*timer_cb)(void*); struct timer { uint32_t deadline; timer_cb callback; void* userp; struct timer* next; struct timer* prev; }; static timer timeouts = { 0, NULL, NULL, &timeouts, &timeouts }; timer* schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer* iter = timeouts.prev; while (iter != &timeouts && is_after(iter->deadline, deadline)) iter = iter->prev; add_behind(iter, deadline, cb, userp); } void cancel_timer(timer* t) { t->next->prev = t->prev; t->prev->next = t->next; free(t); }
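The slide's code calls two helpers, is_after and add_behind, that are not shown in the transcript. A minimal sketch of what they might look like — the signatures and the wrap-around deadline comparison are assumptions, not the talk's actual code:

```cpp
#include <cstdint>
#include <cstdlib>

typedef uint32_t (*timer_cb)(void*);
struct timer {
  uint32_t deadline;
  timer_cb callback;
  void* userp;
  timer* next;
  timer* prev;
};

// Assumed wrap-around aware comparison: true if deadline 'a' comes
// later than 'b', treating the 32-bit tick space as circular.
static bool is_after(uint32_t a, uint32_t b) {
  return static_cast<int32_t>(a - b) > 0;
}

// Assumed helper: allocate a new timer and link it in directly
// after 'pos' in the doubly linked list.
static timer* add_behind(timer* pos, uint32_t deadline,
                         timer_cb cb, void* userp) {
  timer* t = static_cast<timer*>(std::malloc(sizeof(timer)));
  t->deadline = deadline;
  t->callback = cb;
  t->userp = userp;
  t->next = pos->next;
  t->prev = pos;
  pos->next->prev = t;
  pos->next = t;
  return t;
}
```

The list uses the static `timeouts` node as a sentinel, so insertion never needs a null check — both directions always point at a real node.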
  17. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 17/205 Simplistic model of cache behaviour Includes • The cache is small • and consists of fixed size lines • and is very very fast when hit • and missing is very slow Excludes • Multiple levels of caches • Associativity • Threading All models are wrong, but some are useful
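Under this model, the main lever is spatial locality: consecutive addresses share cache lines, so a contiguous traversal pays one miss per line while a large-strided traversal can miss on every access. A minimal sketch (not from the talk) that typically shows the effect:

```cpp
// Sum a 2D array two ways. Row-major order touches consecutive
// addresses, so each cache line fetched is fully used before moving
// on. Column-major order jumps N*sizeof(int) bytes per step, so each
// fetched line contributes only one element before it may be evicted.
constexpr int N = 1024;
static int grid[N][N];

long long sum_rows() {
  long long s = 0;
  for (int r = 0; r < N; ++r)
    for (int c = 0; c < N; ++c)
      s += grid[r][c];  // stride: 4 bytes
  return s;
}

long long sum_cols() {
  long long s = 0;
  for (int c = 0; c < N; ++c)
    for (int r = 0; r < N; ++r)
      s += grid[r][c];  // stride: N * 4 bytes
  return s;
}
```

Both functions compute the same sum; timing them (e.g. with std::chrono) on arrays larger than L1 usually shows the row-major version several times faster.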
  18.–38. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 18–38/205 Simplistic model of cache behaviour const int* hot = 0x4004; const int* cold = 0x4048; int* also_cold = 0x4080; int a = *hot; int c = *cold; *also_cold = a; also_cold[1] = c; [Animated diagram across these slides: a small cache beside a column of 16-byte memory lines from 0x4000 to 0x40F0. Each access pulls the line holding the address into the cache — 0x4000 for hot, 0x4040 for cold, 0x4080 for also_cold — evicting an older line when the cache is full.]
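In the diagram each cache line covers 16 bytes, so an address maps to its line by masking off the low four bits (real x86 L1 lines are 64 bytes). A small sketch of the mapping, not from the talk:

```cpp
#include <cstdint>

// Base address of the cache line holding 'addr', for a power-of-two
// line size. The slide's diagram uses 16-byte lines; common x86 L1
// caches use 64-byte lines.
constexpr uintptr_t line_base(uintptr_t addr, uintptr_t line_size = 16) {
  return addr & ~(line_size - 1);
}

// Two addresses hit the same cache line iff their line bases match.
constexpr bool same_line(uintptr_t a, uintptr_t b,
                         uintptr_t line_size = 16) {
  return line_base(a, line_size) == line_base(b, line_size);
}
```

This is why *hot at 0x4004 and *cold at 0x4048 each cost a separate miss, while also_cold[0] and also_cold[1] at 0x4080 share one.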
  43. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 43/205 Analysis of implementation int main() { std::random_device rd; std::mt19937 gen(rd()); std::uniform_int_distribution<uint32_t> dist; for (int k = 0; k < 10; ++k) { timer* prev = nullptr; for (int i = 0; i < 20'000; ++i) { timer* t = schedule_timer( dist(gen), [](void*){return 0U;}, nullptr); if (i & 1) cancel_timer(prev); prev = t; } while (shoot_first()) ; } } bool shoot_first() { if (timeouts.next == &timeouts) return false; timer* t = timeouts.next; t->callback(t->userp); cancel_timer(t); return true; }
  49. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 49/205 Analysis of implementation valgrind --tool=callgrind --cache-sim=yes --dump-instr=yes --branch-sim=yes Essentially a profiler that collects info about call hierarchies, number of calls, and time spent. The CPU simulator is not cycle accurate, so treat timing results as a broad picture. Simulates a CPU cache, flattened to 2 levels, L1 and LL. It shows you where you get cache misses. L1 is by default a model of your host CPU L1, but you can change size, line-size, and associativity. Collects statistics per instruction instead of per source line. Can help pinpoint bottlenecks. Simulates a branch predictor. Very slow!
  50. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 50/205 Live demo
  61. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 61/205 typedef uint32_t (*timer_cb)(void*); typedef struct timer { uint32_t deadline; timer_cb callback; void* userp; struct timer* next; struct timer* prev; } timer; // 4 bytes // 4 bytes padding for alignment // 8 bytes // 8 bytes // 8 bytes // 8 bytes // sum = 40 bytes 66% of all L1d cache misses Rule of thumb: Follow pointer => cache miss 33% of all L1d cache misses
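The byte counts annotated on the slide can be verified with static_assert; the figures assume a 64-bit target with 8-byte pointers and 8-byte alignment:

```cpp
#include <cstdint>

typedef uint32_t (*timer_cb)(void*);
struct timer {
  uint32_t deadline;  // 4 bytes
                      // + 4 bytes padding so 'callback' is 8-aligned
  timer_cb callback;  // 8 bytes
  void* userp;        // 8 bytes
  timer* next;        // 8 bytes
  timer* prev;        // 8 bytes
};

// 40 bytes total on LP64/LLP64 targets. Reordering the members cannot
// recover the padding here; only dropping fields shrinks the struct.
static_assert(sizeof(timer) == 40, "layout assumes 64-bit pointers");
static_assert(alignof(timer) == 8, "aligned to pointer size");
```

At 40 bytes per node, a 64-byte cache line holds at most one complete timer, so every hop through next or prev is likely to touch a fresh line.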
  62. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 62/205 Chasing pointers is expensive. Let’s get rid of the pointers.
  65. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 65/205 typedef uint32_t (*timer_cb)(void*); typedef uint32_t timer; struct timer_data { uint32_t deadline; timer id; void* userp; timer_cb callback; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; 24 bytes per entry. No pointer chasing Linear structure
  67. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 67/205 typedef uint32_t (*timer_cb)(void*); typedef uint32_t timer; struct timer_data { uint32_t deadline; timer id; void* userp; timer_cb callback; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto idx = timeouts.size(); timeouts.push_back({}); while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) { timeouts[idx] = std::move(timeouts[idx-1]); --idx; } timeouts[idx] = timer_data{deadline, next_id, userp, cb }; return next_id++; } Linear insertion sort
  70. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 70/205 typedef uint32_t (*timer_cb)(void*); typedef uint32_t timer; struct timer_data { uint32_t deadline; timer id; void* userp; timer_cb callback; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto idx = timeouts.size(); timeouts.push_back({}); while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline)) { timeouts[idx] = std::move(timeouts[idx-1]); --idx; } timeouts[idx] = timer_data{deadline, next_id, userp, cb }; return next_id++; } void cancel_timer(timer t) { auto i = std::find_if(timeouts.begin(), timeouts.end(), [t](const auto& e) { return e.id == t; }); timeouts.erase(i); } Linear search
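Put together, a self-contained version of the vector-based timer store might look like this; is_after is assumed here to be a plain deadline comparison, and the guard in cancel_timer is an addition for safety:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef uint32_t (*timer_cb)(void*);
typedef uint32_t timer;

struct timer_data {
  uint32_t deadline;
  timer id;
  void* userp;
  timer_cb callback;
};

std::vector<timer_data> timeouts;  // kept sorted, earliest first
uint32_t next_id = 0;

// Assumed comparison: deadline 'a' expires after deadline 'b'.
static bool is_after(uint32_t a, uint32_t b) { return a > b; }

timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) {
  auto idx = timeouts.size();
  timeouts.push_back({});
  // Insertion sort from the back; entries shift within one
  // contiguous allocation, so the moves stay cache friendly.
  while (idx > 0 && is_after(timeouts[idx - 1].deadline, deadline)) {
    timeouts[idx] = timeouts[idx - 1];
    --idx;
  }
  timeouts[idx] = timer_data{deadline, next_id, userp, cb};
  return next_id++;
}

void cancel_timer(timer t) {
  auto i = std::find_if(timeouts.begin(), timeouts.end(),
                        [t](const auto& e) { return e.id == t; });
  if (i != timeouts.end()) timeouts.erase(i);
}
```

Scheduling three timers with deadlines 30, 10, 20 leaves the vector ordered 10, 20, 30, and cancelling by id removes the matching entry with a single contiguous scan.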
  75. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 75/205 Analysis of implementation perf stat -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses Presents statistics from the whole run of the program, using counters from the hardware and the Linux kernel. The number of cycles per instruction is a proxy for how much the CPU is working versus waiting. Number of reads from the L1d cache, and number of misses. Speculative execution can make these numbers confusing. Very fast!
  77. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 77/205 Analysis of implementation perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf https://developer.amd.com/amd-uprof/ µProf https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html vtune
  78. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 78/205 Analysis of implementation perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf Records where in your program the counters are gathered.
  79. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 79/205 Analysis of implementation perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf Records where in your program the counters are gathered. Records call graph info, instead of just location. dwarf requires debug info.
  80. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 80/205 Analysis of implementation perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf Records where in your program the counters are gathered. Records call graph info, instead of just location. dwarf requires debug info. Very fast!
  81. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 81/205 Live demo
  82. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 82/205
  83. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 83/205 Linear search is expensive. Maybe try binary search?
  84. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 84/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0;
  85. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 85/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; }
  86. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 86/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Binary search for insertion point
  87. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 87/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion
  88. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 88/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } }
  89. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 89/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } } Binary search for timers with the same deadline
  90. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 90/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } } Linear search for matching id
  91. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 91/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } } Linear removal
  92. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 92/205 Live demo
  93. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 93/205 Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates.
  94. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 94/205 Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates. 1 10 100 1000 10000 100000 10.00E-09 100.00E-09 1.00E-06 10.00E-06 linear_array bsearch_array elements seconds per element
  95. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 95/205 Failed branch predictions can lead to cache entry eviction on some CPUs (spectre/meltdown) Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates.
  96. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 96/205 Failed branch predictions can lead to cache entry eviction on some CPUs (spectre/meltdown) Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates. Maybe try a map<>?
  97. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 97/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { void* userp; timer_cb callback; }; struct is_after { bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; } }; using timer_map = std::multimap<uint32_t, timer_data, is_after>; using timer = timer_map::iterator; static timer_map timeouts;
  98. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 98/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { void* userp; timer_cb callback; }; struct is_after { bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; } }; using timer_map = std::multimap<uint32_t, timer_data, is_after>; using timer = timer_map::iterator; static timer_map timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { return timeouts.insert(std::make_pair(deadline, timer_data{userp, cb})); } void cancel_timer(timer t) { timeouts.erase(t); }
  99. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 99/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { void* userp; timer_cb callback; }; struct is_after { bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; } }; using timer_map = std::multimap<uint32_t, timer_data, is_after>; using timer = timer_map::iterator; static timer_map timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { return timeouts.insert(std::make_pair(deadline, timer_data{userp, cb})); } void cancel_timer(timer t) { timeouts.erase(t); } bool shoot_first() { if (timeouts.empty()) return false; auto i = timeouts.begin(); i->second.callback(i->second.userp); timeouts.erase(i); return true; }
  100. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 100/205 Live demo
  101. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 101/205
  102. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 102/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree.
  103. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 103/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree. What did I say about chasing pointers?
  104. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 104/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree. What did I say about chasing pointers? 1 10 100 1000 10000 100000 10.00E-09 100.00E-09 1.00E-06 10.00E-06 bsearch_array map elements seconds per element
  105. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 105/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree. What did I say about chasing pointers? Can we get log(n) lookup and insertion without chasing pointers?
  106. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 106/205 Enter the HEAP
  107. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 107/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP
  108. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 108/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP • Perfectly balanced partially sorted tree
  109. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 109/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP • Perfectly balanced partially sorted tree • Every node is sorted after or same as its parent
  110. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 110/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP • Perfectly balanced partially sorted tree • Every node is sorted after or same as its parent • No relation between siblings
  111. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 111/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP • Perfectly balanced partially sorted tree • Every node is sorted after or same as its parent • No relation between siblings • At most one node with only one child, and that child is the last node
  112. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 112/205 10 14 15 13 12 11 8 3 5 6 9
  113. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 113/205 10 14 15 13 12 11 8 Insertion: 3 5 6 9
  114. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 114/205 10 14 15 13 12 11 8 Insertion: • Create space 3 5 6 9
  115. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 115/205 10 14 15 13 12 11 8 Insertion: • Create space • Trickle down greater nodes 3 5 6 9
  116. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 116/205 10 14 15 13 12 11 8 Insertion: • Create space • Trickle down greater nodes • Insert into space 3 5 6 9
  117. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 117/205 10 14 15 13 12 11 8 7 Insertion: • Create space • Trickle down greater nodes • Insert into space 3 5 6 9
  118. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 118/205 10 10 14 15 13 12 11 8 7 Insertion: • Create space • Trickle down greater nodes • Insert into space 3 5 6 9
  119. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 119/205 10 10 14 15 13 12 11 8 7 Insertion: • Create space • Trickle down greater nodes • Insert into space 3 5 6 9
  120. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 120/205 10 10 14 15 13 12 11 8 7 Insertion: • Create space • Trickle down greater nodes • Insert into space 3 5 6 9
  121. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 121/205 7 8 10 14 15 13 12 11 3 5 6 9 10
  122. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 122/205 7 8 10 14 15 13 12 11 Pop top: 3 5 6 9 10
  123. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 123/205 7 8 10 14 15 13 12 11 Pop top: • Remove top 3 5 6 9 10
  124. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 124/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child 3 5 6 9 10
  125. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 125/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child • move-insert last into hole 3 5 6 9 10
  126. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 126/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child • move-insert last into hole 5 6 9 10
  127. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 127/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child • move-insert last into hole 5 6 9 10
  128. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 128/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child • move-insert last into hole 5 6 9 10
  129. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 129/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child • move-insert last into hole 5 6 9 10
  130. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 130/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child • move-insert last into hole 5 6 9 10
  131. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 131/205 7 8 10 14 15 13 12 11 Pop top: • Remove top • Trickle up lesser child • move-insert last into hole 5 6 9 10
  132. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 132/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15
  133. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 133/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15 Addressing: The index of a parent node is half (rounded down) of that of a child.
  134. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 134/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15 Addressing: The index of a parent node is half (rounded down) of that of a child.
  135. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 135/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15 Addressing: The index of a parent node is half (rounded down) of that of a child. Array indexes! No pointer chasing!
  136. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 136/205 The heap is not searchable, so how do we handle cancellation?
  137. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 137/205 The heap is not searchable, so how do we handle cancellation? struct timer_action { uint32_t (*callback)(void*); void* userp; };
  138. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 138/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; };
  139. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 139/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; };
  140. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 140/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; };
  141. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 141/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; };
  142. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 142/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; }; Only 8 bytes per element of working data in the heap.
  143. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 143/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; }; Cancel by setting callback to nullptr Only 8 bytes per element of working data in the heap.
  144. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 144/205 struct timer_data { uint32_t deadline; uint32_t action_index; }; struct is_after { bool operator()(const timer_data& lh, const timer_data& rh) const { return lh.deadline < rh.deadline; } }; std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto action_index = actions.push(cb, userp); timeouts.push(timer_data{deadline, action_index}); return action_index; }
  145. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 145/205 struct timer_data { uint32_t deadline; uint32_t action_index; }; struct is_after { bool operator()(const timer_data& lh, const timer_data& rh) const { return lh.deadline < rh.deadline; } }; std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto action_index = actions.push(cb, userp); timeouts.push(timer_data{deadline, action_index}); return action_index; } Container adapter that implements a heap
  146. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 146/205 struct timer_data { uint32_t deadline; uint32_t action_index; }; struct is_after { bool operator()(const timer_data& lh, const timer_data& rh) const { return lh.deadline < rh.deadline; } }; std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto action_index = actions.push(cb, userp); timeouts.push(timer_data{deadline, action_index}); return action_index; } Container adapter that implements a heap
  147. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 147/205 bool shoot_first() { while (!timeouts.empty()) { auto& t = timeouts.top(); auto& action = actions[t.action_index]; if (action.callback) break; actions.remove(t.action_index); timeouts.pop(); } if (timeouts.empty()) return false; auto& t = timeouts.top(); auto& action = actions[t.action_index]; action.callback(action.userp); actions.remove(t.action_index); timeouts.pop(); return true; } Pop-off any cancelled items
  148. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 148/205 Live demo
  149. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 149/205 A lot fewer everything! and about twice as fast too
  150. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 150/205 A lot fewer everything! and about twice as fast too 1 10 100 1000 10000 100000 10.00E-09 100.00E-09 map heap_aux elements seconds per element
  151. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 151/205 A lot fewer everything! and about twice as fast too But there are many cache misses in the adjust-heap functions
  152. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 152/205 A lot fewer everything! and about twice as fast too But there are many cache misses in the adjust-heap functions Can we do better?
  153. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 153/205 How do the entries fit in cache lines?
  154. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 154/205 How do the entries fit in cache lines?
  155. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 155/205 How do the entries fit in cache lines?
  156. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 156/205 How do the entries fit in cache lines?
  157. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 157/205 How do the entries fit in cache lines?
  158. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 158/205 How do the entries fit in cache lines?
  159. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 159/205 How do the entries fit in cache lines?
  160. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 160/205 Every generation is on a new cache line
  161. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 161/205 Can we do better?
  162. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 162/205 Can we do better?
  163. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 163/205 Can we do better?
  164. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 164/205 Can we do better?
  165. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 165/205 Can we do better?
  166. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 166/205 Can we do better?
  167. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 167/205 Can we do better?
  168. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 168/205 Three generations per cache line!
  169. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 169/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7
  170. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 170/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7
  171. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 171/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0
  172. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 172/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0 8 9 10 11 12 13 14 15
  173. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 173/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  174. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 174/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
  175. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 175/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0
  176. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 176/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0
  177. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 177/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0
  178. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 178/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0
  179. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 179/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0
  180. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 180/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx); static size_t block_base(size_t idx); static bool is_block_root(size_t idx); static bool is_block_leaf(size_t idx); static size_t left_child_of(size_t idx) { if (!is_block_leaf(idx)) return idx + block_offset(idx); auto base = block_base(idx) + 1; return base * block_size + child_no(idx) * block_size * 2 + 1; } ... };
  181. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 181/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx); static size_t block_base(size_t idx); static bool is_block_root(size_t idx); static bool is_block_leaf(size_t idx); static size_t left_child_of(size_t idx) { if (!is_block_leaf(idx)) return idx + block_offset(idx); auto base = block_base(idx) + 1; return base * block_size + child_no(idx) * block_size * 2 + 1; } ... };
  182. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 182/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx); static size_t block_base(size_t idx); static bool is_block_root(size_t idx); static bool is_block_leaf(size_t idx); static size_t left_child_of(size_t idx) { if (!is_block_leaf(idx)) return idx + block_offset(idx); auto base = block_base(idx) + 1; return base * block_size + child_no(idx) * block_size * 2 + 1; } ... }; static size_t parent_of(size_t idx) { auto const node_root = block_base(idx); if (!is_block_root(idx)) return node_root + block_offset(idx) / 2; auto parent_base = block_base(node_root / block_size - 1); auto child = ((idx - block_size) / block_size - parent_base) / 2; return parent_base + block_size / 2 + child; }
  183. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 183/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx); static size_t block_base(size_t idx); static bool is_block_root(size_t idx); static bool is_block_leaf(size_t idx); static size_t left_child_of(size_t idx) { if (!is_block_leaf(idx)) return idx + block_offset(idx); auto base = block_base(idx) + 1; return base * block_size + child_no(idx) * block_size * 2 + 1; } ... }; static size_t parent_of(size_t idx) { auto const node_root = block_base(idx); if (!is_block_root(idx)) return node_root + block_offset(idx) / 2; auto parent_base = block_base(node_root / block_size - 1); auto child = ((idx - block_size) / block_size - parent_base) / 2; return parent_base + block_size / 2 + child; }
  184. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 184/205 class timeout_store { ... using allocator = align_allocator<64>::type<timer_data>; std::vector<timer_data, allocator> bheap_store; };
  186. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 186/205

class timeout_store {
    ...
    using allocator = align_allocator<64>::type<timer_data>;
    std::vector<timer_data, allocator> bheap_store;
};

template <size_t N>
struct align_allocator {
    template <typename T>
    struct type {
        using value_type = T;
        static constexpr std::align_val_t alignment{N};
        T* allocate(size_t n)
        {
            return static_cast<T*>(operator new(n * sizeof(T), alignment));
        }
        void deallocate(T* p, size_t)
        {
            operator delete(p, alignment);
        }
    };
};
  188. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 188/205

class timeout_store {
    ...
    using allocator = align_allocator<64>::type<timer_data>;
    std::vector<timer_data, allocator> bheap_store;
};

template <size_t N>
struct align_allocator {
    template <typename T>
    struct type {
        using value_type = T;
        static constexpr std::align_val_t alignment{N};
        T* allocate(size_t n)
        {
            return static_cast<T*>(operator new(n * sizeof(T), alignment));
        }
        void deallocate(T* p, size_t)
        {
            operator delete(p, alignment);
        }
    };
};

Aligned operator new and delete came with C++17
  189. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 189/205 Live demo
  190. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 190/205 Many more instructions and branches
  191. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 191/205

Many more instructions and branches
But fewer cache accesses and cache misses and branch mispredictions
  192. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 192/205

Many more instructions and branches
But fewer cache accesses and cache misses and branch mispredictions
and (maybe) faster?
  193. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 193/205

[Chart: seconds per element vs. number of elements, heap_aux vs. bheap_aux]

Many more instructions and branches
But fewer cache accesses and cache misses and branch mispredictions
and (maybe) faster?
  194. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 194/205

[Chart: seconds per element vs. number of elements, heap_aux vs. bheap_aux]

Many more instructions and branches
But fewer cache accesses and cache misses and branch mispredictions
and (maybe) faster?

[Chart: seconds per element vs. number of elements, linear_array vs. bheap_aux]
  195. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

    © Björn Fahller @bjorn_fahller 195/205 Rules of thumb
  196. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 196/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
  197. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 197/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
• Smaller working data set is better
  198. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 198/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
• Smaller working data set is better
• Use as much of a cache entry as you can
  199. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 199/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
• Smaller working data set is better
• Use as much of a cache entry as you can
• Sequential memory accesses can be very fast due to prefetching
  200. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 200/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
• Smaller working data set is better
• Use as much of a cache entry as you can
• Sequential memory accesses can be very fast due to prefetching
• Fewer evicted cache lines means more data in hot cache for the rest of the program
  201. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 201/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
• Smaller working data set is better
• Use as much of a cache entry as you can
• Sequential memory accesses can be very fast due to prefetching
• Fewer evicted cache lines means more data in hot cache for the rest of the program
• Mispredicted branches can evict cache entries (Spectre/Meltdown)
  202. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 202/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
• Smaller working data set is better
• Use as much of a cache entry as you can
• Sequential memory accesses can be very fast due to prefetching
• Fewer evicted cache lines means more data in hot cache for the rest of the program
• Mispredicted branches can evict cache entries (Spectre/Meltdown)
• Linear access in contiguous memory rules for small data sets!
  203. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 203/205

Rules of thumb
• Following a pointer is a cache miss, unless you have information to the contrary
• Smaller working data set is better
• Use as much of a cache entry as you can
• Sequential memory accesses can be very fast due to prefetching
• Fewer evicted cache lines means more data in hot cache for the rest of the program
• Mispredicted branches can evict cache entries (Spectre/Meltdown)
• Linear access in contiguous memory rules for small data sets!
• Measure, measure, measure
  204. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 204/205

Resources
Ulrich Drepper - “What every programmer should know about memory” http://www.akkadia.org/drepper/cpumemory.pdf
Milian Wolff - “Linux perf for Qt Developers” https://www.youtube.com/watch?v=L4NClVxqdMw
Travis Downs - “Cache counters rant” https://tinyurl.com/cache-counters-rant
Emery Berger - “Performance Matters” https://www.youtube.com/watch?v=r-TLSBdHe1A
  205. What Do You Mean by “Cache Friendly”? – C++OnSea 2022

  © Björn Fahller @bjorn_fahller 205/205

bjorn@fahller.se
@bjorn_fahller
@rollbear
Björn Fahller
What Do You Mean by “Cache Friendly”?