Slide 1

Slide 1 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller

Slide 2

Slide 2 text

What Do You Mean by “Cache Friendly”? Björn Fahller

Slide 6

Slide 6 text

typedef uint32_t (*timer_cb)(void*);

struct timer {
    uint32_t deadline;
    timer_cb callback;
    void* userp;
    struct timer* next;
    struct timer* prev;
};

static timer timeouts = { 0, NULL, NULL, &timeouts, &timeouts };

timer* schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
    timer* iter = timeouts.prev;
    while (iter != &timeouts && is_after(iter->deadline, deadline))
        iter = iter->prev;
    return add_behind(iter, deadline, cb, userp);
}

void cancel_timer(timer* t)
{
    t->next->prev = t->prev;
    t->prev->next = t->next;
    free(t);
}
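The slide elides the helpers `is_after` and `add_behind`. A sketch of how the whole intrusive-list version might fit together — the wrap-around-aware `is_after` and the list-splicing `add_behind` below are my assumptions, not code shown in the talk:

```cpp
#include <cstdint>
#include <cstdlib>

typedef uint32_t (*timer_cb)(void*);

struct timer {
    uint32_t deadline;
    timer_cb callback;
    void* userp;
    struct timer* next;
    struct timer* prev;
};

// Sentinel node; an empty list points at itself.
static timer timeouts = { 0, nullptr, nullptr, &timeouts, &timeouts };

// Assumed helper: wrap-around aware "a is later than b".
static bool is_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

// Assumed helper: allocate a timer and link it in right after 'pos'.
static timer* add_behind(timer* pos, uint32_t deadline, timer_cb cb, void* userp)
{
    timer* t = (timer*)malloc(sizeof(timer));
    t->deadline = deadline;
    t->callback = cb;
    t->userp = userp;
    t->next = pos->next;
    t->prev = pos;
    pos->next->prev = t;
    pos->next = t;
    return t;
}

timer* schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
    // Walk backwards from the latest deadline to keep the list sorted.
    timer* iter = timeouts.prev;
    while (iter != &timeouts && is_after(iter->deadline, deadline))
        iter = iter->prev;
    return add_behind(iter, deadline, cb, userp);
}

void cancel_timer(timer* t)
{
    t->next->prev = t->prev;
    t->prev->next = t->next;
    free(t);
}
```

Every insertion walks `prev` pointers and every node is a separate allocation — which is exactly what the rest of the talk takes apart.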

Slide 17

Slide 17 text

Simplistic model of cache behaviour

Includes
● The cache is small
● and consists of fixed size lines
● and is very very fast when hit
● and missing is very slow

Excludes
● Multiple levels of caches
● Associativity
● Threading

All models are wrong, but some are useful

Slide 18

Slide 18 text

Simplistic model of cache behaviour

const int* hot = 0x4004;
const int* cold = 0x4048;
int* also_cold = 0x4080;

int a = *hot;
int c = *cold;
*also_cold = a;
also_cold[1] = c;

[Diagram, animated over slides 18–38: a small cache of fixed-size lines (tags such as 0x3A10, 0x4010, 0x4000, 0x4FF0) beside memory lines 0x4000–0x40F0; each access pulls the containing line into the cache, evicting an older one.]
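One way to internalize the model is to simulate it. The sketch below is a toy fully-associative cache with LRU replacement and 16-byte lines (matching the 0x10 spacing in the diagram); it only counts hits and misses for a sequence of addresses. It illustrates the simplistic model from the slides, not any real CPU.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy version of the slide's model: a small cache of fixed-size lines,
// very fast when hit, a counted "miss" when not.
struct toy_cache {
    std::size_t capacity;           // number of lines the cache can hold
    std::vector<uint64_t> lines;    // cached line tags, front = most recently used
    int hits = 0;
    int misses = 0;

    explicit toy_cache(std::size_t n) : capacity(n) {}

    void access(uint64_t addr) {
        uint64_t tag = addr / 0x10;  // 16-byte lines, as in the diagram
        auto i = std::find(lines.begin(), lines.end(), tag);
        if (i != lines.end()) {
            ++hits;
            std::rotate(lines.begin(), i, i + 1);  // move hit line to front
        } else {
            ++misses;
            lines.insert(lines.begin(), tag);      // fetch line into cache
            if (lines.size() > capacity)
                lines.pop_back();                  // evict least recently used
        }
    }
};
```

Feeding it the four accesses from the code above gives three misses (`hot`, `cold`, and `also_cold[0]` each touch a new line) and one hit (`also_cold[1]` lands in the line just fetched).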

Slide 43

Slide 43 text

Analysis of implementation

int main()
{
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<uint32_t> dist;
    for (int k = 0; k < 10; ++k)
    {
        timer* prev = nullptr;
        for (int i = 0; i < 20'000; ++i)
        {
            timer* t = schedule_timer(dist(gen),
                                      [](void*){return 0U;},
                                      nullptr);
            if (i & 1) cancel_timer(prev);
            prev = t;
        }
        while (shoot_first())
            ;
    }
}

bool shoot_first()
{
    if (timeouts.next == &timeouts) return false;
    timer* t = timeouts.next;
    t->callback(t->userp);
    cancel_timer(t);
    return true;
}

Slide 49

Slide 49 text

Analysis of implementation

valgrind --tool=callgrind --cache-sim=yes --dump-instr=yes --branch-sim=yes

Essentially a profiler that collects info about call hierarchies, number of calls, and time spent. The CPU simulator is not cycle accurate, so treat timing results as a broad picture.

Simulates a CPU cache, flattened to 2 levels, L1 and LL. It shows you where you get cache misses. L1 is by default a model of your host CPU's L1, but you can change size, line size, and associativity.

Collects statistics per instruction instead of per source line. Can help pinpoint bottlenecks.

Simulates a branch predictor.

Very slow!

Slide 50

Slide 50 text

Live demo

Slide 61

Slide 61 text

typedef uint32_t (*timer_cb)(void*);

typedef struct timer {
    uint32_t deadline;        // 4 bytes
                              // 4 bytes padding for alignment
    timer_cb callback;        // 8 bytes
    void* userp;              // 8 bytes
    struct timer* next;       // 8 bytes
    struct timer* prev;       // 8 bytes
} timer;                      // sum = 40 bytes

66% of all L1d cache misses
Rule of thumb: Follow pointer => cache miss
33% of all L1d cache misses
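The byte counts on the slide can be checked mechanically. The sketch below does so with `static_assert` and `offsetof`; the concrete offsets and the 40-byte total are platform assumptions that hold on a typical LP64 target such as x86-64 Linux, not guarantees of the language.

```cpp
#include <cstddef>
#include <cstdint>

typedef uint32_t (*timer_cb)(void*);

typedef struct timer {
    uint32_t deadline;        // 4 bytes
                              // 4 bytes padding for alignment
    timer_cb callback;        // 8 bytes
    void* userp;              // 8 bytes
    struct timer* next;       // 8 bytes
    struct timer* prev;       // 8 bytes
} timer;                      // sum = 40 bytes

// Layout checks for an LP64 target (pointers are 8 bytes, 8-byte aligned).
static_assert(offsetof(timer, callback) == 8,
              "4 bytes of padding follow the 4-byte deadline");
static_assert(offsetof(timer, userp) == 16, "");
static_assert(offsetof(timer, next) == 24, "");
static_assert(offsetof(timer, prev) == 32, "");
static_assert(sizeof(timer) == 40, "five members plus padding");
```

Moving `deadline` next to another 4-byte field, or dropping a pointer, is the kind of change these asserts make visible immediately.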

Slide 62

Slide 62 text

Chasing pointers is expensive. Let’s get rid of the pointers.

Slide 65

Slide 65 text

typedef uint32_t (*timer_cb)(void*);
typedef uint32_t timer;

struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;

24 bytes per entry. No pointer chasing
Linear structure

Slide 70

Slide 70 text

typedef uint32_t (*timer_cb)(void*);
typedef uint32_t timer;

struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;

timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
    auto idx = timeouts.size();
    timeouts.push_back({});
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline))
    {
        timeouts[idx] = std::move(timeouts[idx-1]);
        --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb};
    return next_id++;
}

void cancel_timer(timer t)
{
    auto i = std::find_if(timeouts.begin(), timeouts.end(),
                          [t](const auto& e) { return e.id == t; });
    timeouts.erase(i);
}

Linear insertion sort (schedule_timer)
Linear search (cancel_timer)
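A self-contained version of the vector-based store might look like the following. `is_after` is again an assumed wrap-aware helper, and the handle returned is the id actually stored in the entry (on the slide the post-increment sits in the stored initializer, which would make the returned handle one past the stored id):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef uint32_t (*timer_cb)(void*);
typedef uint32_t timer;   // a timer is now just an id, not a pointer

struct timer_data {
    uint32_t deadline;
    timer id;
    void* userp;
    timer_cb callback;
};

std::vector<timer_data> timeouts;
uint32_t next_id = 0;

// Assumed helper: wrap-around aware "a is later than b".
static bool is_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp)
{
    auto idx = timeouts.size();
    timeouts.push_back({});
    // Linear insertion sort: shift later deadlines up one slot.
    while (idx > 0 && is_after(timeouts[idx-1].deadline, deadline))
    {
        timeouts[idx] = std::move(timeouts[idx-1]);
        --idx;
    }
    timeouts[idx] = timer_data{deadline, next_id, userp, cb};
    return next_id++;
}

void cancel_timer(timer t)
{
    // Linear search for the entry by id.
    auto i = std::find_if(timeouts.begin(), timeouts.end(),
                          [t](const auto& e) { return e.id == t; });
    if (i != timeouts.end())
        timeouts.erase(i);
}
```

The shifting in `schedule_timer` and the scan in `cancel_timer` are both linear, but they walk one contiguous array, so they stream through cache lines instead of chasing pointers.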

Slide 75

Slide 75 text

Analysis of implementation

perf stat -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses

Presents statistics from a whole run of the program, using counters from the hardware and the Linux kernel.

Number of cycles per instruction is a proxy for how much the CPU is working or waiting.

Number of reads from the L1d cache, and number of misses. Speculative execution can make these numbers confusing.

Very fast!
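The two derived numbers mentioned above are simple ratios of the counters `perf stat` prints. A trivial sketch, with the counter values in the test made up purely for illustration:

```cpp
#include <cstdint>

// Raw event counts as printed by:
//   perf stat -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses
struct perf_counters {
    uint64_t cycles;
    uint64_t instructions;
    uint64_t l1d_loads;
    uint64_t l1d_load_misses;
};

// Cycles per instruction: low when the core is well fed,
// climbing when it stalls waiting for memory.
double cpi(const perf_counters& c)
{
    return double(c.cycles) / double(c.instructions);
}

// Fraction of L1d loads that missed the cache.
double l1d_miss_rate(const perf_counters& c)
{
    return double(c.l1d_load_misses) / double(c.l1d_loads);
}
```

Comparing these two ratios before and after a data-layout change tells you more than raw counts, which scale with how much work the program did.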

Slide 77

Slide 77 text

Analysis of implementation

perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf

µProf: https://developer.amd.com/amd-uprof/
vtune: https://www.intel.com/content/www/us/en/developer/tools/oneapi/vtune-profiler.html

Slide 78

Slide 78 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 78/205 Analysis of implementation perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf Records where in your program the counters are gathered.

Slide 79

Slide 79 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 79/205 Analysis of implementation perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf Records where in your program the counters are gathered. Records call graph info, instead of just location. dwarf requires debug info.

Slide 80

Slide 80 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 80/205 Analysis of implementation perf record -e cycles,instructions,L1-dcache-loads,L1-dcache-load-misses --call-graph=dwarf Records where in your program the counters are gathered. Records call graph info, instead of just location. dwarf requires debug info. Very fast!

Slide 81

Slide 81 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 81/205 Live demo

Slide 82

Slide 82 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 82/205

Slide 83

Slide 83 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 83/205 Linear search is expensive. Maybe try binary search?

Slide 84

Slide 84 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 84/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0;

Slide 85

Slide 85 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 85/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; }

Slide 86

Slide 86 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 86/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Binary search for insertion point

Slide 87

Slide 87 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 87/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion

Slide 88

Slide 88 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 88/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } }

Slide 89

Slide 89 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 89/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } } Binary search for timers with the same deadline

Slide 90

Slide 90 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 90/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } } Linear search for matching id

Slide 91

Slide 91 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 91/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { uint32_t deadline; uint32_t id; void* userp; timer_cb callback; }; struct timer { uint32_t deadline; uint32_t id; }; std::vector<timer_data> timeouts; uint32_t next_id = 0; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { timer_data element{deadline, next_id, userp, cb}; auto i = std::lower_bound(timeouts.begin(), timeouts.end(), element, is_after); timeouts.insert(i, element); return {deadline, next_id++}; } Linear insertion void cancel_timer(timer t) { timer_data element{t.deadline, t.id, nullptr, nullptr}; auto [lo, hi] = std::equal_range(timeouts.begin(), timeouts.end(), element, is_after); auto i = std::find_if(lo, hi, [t](const auto& e) { return e.id == t.id; }); if (i != hi) { timeouts.erase(i); } } Linear removal

Slide 92

Slide 92 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 92/205 Live demo

Slide 93

Slide 93 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 93/205 Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates.

Slide 94

Slide 94 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 94/205 Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates. [Chart: seconds per element vs. number of elements (log-log axes), comparing linear_array and bsearch_array]

Slide 95

Slide 95 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 95/205 Failed branch predictions can lead to cache entry eviction on some CPUs (Spectre/Meltdown). Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates.

Slide 96

Slide 96 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 96/205 Failed branch predictions can lead to cache entry eviction on some CPUs (Spectre/Meltdown). Searches not visible in profiling. Number of reads reduced. Number of cache misses high. memmove() dominates. Maybe try a map<>?

Slide 97

Slide 97 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 97/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { void* userp; timer_cb callback; }; struct is_after { bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; } }; using timer_map = std::multimap<uint32_t, timer_data, is_after>; using timer = timer_map::iterator; static timer_map timeouts;

Slide 98

Slide 98 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 98/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { void* userp; timer_cb callback; }; struct is_after { bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; } }; using timer_map = std::multimap<uint32_t, timer_data, is_after>; using timer = timer_map::iterator; static timer_map timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { return timeouts.insert(std::make_pair(deadline, timer_data{userp, cb})); } void cancel_timer(timer t) { timeouts.erase(t); }

Slide 99

Slide 99 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 99/205 typedef uint32_t (*timer_cb)(void*); struct timer_data { void* userp; timer_cb callback; }; struct is_after { bool operator()(uint32_t lh, uint32_t rh) const { return lh < rh; } }; using timer_map = std::multimap<uint32_t, timer_data, is_after>; using timer = timer_map::iterator; static timer_map timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { return timeouts.insert(std::make_pair(deadline, timer_data{userp, cb})); } void cancel_timer(timer t) { timeouts.erase(t); } bool shoot_first() { if (timeouts.empty()) return false; auto i = timeouts.begin(); i->second.callback(i->second.userp); timeouts.erase(i); return true; }

Slide 100

Slide 100 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 100/205 Live demo

Slide 101

Slide 101 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 101/205

Slide 102

Slide 102 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 102/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree.

Slide 103

Slide 103 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 103/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree. What did I say about chasing pointers?

Slide 104

Slide 104 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 104/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree. What did I say about chasing pointers? [Chart: seconds per element vs. number of elements (log-log axes), comparing bsearch_array and map]

Slide 105

Slide 105 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 105/205 Faster, but lots of cache misses when comparing keys and rebalancing the tree. What did I say about chasing pointers? Can we get log(n) lookup and insertion without chasing pointers?

Slide 106

Slide 106 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 106/205 Enter the HEAP

Slide 107

Slide 107 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 107/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP

Slide 108

Slide 108 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 108/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP ● Perfectly balanced partially sorted tree

Slide 109

Slide 109 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 109/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP ● Perfectly balanced partially sorted tree ● Every node is sorted after or same as its parent

Slide 110

Slide 110 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 110/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP ● Perfectly balanced partially sorted tree ● Every node is sorted after or same as its parent ● No relation between siblings

Slide 111

Slide 111 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 111/205 3 5 8 6 10 10 14 9 15 13 12 11 Enter the HEAP ● Perfectly balanced partially sorted tree ● Every node is sorted after or same as its parent ● No relation between siblings ● At most one node with only one child, and that child is the last node

Slide 112

Slide 112 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 112/205 10 14 15 13 12 11 8 3 5 6 9

Slide 113

Slide 113 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 113/205 10 14 15 13 12 11 8 Insertion: 3 5 6 9

Slide 114

Slide 114 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 114/205 10 14 15 13 12 11 8 Insertion: ● Create space 3 5 6 9

Slide 115

Slide 115 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 115/205 10 14 15 13 12 11 8 Insertion: ● Create space ● Trickle down greater nodes 3 5 6 9

Slide 116

Slide 116 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 116/205 10 14 15 13 12 11 8 Insertion: ● Create space ● Trickle down greater nodes ● Insert into space 3 5 6 9

Slide 117

Slide 117 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 117/205 10 14 15 13 12 11 8 7 Insertion: ● Create space ● Trickle down greater nodes ● Insert into space 3 5 6 9

Slide 118

Slide 118 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 118/205 10 10 14 15 13 12 11 8 7 Insertion: ● Create space ● Trickle down greater nodes ● Insert into space 3 5 6 9

Slide 119

Slide 119 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 119/205 10 10 14 15 13 12 11 8 7 Insertion: ● Create space ● Trickle down greater nodes ● Insert into space 3 5 6 9

Slide 120

Slide 120 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 120/205 10 10 14 15 13 12 11 8 7 Insertion: ● Create space ● Trickle down greater nodes ● Insert into space 3 5 6 9

Slide 121

Slide 121 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 121/205 7 8 10 14 15 13 12 11 3 5 6 9 10

Slide 122

Slide 122 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 122/205 7 8 10 14 15 13 12 11 Pop top: 3 5 6 9 10

Slide 123

Slide 123 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 123/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top 3 5 6 9 10

Slide 124

Slide 124 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 124/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child 3 5 6 9 10

Slide 125

Slide 125 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 125/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child ● move-insert last into hole 3 5 6 9 10

Slide 126

Slide 126 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 126/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child ● move-insert last into hole 5 6 9 10

Slide 127

Slide 127 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 127/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child ● move-insert last into hole 5 6 9 10

Slide 128

Slide 128 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 128/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child ● move-insert last into hole 5 6 9 10

Slide 129

Slide 129 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 129/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child ● move-insert last into hole 5 6 9 10

Slide 130

Slide 130 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 130/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child ● move-insert last into hole 5 6 9 10

Slide 131

Slide 131 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 131/205 7 8 10 14 15 13 12 11 Pop top: ● Remove top ● Trickle up lesser child ● move-insert last into hole 5 6 9 10

Slide 132

Slide 132 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 132/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15

Slide 133

Slide 133 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 133/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15 Addressing: The index of a parent node is half (rounded down) of that of a child.

Slide 134

Slide 134 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 134/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15 Addressing: The index of a parent node is half (rounded down) of that of a child.

Slide 135

Slide 135 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 135/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 10 8 12 9 13 10 11 11 15 12 15 15 15 15 Addressing: The index of a parent node is half (rounded down) of that of a child. Array indexes! No pointer chasing!

Slide 136

Slide 136 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 136/205 The heap is not searchable, so how do we handle cancellation?

Slide 137

Slide 137 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 137/205 The heap is not searchable, so how do we handle cancellation? struct timer_action { uint32_t (*callback)(void*); void* userp; };

Slide 138

Slide 138 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 138/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; };

Slide 139

Slide 139 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 139/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; };

Slide 140

Slide 140 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 140/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; };

Slide 141

Slide 141 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 141/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; };

Slide 142

Slide 142 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 142/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; }; Only 8 bytes per element of working data in the heap.

Slide 143

Slide 143 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 143/205 The heap is not searchable, so how do we handle cancellation? actions struct timer_action { uint32_t (*callback)(void*); void* userp; }; struct timeout { uint32_t deadline; uint32_t action_index; }; Cancel by setting callback to nullptr Only 8 bytes per element of working data in the heap.

Slide 144

Slide 144 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 144/205 struct timer_data { uint32_t deadline; uint32_t action_index; }; struct is_after { bool operator()(const timer_data& lh, const timer_data& rh) const { return lh.deadline < rh.deadline; } }; std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto action_index = actions.push(cb, userp); timeouts.push(timer_data{deadline, action_index}); return action_index; }

Slide 145

Slide 145 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 145/205 struct timer_data { uint32_t deadline; uint32_t action_index; }; struct is_after { bool operator()(const timer_data& lh, const timer_data& rh) const { return lh.deadline < rh.deadline; } }; std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto action_index = actions.push(cb, userp); timeouts.push(timer_data{deadline, action_index}); return action_index; } Container adapter that implements a heap

Slide 146

Slide 146 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 146/205 struct timer_data { uint32_t deadline; uint32_t action_index; }; struct is_after { bool operator()(const timer_data& lh, const timer_data& rh) const { return lh.deadline < rh.deadline; } }; std::priority_queue<timer_data, std::vector<timer_data>, is_after> timeouts; timer schedule_timer(uint32_t deadline, timer_cb cb, void* userp) { auto action_index = actions.push(cb, userp); timeouts.push(timer_data{deadline, action_index}); return action_index; } Container adapter that implements a heap

Slide 147

Slide 147 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 147/205 bool shoot_first() { while (!timeouts.empty()) { auto& t = timeouts.top(); auto& action = actions[t.action_index]; if (action.callback) break; actions.remove(t.action_index); timeouts.pop(); } if (timeouts.empty()) return false; auto& t = timeouts.top(); auto& action = actions[t.action_index]; action.callback(action.userp); actions.remove(t.action_index); timeouts.pop(); return true; } Pop off any cancelled items

Slide 148

Slide 148 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 148/205 Live demo

Slide 149

Slide 149 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 149/205 A lot fewer of everything! And about twice as fast, too.

Slide 150

Slide 150 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 150/205 A lot fewer of everything! And about twice as fast, too. [Chart: seconds per element vs. number of elements (log-log axes), comparing map and heap_aux]

Slide 151

Slide 151 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 151/205 A lot fewer of everything! And about twice as fast, too. But there are many cache misses in the adjust-heap functions.

Slide 152

Slide 152 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 152/205 A lot fewer of everything! And about twice as fast, too. But there are many cache misses in the adjust-heap functions. Can we do better?

Slide 153

Slide 153 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 153/205 How do the entries fit in cache lines?

Slide 154

Slide 154 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 154/205 How do the entries fit in cache lines?

Slide 155

Slide 155 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 155/205 How do the entries fit in cache lines?

Slide 156

Slide 156 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 156/205 How do the entries fit in cache lines?

Slide 157

Slide 157 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 157/205 How do the entries fit in cache lines?

Slide 158

Slide 158 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 158/205 How do the entries fit in cache lines?

Slide 159

Slide 159 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 159/205 How do the entries fit in cache lines?

Slide 160

Slide 160 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 160/205 Every generation is on a new cache line

Slide 161

Slide 161 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 161/205 Can we do better?

Slide 162

Slide 162 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 162/205 Can we do better?

Slide 163

Slide 163 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 163/205 Can we do better?

Slide 164

Slide 164 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 164/205 Can we do better?

Slide 165

Slide 165 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 165/205 Can we do better?

Slide 166

Slide 166 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 166/205 Can we do better?

Slide 167

Slide 167 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 167/205 Can we do better?

Slide 168

Slide 168 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 168/205 Three generations per cache line!

Slide 169

Slide 169 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 169/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7

Slide 170

Slide 170 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 170/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7

Slide 171

Slide 171 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 171/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0

Slide 172

Slide 172 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 172/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0 8 9 10 11 12 13 14 15

Slide 173

Slide 173 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 173/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Slide 174

Slide 174 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 174/205 5 1 6 2 7 3 9 4 10 5 8 6 14 7 0 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Slide 175

Slide 175 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 175/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0

Slide 176

Slide 176 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 176/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0

Slide 177

Slide 177 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 177/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0

Slide 178

Slide 178 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 178/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0

Slide 179

Slide 179 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 179/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx) { return idx & block_mask; } static size_t block_base(size_t idx) { return idx & ~block_mask; } static bool is_block_root(size_t idx) { return block_offset(idx) == 1; } static bool is_block_leaf(size_t idx) { return (idx & (block_size >> 1)) != 0U; } ... }; 1 2 3 4 5 6 7 0

Slide 180

Slide 180 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 180/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx); static size_t block_base(size_t idx); static bool is_block_root(size_t idx); static bool is_block_leaf(size_t idx); static size_t left_child_of(size_t idx) { if (!is_block_leaf(idx)) return idx + block_offset(idx); auto base = block_base(idx) + 1; return base * block_size + child_no(idx) * block_size * 2 + 1; } ... };

Slide 181

Slide 181 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 181/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx); static size_t block_base(size_t idx); static bool is_block_root(size_t idx); static bool is_block_leaf(size_t idx); static size_t left_child_of(size_t idx) { if (!is_block_leaf(idx)) return idx + block_offset(idx); auto base = block_base(idx) + 1; return base * block_size + child_no(idx) * block_size * 2 + 1; } ... };

Slide 182

Slide 182 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 182/205 class timeout_store { static constexpr size_t block_size = 8; static constexpr size_t block_mask = block_size - 1U; static size_t block_offset(size_t idx); static size_t block_base(size_t idx); static bool is_block_root(size_t idx); static bool is_block_leaf(size_t idx); static size_t left_child_of(size_t idx) { if (!is_block_leaf(idx)) return idx + block_offset(idx); auto base = block_base(idx) + 1; return base * block_size + child_no(idx) * block_size * 2 + 1; } ... }; static size_t parent_of(size_t idx) { auto const node_root = block_base(idx); if (!is_block_root(idx)) return node_root + block_offset(idx) / 2; auto parent_base = block_base(node_root / block_size - 1); auto child = ((idx - block_size) / block_size - parent_base) / 2; return parent_base + block_size / 2 + child; }


Slide 184

Slide 184 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 184/205 class timeout_store { ... using allocator = align_allocator<64>::type<uint32_t>; std::vector<uint32_t, allocator> bheap_store; };

Slide 186

Slide 186 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 186/205 class timeout_store { ... using allocator = align_allocator<64>::type<uint32_t>; std::vector<uint32_t, allocator> bheap_store; }; template <size_t N> struct align_allocator { template <typename T> struct type { using value_type = T; static constexpr std::align_val_t alignment{N}; T* allocate(size_t n) { return static_cast<T*>(operator new(n*sizeof(T), alignment)); } void deallocate(T* p, size_t) { operator delete(p, alignment); } }; };

Slide 188

Slide 188 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 188/205 class timeout_store { ... using allocator = align_allocator<64>::type<uint32_t>; std::vector<uint32_t, allocator> bheap_store; }; template <size_t N> struct align_allocator { template <typename T> struct type { using value_type = T; static constexpr std::align_val_t alignment{N}; T* allocate(size_t n) { return static_cast<T*>(operator new(n*sizeof(T), alignment)); } void deallocate(T* p, size_t) { operator delete(p, alignment); } }; }; Aligned operator new and delete came with C++17

Slide 189

Slide 189 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 189/205 Live demo

Slide 190

Slide 190 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 190/205 Many more instructions and branches

Slide 191

Slide 191 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 191/205 Many more instructions and branches But fewer cache accesses and cache misses and branch mispredictions

Slide 192

Slide 192 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 192/205 Many more instructions and branches But fewer cache accesses and cache misses and branch mispredictions and (maybe) faster?

Slide 193

Slide 193 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 193/205 [Chart: seconds per element vs. number of elements, heap_aux vs. bheap_aux] Many more instructions and branches But fewer cache accesses and cache misses and branch mispredictions and (maybe) faster?

Slide 194

Slide 194 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 194/205 [Chart: seconds per element vs. number of elements, heap_aux vs. bheap_aux] [Chart: seconds per element vs. number of elements, linear_array vs. bheap_aux] Many more instructions and branches But fewer cache accesses and cache misses and branch mispredictions and (maybe) faster?

Slide 195

Slide 195 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 195/205 Rules of thumb

Slide 196

Slide 196 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 196/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary

Slide 197

Slide 197 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 197/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary ● Smaller working data set is better

Slide 198

Slide 198 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 198/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary ● Smaller working data set is better ● Use as much of a cache entry as you can

Slide 199

Slide 199 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 199/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary ● Smaller working data set is better ● Use as much of a cache entry as you can ● Sequential memory accesses can be very fast due to prefetching

Slide 200

Slide 200 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 200/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary ● Smaller working data set is better ● Use as much of a cache entry as you can ● Sequential memory accesses can be very fast due to prefetching ● Fewer evicted cache lines means more data in hot cache for the rest of the program

Slide 201

Slide 201 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 201/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary ● Smaller working data set is better ● Use as much of a cache entry as you can ● Sequential memory accesses can be very fast due to prefetching ● Fewer evicted cache lines means more data in hot cache for the rest of the program ● Mispredicted branches can evict cache entries (spectre/meltdown)

Slide 202

Slide 202 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 202/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary ● Smaller working data set is better ● Use as much of a cache entry as you can ● Sequential memory accesses can be very fast due to prefetching ● Fewer evicted cache lines means more data in hot cache for the rest of the program ● Mispredicted branches can evict cache entries (spectre/meltdown) ● Linear access in contiguous memory rules for small data sets!

Slide 203

Slide 203 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 203/205 Rules of thumb ● Following a pointer is a cache miss, unless you have information to the contrary ● Smaller working data set is better ● Use as much of a cache entry as you can ● Sequential memory accesses can be very fast due to prefetching ● Fewer evicted cache lines means more data in hot cache for the rest of the program ● Mispredicted branches can evict cache entries (spectre/meltdown) ● Linear access in contiguous memory rules for small data sets! ● Measure, measure, measure

Slide 204

Slide 204 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 204/205 Resources Ulrich Drepper - “What every programmer should know about memory” http://www.akkadia.org/drepper/cpumemory.pdf Milian Wolff - “Linux perf for Qt Developers” https://www.youtube.com/watch?v=L4NClVxqdMw Travis Downs - “Cache counters rant” https://tinyurl.com/cache-counters-rant Emery Berger - “Performance Matters” https://www.youtube.com/watch?v=r-TLSBdHe1A

Slide 205

Slide 205 text

What Do You Mean by “Cache Friendly”? – C++OnSea 2022 © Björn Fahller @bjorn_fahller 205/205 [email protected] @bjorn_fahller @rollbear Björn Fahller What Do You Mean by “Cache Friendly”?