Slide 1

Slide 1 text

Building a Highly Concurrent Cache in Go A Hitchhiker’s Guide Konrad Reiche

Slide 2

Slide 2 text

“In its full generality, multithreading is an incredibly complex and error-prone technique, not to be recommended in any but the smallest programs.” ― C. A. R. Hoare: Communicating Sequential Processes

Slide 3

Slide 3 text

3 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 4

Slide 4 text

4 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 5

Slide 5 text

5 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 6

Slide 6 text

6 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 7

Slide 7 text

7 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 8

Slide 8 text

8 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 9

Slide 9 text

9 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 10

Slide 10 text

10 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 11

Slide 11 text

11 Building a Highly Concurrent Cache in Go Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 12

Slide 12 text

Building a Highly Concurrent Cache in Go 12 Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 13

Slide 13 text

Building a Highly Concurrent Cache in Go 13 Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 14

Slide 14 text

Building a Highly Concurrent Cache in Go 14 Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 15

Slide 15 text

Building a Highly Concurrent Cache in Go 15 Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 16

Slide 16 text

Building a Highly Concurrent Cache in Go 16 Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 17

Slide 17 text

Building a Highly Concurrent Cache in Go 17 Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 18

Slide 18 text

Building a Highly Concurrent Cache in Go 18 Adapted from images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0

Slide 19

Slide 19 text

[7932676.760082] Out of memory: kill process 23936 [7932676.761528] Killed process 23936 (go)

Slide 20

Slide 20 text

“Don’t panic.” 20 Building a Highly Concurrent Cache in Go

Slide 21

Slide 21 text

“Don’t panic.” 21 Building a Highly Concurrent Cache in Go ― Effective Go

Slide 22

Slide 22 text

“Don’t panic.” 22 Building a Highly Concurrent Cache in Go ― Effective Go ― Douglas Adams: The Hitchhiker's Guide to the Galaxy

Slide 23

Slide 23 text

Cache

Slide 24

Slide 24 text

24 What is a Cache? Software component that stores data so that future requests for the data can be served faster. Building a Highly Concurrent Cache in Go

Slide 25

Slide 25 text

25 What is a Cache? Software component that stores data so that future requests for the data can be served faster. Remember the result of an expensive operation, to speed up reads. Building a Highly Concurrent Cache in Go

Slide 26

Slide 26 text

26 What is a Cache? Software component that stores data so that future requests for the data can be served faster. Remember the result of an expensive operation, to speed up reads. Building a Highly Concurrent Cache in Go data, ok := cache.Get() if !ok { data = doSomething() cache.Set(data) }
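This lookup-or-compute pattern is often called cache-aside. A minimal runnable sketch of the same idea (the explicit key argument and the expensiveLookup helper are illustrative assumptions, not part of the talk's API):

    package main

    import "fmt"

    type memCache struct{ data map[string]string }

    func (c *memCache) Get(key string) (string, bool) { v, ok := c.data[key]; return v, ok }
    func (c *memCache) Set(key, value string)         { c.data[key] = value }

    // expensiveLookup stands in for the slow operation whose result we cache.
    func expensiveLookup(key string) string { return "value-for-" + key }

    func main() {
        cache := &memCache{data: make(map[string]string)}
        key := "post-42"
        data, ok := cache.Get(key)
        if !ok {
            data = expensiveLookup(key) // remember the expensive result
            cache.Set(key, data)        // so future reads are served from memory
        }
        fmt.Println(data)
    }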

Slide 27

Slide 27 text

27 Me ca. 2019

Slide 28

Slide 28 text

28 My Manager ca. 2019

Slide 29

Slide 29 text

29 My Manager and Me Where is the cache Lebowski Reiche?

Slide 30

Slide 30 text

30 Caching Post Data for Ranking Building a Highly Concurrent Cache in Go Kubernetes Pods for Ranking Service

Slide 31

Slide 31 text

31 Building a Highly Concurrent Cache in Go Kubernetes Pods for Ranking Service Thing Service Get data for filtering posts Caching Post Data for Ranking

Slide 32

Slide 32 text

32 Building a Highly Concurrent Cache in Go Kubernetes Pods for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking

Slide 33

Slide 33 text

33 Building a Highly Concurrent Cache in Go Kubernetes Pods for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Cache Lookups Caching Post Data for Ranking Update on Miss

Slide 34

Slide 34 text

34 Building a Highly Concurrent Cache in Go Kubernetes Pods for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking Cache Lookups 5-20m op/sec Update on Miss

Slide 35

Slide 35 text

35 Building a Highly Concurrent Cache in Go Kubernetes Pods for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking Local In-Memory Cache with ~90k op/sec per pod Cache Lookups 5-20m op/sec Update on Miss

Slide 36

Slide 36 text

36 type cache struct { data map[string]string } Building a Highly Concurrent Cache in Go

Slide 37

Slide 37 text

37 type cache[K comparable, V any] struct { data map[K]V } Building a Highly Concurrent Cache in Go

Slide 38

Slide 38 text

38 type cache struct { data map[string]string } Building a Highly Concurrent Cache in Go To keep it simple, we can introduce generics later, once they are needed.

Slide 39

Slide 39 text

39 type cache struct { data map[string]string } func (c *cache) Set(key, value string) { c.data[key] = value } func (c *cache) Get(key string) (string, bool) { value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go

Slide 40

Slide 40 text

40 type cache struct { mu sync.Mutex data map[string]string } func (c *cache) Set(key, value string) { c.data[key] = value } func (c *cache) Get(key string) (string, bool) { value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go

Slide 41

Slide 41 text

41 type cache struct { mu sync.Mutex data map[string]string } func (c *cache) Set(key, value string) { c.data[key] = value } func (c *cache) Get(key string) (string, bool) { value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go

Slide 42

Slide 42 text

42 type cache struct { mu sync.Mutex data map[string]string } func (c *cache) Set(key, value string) { c.mu.Lock() defer c.mu.Unlock() c.data[key] = value } func (c *cache) Get(key string) (string, bool) { c.mu.Lock() defer c.mu.Unlock() value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go

Slide 43

Slide 43 text

43 type cache struct { mu sync.RWMutex data map[string]string } func (c *cache) Set(key, value string) { c.mu.Lock() defer c.mu.Unlock() c.data[key] = value } func (c *cache) Get(key string) (string, bool) { c.mu.RLock() defer c.mu.RUnlock() value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go

Slide 44

Slide 44 text

44 type cache struct { mu sync.RWMutex data map[string]string } func (c *cache) Set(key, value string) { c.mu.Lock() defer c.mu.Unlock() c.data[key] = value } func (c *cache) Get(key string) (string, bool) { c.mu.RLock() defer c.mu.RUnlock() value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go Don’t generalize the cache from the start; pick an API that matches your usage pattern.

Slide 45

Slide 45 text

45 type cache struct { mu sync.RWMutex data map[string]string } func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { c.data[key] = value } } func (c *cache) Get(keys []string) map[string]string { result := make(map[string]string) c.mu.RLock() defer c.mu.RUnlock() for _, key := range keys { if value, ok := c.data[key]; ok { result[key] = value } } return result } Building a Highly Concurrent Cache in Go
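A quick usage sketch of this batch API (reusing the cache type above; the keys are made up). The point of the batch shape is that a single Lock or RLock now covers a whole batch of keys instead of paying one mutex operation per key:

    // Illustrative only: assumes the cache type defined on this slide.
    c := &cache{data: make(map[string]string)}
    c.Set(map[string]string{"a": "1", "b": "2"})
    found := c.Get([]string{"a", "b", "c"}) // "c" is simply absent from the result
    fmt.Println(found)                      // map[a:1 b:2]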

Slide 46

Slide 46 text

[7932676.760082] Out of memory: kill process 23936 [7932676.761528] Killed process 23936 (go)

Slide 47

Slide 47 text

Cache Replacement Policies

Slide 48

Slide 48 text

48 Once the cache exceeds its maximum capacity, which data should be evicted to make space for new data? Building a Highly Concurrent Cache in Go Cache Replacement Policies Cache Entry 1 Entry 2 Entry 3 ⋮ Entry n

Slide 49

Slide 49 text

49 Once the cache exceeds its maximum capacity, which data should be evicted to make space for new data? Bélády's Optimal Replacement Algorithm Remove the entry whose next use will occur farthest in the future. Building a Highly Concurrent Cache in Go Cache Replacement Policies Cache Entry 1 Entry 2 Entry 3 ⋮ Entry n

Slide 50

Slide 50 text

50 Once the cache exceeds its maximum capacity, which data should be evicted to make space for new data? Bélády's Optimal Replacement Algorithm Remove the entry whose next use will occur farthest in the future. Building a Highly Concurrent Cache in Go Cache Replacement Policies Because we cannot predict the future, we can only try to approximate this behavior. Cache Entry 1 Entry 2 Entry 3 ⋮ Entry n
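To make the idea concrete, here is a small self-contained sketch (my own illustration, not from the talk) that replays a fully known access trace and always evicts the key whose next use lies farthest in the future. Its hit count is the upper bound that any real, non-clairvoyant policy can only approach:

    package main

    import "fmt"

    // belady simulates Bélády's optimal replacement on a known trace.
    func belady(trace []string, size int) (hits int) {
        cache := map[string]bool{}
        for i, key := range trace {
            if cache[key] {
                hits++
                continue
            }
            if len(cache) >= size {
                // Evict the cached key whose next use is farthest away.
                victim, farthest := "", -1
                for k := range cache {
                    next := len(trace) // never used again
                    for j := i + 1; j < len(trace); j++ {
                        if trace[j] == k {
                            next = j
                            break
                        }
                    }
                    if next > farthest {
                        victim, farthest = k, next
                    }
                }
                delete(cache, victim)
            }
            cache[key] = true
        }
        return hits
    }

    func main() {
        trace := []string{"a", "b", "c", "a", "b", "d", "a", "b"}
        fmt.Println(belady(trace, 2), "hits") // upper bound for any policy
    }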

Slide 51

Slide 51 text

51 Taxonomy of Cache Replacement Policies Building a Highly Concurrent Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification Longer History More Access Patterns [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.

Slide 52

Slide 52 text

52 Taxonomy of Cache Replacement Policies Building a Highly Concurrent Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification Longer History More Access Patterns [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU LFU

Slide 53

Slide 53 text

53 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently.

Slide 54

Slide 54 text

54 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. type cache struct { size int mu sync.RWMutex data map[string]string }

Slide 55

Slide 55 text

55 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. type cache struct { size int mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string frequency int index int }
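The following slides call heap.Push, heap.Pop, and a heap update helper, but the heap itself is never shown. A plausible MinHeap sketch (my assumption of its shape, following the container/heap documentation and reusing the item struct from this slide) could look like:

    import "container/heap"

    // MinHeap orders items by frequency so the least frequently used
    // entry sits at index 0, ready to be popped on eviction.
    type MinHeap []*item

    func (h MinHeap) Len() int           { return len(h) }
    func (h MinHeap) Less(i, j int) bool { return h[i].frequency < h[j].frequency }
    func (h MinHeap) Swap(i, j int) {
        h[i], h[j] = h[j], h[i]
        h[i].index = i
        h[j].index = j
    }

    func (h *MinHeap) Push(x any) {
        it := x.(*item)
        it.index = len(*h)
        *h = append(*h, it)
    }

    func (h *MinHeap) Pop() any {
        old := *h
        n := len(old)
        it := old[n-1]
        old[n-1] = nil // drop the stale reference
        *h = old[:n-1]
        return it
    }

    // update changes an item's frequency and restores heap order.
    func (h *MinHeap) update(it *item, frequency int) {
        it.frequency = frequency
        heap.Fix(h, it.index)
    }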

Slide 56

Slide 56 text

56 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } }

Slide 57

Slide 57 text

57 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } } a frequency: 1

Slide 58

Slide 58 text

58 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } } a frequency: 1 b frequency: 1

Slide 59

Slide 59 text

59 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. a frequency: 1 b frequency: 1 c frequency: 1 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } }

Slide 60

Slide 60 text

60 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Get(keys []string) ( map[string]string, []string, ) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } a frequency: 1 b frequency: 1 c frequency: 1

Slide 61

Slide 61 text

61 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Get(keys []string) ( map[string]string, []string, ) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } a frequency: 1 b frequency: 1 c frequency: 1 cache.Get([]string{"a", "b"})

Slide 62

Slide 62 text

62 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Get(keys []string) ( map[string]string, []string, ) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } c frequency: 1 a frequency: 2 b frequency: 2

Slide 63

Slide 63 text

63 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } } c frequency: 1 a frequency: 2 b frequency: 2 cache.Set(map[string]string{ "d": "⌯Go", })

Slide 64

Slide 64 text

64 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. c frequency: 1 d frequency: 1 b frequency: 2 a frequency: 2 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } }

Slide 65

Slide 65 text

65 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. c frequency: 1 d frequency: 1 b frequency: 2 a frequency: 2 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } }

Slide 66

Slide 66 text

66 Building a Highly Concurrent Cache in Go LFU Least Frequently Used (LFU) Favor entries that are used frequently. d frequency: 1 a frequency: 2 b frequency: 2 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } }

Slide 67

Slide 67 text

67 Taxonomy of Cache Replacement Policies Building a Highly Concurrent Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification Longer History More Access Patterns [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU LFU

Slide 68

Slide 68 text

68 Taxonomy of Cache Replacement Policies Building a Highly Concurrent Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP EVA Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.

Slide 69

Slide 69 text

69 Building a Highly Concurrent Cache in Go Kubernetes Pods for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking Local In-Memory Cache with ~90k op/sec per pod Cache Lookups 5-20m op/sec Update on Miss

Slide 70

Slide 70 text

Benchmarks

Slide 71

Slide 71 text

Benchmarks Functions of the form func BenchmarkXxx(*testing.B) are considered benchmarks, and are executed by the go test command when its -bench flag is provided. B is a type passed to Benchmark functions to manage benchmark timing and to specify the number of iterations to run.

Slide 72

Slide 72 text

Benchmarks ● Before improving the performance of code, we should measure its current performance ● Create a stable environment ○ Idle machine ○ No shared hardware ○ Don’t browse the web ○ Disable power saving and thermal scaling ● The testing package has built-in support for writing benchmarks

Slide 73

Slide 73 text

Benchmarks func BenchmarkGet(b *testing.B) { for i := 0; i < b.N; i++ { } }

Slide 74

Slide 74 text

Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys := make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } for i := 0; i < b.N; i++ { } }

Slide 75

Slide 75 text

Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys := make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } cache.Set(items) b.ResetTimer() for i := 0; i < b.N; i++ { } }

Slide 76

Slide 76 text

Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys := make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } cache.Set(items) b.ResetTimer() for i := 0; i < b.N; i++ { cache.Get(keys) } }

Slide 77

Slide 77 text

go test -run=^$ -bench=BenchmarkGet Benchmarks

Slide 78

Slide 78 text

go test -run=^$ -bench=BenchmarkGet -count=5 Benchmarks

Slide 79

Slide 79 text

go test -run=^$ -bench=BenchmarkGet -count=5 goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz BenchmarkGet-16 117642 9994 ns/op BenchmarkGet-16 116402 10018 ns/op BenchmarkGet-16 121834 9817 ns/op BenchmarkGet-16 123241 9942 ns/op BenchmarkGet-16 109621 10022 ns/op PASS ok github.com/konradreiche/cache 6.520s Benchmarks

Slide 80

Slide 80 text

Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys := make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } cache.Set(items) b.ResetTimer() for i := 0; i < b.N; i++ { cache.Get(keys) } }

Slide 81

Slide 81 text

Limitations ● We want to analyze and optimize all cache operations: Get, Set, Eviction ● Not all code paths are covered ● What cache hit and miss ratio to benchmark for? ● No concurrency, benchmark executes with one goroutine ● How do different operations behave when interleaving concurrently?

Slide 82

Slide 82 text

Limitations ● We want to analyze and optimize all cache operations: Get, Set, Eviction ● Not all code paths are covered ● What cache hit and miss ratio to benchmark for? ● No concurrency, benchmark executes with one goroutine ● How do different operations behave when interleaving concurrently?

Slide 83

Slide 83 text

Real Sample Data Event log of cache access over 30 minutes including: ● timestamp ● post keys 107907533,SA,Lw,OA,Iw,aA,RA,KA,CQ,Ow,Aw,Hg,Kg 111956832,upgb 121807061,upgb 134028958,l3Ir,iPMq,PcUn,T5Ej,ZQs,kTM,/98F,BFwJ,Oik,uYIB,gv8F 137975373,crgb,SCMU,NXUd,EyQI,244Z,DB4H,Tp0H,Kh8b,gREH,g9kG,o34E,wSYI,u+wF,h40M 142509895,iwM,hgM,CQQ,YQI 154850130,jTE,ciU,2U4,GQkB,4xo,U2QC,/7oB,dRIC,M0gB,bwYk ...
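Each line is a timestamp followed by comma-separated keys. A hedged sketch of a parser for replaying such a log in a benchmark (the eventLog type and function name are my own, not the talk's code):

    package main

    import (
        "bufio"
        "fmt"
        "strconv"
        "strings"
    )

    type eventLog struct {
        timestamp int64
        keys      []string
    }

    // parseEventLogs reads lines of the form "107907533,SA,Lw,OA,..."
    // into (timestamp, keys) pairs for replay in a benchmark.
    func parseEventLogs(sc *bufio.Scanner) ([]eventLog, error) {
        var logs []eventLog
        for sc.Scan() {
            fields := strings.Split(sc.Text(), ",")
            ts, err := strconv.ParseInt(fields[0], 10, 64)
            if err != nil {
                return nil, err
            }
            logs = append(logs, eventLog{timestamp: ts, keys: fields[1:]})
        }
        return logs, sc.Err()
    }

    func main() {
        sc := bufio.NewScanner(strings.NewReader("107907533,SA,Lw,OA\n111956832,upgb\n"))
        logs, _ := parseEventLogs(sc)
        fmt.Println(len(logs), logs[0].keys)
    }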

Slide 84

Slide 84 text

Limitations ● We want to analyze and optimize all cache operations: Get, Set, Eviction ● Not all code paths are covered ● What cache hit and miss ratio to benchmark for? ● No concurrency, benchmark executes with one goroutine ● How do different operations behave when interleaving concurrently?

Slide 85

Slide 85 text

b.RunParallel RunParallel runs a benchmark in parallel. It creates multiple goroutines and distributes b.N iterations among them. The number of goroutines defaults to GOMAXPROCS.

Slide 86

Slide 86 text

b.RunParallel RunParallel runs a benchmark in parallel. It creates multiple goroutines and distributes b.N iterations among them. The number of goroutines defaults to GOMAXPROCS. func BenchmarkCache(b *testing.B) { b.RunParallel(func(pb *testing.PB) { // set up goroutine local state for pb.Next() { // execute one iteration of the benchmark } }) }

Slide 87

Slide 87 text

func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer() b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") }

Slide 88

Slide 88 text

func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer() b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } Custom benchmark case type to manage benchmark and collect data

Slide 89

Slide 89 text

func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer() b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } Collect per-goroutine benchmark measurements

Slide 90

Slide 90 text

func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer() b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } Reproduce production behavior: lookup & update.

Slide 91

Slide 91 text

func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer() b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } We can measure the duration of individual operations manually. Use b.ReportMetric to report custom metrics.

Slide 92

Slide 92 text

go test -run=^$ -bench=BenchmarkCache -count=10

Slide 93

Slide 93 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x Ensure that each benchmark processes exactly 5,000 event logs to improve comparability of the hit rate metric

Slide 94

Slide 94 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz BenchmarkCache/policy=lfu-16 5000 0.6141 hit/op 4795215 read-ns/op 2964262 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6082 hit/op 4686778 read-ns/op 2270200 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6159 hit/op 4332358 read-ns/op 1765885 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6153 hit/op 4089562 read-ns/op 2504176 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6152 hit/op 3472677 read-ns/op 1686928 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6107 hit/op 4464410 read-ns/op 2695443 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6155 hit/op 3624802 read-ns/op 1837148 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6133 hit/op 3931610 read-ns/op 2154571 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6151 hit/op 2440746 read-ns/op 1260662 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6138 hit/op 3491091 read-ns/op 1944350 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2320270 read-ns/op 1127495 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2212118 read-ns/op 1019305 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6705 hit/op 2150089 read-ns/op 1037654 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2512224 read-ns/op 1134282 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6710 hit/op 2377883 read-ns/op 1079198 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6711 hit/op 2313210 read-ns/op 1120761 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2071632 read-ns/op 980912 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2410096 read-ns/op 1127907 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2226160 read-ns/op 1071007 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2383321 read-ns/op 1165734 write-ns/op PASS ok github.com/konradreiche/cache 846.442s

Slide 95

Slide 95 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

Slide 96

Slide 96 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /policy bench

Slide 97

Slide 97 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LFU │ LRU │ │ hit/op │ hit/op vs base │ Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10) │ LFU │ LRU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10) │ LFU │ LRU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)

Slide 98

Slide 98 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LFU │ LRU │ │ hit/op │ hit/op vs base │ Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10) │ LFU │ LRU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10) │ LFU │ LRU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)

Slide 99

Slide 99 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LFU │ LRU │ │ hit/op │ hit/op vs base │ Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10) │ LFU │ LRU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10) │ LFU │ LRU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)

Slide 100

Slide 100 text

100 Taxonomy of Cache Replacement Policies Building a Highly Concurrent Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye EVA

Slide 101

Slide 101 text

101 Taxonomy of Cache Replacement Policies Building a Highly Concurrent Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye EVA

Slide 102

Slide 102 text

102 Taxonomy of Cache Replacement Policies Building a Highly Concurrent Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye EVA

Slide 103

Slide 103 text

Combining LFU & LRU

Slide 104

Slide 104 text

104 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Frequently Used) A paper [2] published in 2001 suggests combining LRU and LFU into a policy named LRFU. ● Similar to LFU: each item holds a value ● CRF: Combined Recency and Frequency ● A parameter λ determines how much weight is given to recent entries type cache struct { size int mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int frequency int } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

Slide 105

Slide 105 text

A paper [2] published in 2001 suggests combining LRU and LFU into a policy named LRFU. ● Similar to LFU: each item holds a value ● CRF: Combined Recency and Frequency ● A parameter λ determines how much weight is given to recent entries λ = 1.0 (LRU) λ = 0.0 (LFU) 105 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Frequently Used) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

Slide 106

Slide 106 text

A paper [2] published in 2001 suggests combining LRU and LFU into a policy named LRFU. ● Similar to LFU: each item holds a value ● CRF: Combined Recency and Frequency ● A parameter λ determines how much weight is given to recent entries λ = 1.0 (LRU) λ = 0.0 (LFU) λ = 0.001 106 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Frequently Used) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

Slide 107

Slide 107 text

A paper [2] published in 2001 suggests combining LRU and LFU into a policy named LRFU. ● Similar to LFU: each item holds a value ● CRF: Combined Recency and Frequency ● A parameter λ determines how much weight is given to recent entries λ = 1.0 (LRU) λ = 0.0 (LFU) λ = 0.001 (LFU with a pinch of LRU) 107 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Frequently Used) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.

Slide 108

Slide 108 text

A paper [2] published in 2001 suggests combining LRU and LFU into a policy named LRFU. ● Calculate CRF for every entry whenever entries need to be compared ● math.Pow not a cheap operation ● 0.5^(λx) prone to floating-point overflow ● New items likely to be evicted starting with CRF = 1.0 108 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Frequently Used) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.
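For reference, the weighing function from [2] (my transcription) combines every past reference time of a block into a single score; in LaTeX:

    F(x) = \left(\tfrac{1}{2}\right)^{\lambda x},
    \qquad
    \mathrm{CRF}_{t_c}(b) = \sum_{i=1}^{k} F\left(t_c - t_{b_i}\right)

where t_c is the current time and t_{b_1}, …, t_{b_k} are the past reference times of block b. With λ = 0, F is constant and the CRF reduces to a pure reference count (LFU); with λ = 1, recent references dominate and the policy behaves like LRU.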

Slide 109

Slide 109 text

DLFU

Slide 110

Slide 110 text

110 Building a Highly Concurrent Cache in Go DLFU (Decaying LFU Cache Expiry) Donovan Baarda, a Google SRE from Australia, came up with an improved algorithm [3] and a Python reference implementation [4], realizing: 1. LRFU decay is a simple exponential decay 2. Exponential decay can be approximated, which eliminates math.Pow 3. The reference increment can be grown exponentially instead of exponentially decaying all entries, requiring fewer fields per entry and fewer comparisons [3] https://github.com/dbaarda/DLFUCache [4] https://minkirri.apana.org.au/wiki/DecayingLFUCacheExpiry
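A tiny numeric sketch (my own illustration) of the third point: instead of decaying every stored score on each access, grow the increment. The relative order between entries stays the same, because every score would otherwise have been scaled by the same factor:

    package main

    import "fmt"

    // Decaying every cached entry's score on each access is O(n).
    // DLFU-style decay grows the increment instead and leaves stored
    // scores untouched; the ranking between entries is unchanged.
    func main() {
        const decay = 1.001 // (p+1)/p for p = size*weight, as in the talk
        incr := 1.0
        score := 0.0
        for i := 0; i < 5; i++ {
            score += incr // one cache hit
            incr *= decay // newer hits weigh slightly more
        }
        fmt.Printf("score=%.6f incr=%.6f\n", score, incr)
    }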

Slide 111

Slide 111 text

111 Building a Highly Concurrent Cache in Go func NewDLFUCache[V any](ctx context.Context, config config.Config) *DLFUCache[V] { cache := &DLFUCache[V]{ data: make(map[string]*Item[V], config.Size), heap: &MinHeap[V]{}, weight: config.Weight, size: config.Size, incr: 1.0, } if config.Weight == 0.0 { // there is no decay for LFU policy cache.decay = 1 return cache } p := float64(config.Size) * config.Weight cache.decay = (p + 1.0) / p return cache } DLFU Cache

Slide 112

Slide 112 text

112 Building a Highly Concurrent Cache in Go func (c *lfuCache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } for len(c.data) > c.size { item := heap.Pop(c.heap).(*item) delete(c.data, item.key) } } DLFU Cache

Slide 113

Slide 113 text

113 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } c.trim() } DLFU Cache

Slide 114

Slide 114 text

114 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(c.heap, item) } c.trim() } DLFU Cache

Slide 115

Slide 115 text

115 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, score: c.incr, } c.data[key] = item heap.Push(c.heap, item) } c.trim() } DLFU Cache

Slide 116

Slide 116 text

116 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() expiresAt := time.Now().Add(expiry) for key, value := range items { if item, ok := c.data[key]; ok { item.value = value item.expiresAt = expiresAt continue } item := &item{ key: key, value: value, score: c.incr, expiresAt: expiresAt, } c.data[key] = item heap.Push(c.heap, item) } c.trim() } DLFU Cache

Slide 117

Slide 117 text

117 Building a Highly Concurrent Cache in Go func (c *lfuCache) Get(keys []string) (map[string]string, []string) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } DLFU Cache

Slide 118

Slide 118 text

118 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } DLFU Cache

Slide 119

Slide 119 text

119 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } DLFU Cache

Slide 120

Slide 120 text

120 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } return result, missing } DLFU Cache

Slide 121

Slide 121 text

121 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing } DLFU Cache

Slide 122

Slide 122 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /policy bench

Slide 123

Slide 123 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LRU │ DLFU │ │ hit/op │ hit/op vs base │ Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10) │ LRU │ DLFU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10) │ LRU │ DLFU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)

Slide 124

Slide 124 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LRU │ DLFU │ │ hit/op │ hit/op vs base │ Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10) │ LRU │ DLFU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10) │ LRU │ DLFU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)

Slide 125

Slide 125 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x -cpuprofile=cpu.out > bench benchstat -col /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LRU │ DLFU │ │ hit/op │ hit/op vs base │ Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10) │ LRU │ DLFU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10) │ LRU │ DLFU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)

Slide 126

Slide 126 text

126 Building a Highly Concurrent Cache in Go Profiling go tool pprof cpu.out File: cache.test Type: cpu Time: Sep 24, 2023 at 3:04pm (PDT) Duration: 850.60s, Total samples = 1092.33s (128.42%) Entering interactive mode (type "help" for commands, "o" for options) (pprof)

Slide 127

Slide 127 text

127 Building a Highly Concurrent Cache in Go Profiling go tool pprof cpu.out File: cache.test Type: cpu Time: Sep 24, 2023 at 3:04pm (PDT) Duration: 850.60s, Total samples = 1092.33s (128.42%) Entering interactive mode (type "help" for commands, "o" for options) (pprof) top Showing nodes accounting for 567.44s, 51.95% of 1092.33s total Dropped 470 nodes (cum <= 5.46s) Showing top 10 nodes out of 104 flat flat% sum% cum cum% 118.69s 10.87% 10.87% 169.36s 15.50% runtime.findObject 88.04s 8.06% 18.93% 88.04s 8.06% github.com/konradreiche/cache/dlfu/v1.MinHeap[go.shape.string].Less 72.38s 6.63% 25.55% 319.24s 29.23% runtime.scanobject 60.74s 5.56% 31.11% 106.72s 9.77% runtime.mapaccess2_faststr 45.03s 4.12% 35.23% 126.25s 11.56% runtime.mapassign_faststr 40.60s 3.72% 38.95% 40.60s 3.72% time.Now 40.20s 3.68% 42.63% 41.13s 3.77% container/list.(*List).move 35.47s 3.25% 45.88% 35.47s 3.25% memeqbody 34.25s 3.14% 49.01% 34.25s 3.14% runtime.memclrNoHeapPointers 32.04s 2.93% 51.95% 44.34s 4.06% runtime.mapdelete_faststr

Slide 128

Slide 128 text

128 Building a Highly Concurrent Cache in Go Profiling go tool pprof cpu.out File: cache.test Type: cpu Time: Sep 24, 2023 at 3:04pm (PDT) Duration: 850.60s, Total samples = 1092.33s (128.42%) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Get

Slide 129

Slide 129 text

129 Building a Highly Concurrent Cache in Go Total: 1092.33s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 6.96s 218.27s (flat, cum) 19.98% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . 490ms 76: result := make(map[string]V) . . 77: 40ms 2.56s 78: c.mu.Lock() 30ms 30ms 79: defer c.mu.Unlock() . . 80: 1.04s 1.04s 81: for _, key := range keys { 1.34s 43.77s 82: item, ok := c.data[key] 290ms 53.66s 83: if ok && !item.expired() { 1.68s 44.35s 84: result[key] = item.value 1.72s 1.72s 85: item.score += c.incr 130ms 65.82s 86: c.heap.update(item, item.score) . . 87: } else { 530ms 3.55s 88: missingKeys = append(missingKeys, key) . . 89: } . . 90: } 20ms 20ms 91: c.incr *= c.decay 140ms 1.26s 92: return result, missingKeys . . 93:} . . 94:

Slide 130

Slide 130 text

130 Building a Highly Concurrent Cache in Go Total: 1092.33s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 6.96s 218.27s (flat, cum) 19.98% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . 490ms 76: result := make(map[string]V) . . 77: 40ms 2.56s 78: c.mu.Lock() 30ms 30ms 79: defer c.mu.Unlock() . . 80: 1.04s 1.04s 81: for _, key := range keys { 1.34s 43.77s 82: item, ok := c.data[key] 290ms 53.66s 83: if ok && !item.expired() { 1.68s 44.35s 84: result[key] = item.value 1.72s 1.72s 85: item.score += c.incr 130ms 65.82s 86: c.heap.update(item, item.score) . . 87: } else { 530ms 3.55s 88: missingKeys = append(missingKeys, key) . . 89: } . . 90: } 20ms 20ms 91: c.incr *= c.decay 140ms 1.26s 92: return result, missingKeys . . 93:} . . 94: Maintaining a heap is more expensive than LRU, which only requires a doubly linked list.

Slide 131

Slide 131 text

131 Building a Highly Concurrent Cache in Go Total: 1092.33s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 6.96s 218.27s (flat, cum) 19.98% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . 490ms 76: result := make(map[string]V) . . 77: 40ms 2.56s 78: c.mu.Lock() 30ms 30ms 79: defer c.mu.Unlock() . . 80: 1.04s 1.04s 81: for _, key := range keys { 1.34s 43.77s 82: item, ok := c.data[key] 290ms 53.66s 83: if ok && !item.expired() { 1.68s 44.35s 84: result[key] = item.value 1.72s 1.72s 85: item.score += c.incr 130ms 65.82s 86: c.heap.update(item, item.score) . . 87: } else { 530ms 3.55s 88: missingKeys = append(missingKeys, key) . . 89: } . . 90: } 20ms 20ms 91: c.incr *= c.decay 140ms 1.26s 92: return result, missingKeys . . 93:} . . 94: The CPU profile does not capture the time spent waiting to acquire a lock.

Slide 132

Slide 132 text

go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out

Slide 133

Slide 133 text

go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof)

Slide 134

Slide 134 text

go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Get Total: 615.48s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 0 297.55s (flat, cum) 48.34% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . . 76: result := make(map[string]V) . . 77: . 297.55s 78: c.mu.Lock() . . 79: defer c.mu.Unlock() . . 80: . . 81: for _, key := range keys { . . 82: item, ok := c.data[key] . . 83: if ok && !item.expired() {

Slide 135

Slide 135 text

go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Get Total: 615.48s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 0 297.55s (flat, cum) 48.34% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . . 76: result := make(map[string]V) . . 77: . 297.55s 78: c.mu.Lock() . . 79: defer c.mu.Unlock() . . 80: . . 81: for _, key := range keys { . . 82: item, ok := c.data[key] . . 83: if ok && !item.expired() {

Slide 136

Slide 136 text

go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Set Total: 615.48s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Set 0 193.89s (flat, cum) 31.50% of Total . . 99:func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { . 193.89s 100: c.mu.Lock() . . 101: defer c.mu.Unlock() . . 102: . . 103: now := time.Now() . . 104: for key, value := range items { . . 105: if ctx.Err() != nil {

Slide 137

Slide 137 text

137 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing } Critical Section Critical Section

Slide 138

Slide 138 text

138 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing }

Slide 139

Slide 139 text

139 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing }

Slide 140

Slide 140 text

140 Building a Highly Concurrent Cache in Go func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing }

Slide 141

Slide 141 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col /dlfu bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ V1 │ V2 │ │ hit/op │ hit/op vs base │ Cache-16 70.94% ± 0% 70.30% ± 0% -0.89% (p=0.000 n=10) │ V1 │ V2 │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.34ms ± 54% 1.43ms ± 26% -67.13% (p=0.001 n=10) │ V1 │ V2 │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.43ms ± 62% 574.3µs ± 25% -76.36% (p=0.000 n=10)

Slide 142

Slide 142 text

In Production

Slide 143

Slide 143 text

In Production

Slide 144

Slide 144 text

● Spike in number of goroutines, memory usage & timeouts In Production

Slide 145

Slide 145 text

● Spike in number of goroutines, memory usage & timeouts ● Latency added to call paths integrating the local in-memory cache In Production

Slide 146

Slide 146 text

● Spike in number of goroutines, memory usage & timeouts ● Latency added to call paths integrating the local in-memory cache ● For incremental progress: ○ Feature Flags with Sampling ○ Timeout for Cache Operations In Production

Slide 147

Slide 147 text

cached, missingIDs := cache.Get(keys) In Production

Slide 148

Slide 148 text

if !liveconfig.Sample("cache.read_rate") { return } cached, missingIDs := cache.Get(keys) In Production: Feature Flags with Sampling

Slide 149

Slide 149 text

if !liveconfig.Sample("cache.read_rate") { return } go func() { cached, missingIDs = localCache.Get(keys) }() In Production: Timeout for Cache Operations

Slide 150

Slide 150 text

if !liveconfig.Sample("cache.read_rate") { return } // perform cache-lookup in goroutine to avoid blocking for too long ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond) go func() { cached, missingIDs = localCache.Get(keys) cancel() }() <-ctx.Done() // timeout: return all keys as missing and let remote cache handle it if ctx.Err() == context.DeadlineExceeded { return map[string]T{}, keys } In Production: Timeout for Cache Operations

Slide 151

Slide 151 text

if !liveconfig.Sample("cache.read_rate") { return } // perform cache-lookup in goroutine to avoid blocking for too long ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond) go func() { cached, missingIDs = localCache.Get(keys) cancel() }() <-ctx.Done() // timeout: return all keys as missing and let remote cache handle it if ctx.Err() == context.DeadlineExceeded { return map[string]T{}, keys } In Production: Timeout for Cache Operations Pass context into the cache operations too.

Slide 152

Slide 152 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing } In Production: Timeout for Cache Operations

Slide 153

Slide 153 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing } In Production: Timeout for Cache Operations

Slide 154

Slide 154 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing } In Production: Timeout for Cache Operations The goroutine gets abandoned after the timeout. Checking for context cancellation to stop iteration helps reduce lock contention.
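A sketch of that cancellation check (my own rendering of the slide's advice, building on the Get shown above; the method name getWithCancel and the early-return shape are assumptions):

    func (c *DLFUCache[V]) getWithCancel(ctx context.Context, keys []string) (map[string]V, []string) {
        result := make(map[string]V)
        missing := make([]string, 0)
        for i, key := range keys {
            // An abandoned call stops iterating here, so it no longer
            // competes with live callers for the lock.
            if ctx.Err() != nil {
                return result, append(missing, keys[i:]...) // treat the rest as misses
            }
            c.mu.Lock()
            if item, ok := c.data[key]; ok && !item.expired() {
                result[key] = item.value
                item.score += c.incr
                c.heap.update(item, item.score)
            } else {
                missing = append(missing, key)
            }
            c.mu.Unlock()
        }
        c.mu.Lock()
        c.incr *= c.decay
        c.mu.Unlock()
        return result, missing
    }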

Slide 155

Slide 155 text

Beyond sync.Mutex

Slide 156

Slide 156 text

156 sync.Map Building a Highly Concurrent Cache in Go

Slide 157

Slide 157 text

157 sync.Map ● I wrongly assumed sync.Map is an untyped map protected by a sync.RWMutex. ● I recommend regularly diving into the standard library source code; for example, for sync.Map we can see that the implementation is much more intricate: Building a Highly Concurrent Cache in Go func (m *Map) Load(key any) (value any, ok bool) { read := m.loadReadOnly() e, ok := read.m[key] if !ok && read.amended { m.mu.Lock() // Avoid reporting a spurious miss if m.dirty got promoted while we were // blocked on m.mu. (If further loads of the same key will not miss, it's // not worth copying the dirty map for this key.) read = m.loadReadOnly() e, ok = read.m[key]

Slide 158

Slide 158 text

158 sync.Map Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time. The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content. The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex. Building a Highly Concurrent Cache in Go

Slide 159

Slide 159 text

159 sync.Map Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time. The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content. The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex. Building a Highly Concurrent Cache in Go

Slide 160

Slide 160 text

160 sync.Map Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time. The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content. The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex. Building a Highly Concurrent Cache in Go
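To make the documentation's trade-off concrete, here is a minimal sketch of sync.Map used as a grow-only cache. Every Load goes through a type assertion, which is the "better type safety" cost the docs allude to; all calls are real sync.Map API.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map

	// Store and Load operate on any/any, so every read needs a type
	// assertion that would be unnecessary with a plain typed map.
	m.Store("greeting", "hello")

	if v, ok := m.Load("greeting"); ok {
		s := v.(string) // panics if the stored type ever changes
		fmt.Println(s)
	}

	// LoadOrStore fits the "written once, read many times" cache use case.
	v, loaded := m.LoadOrStore("greeting", "hi")
	fmt.Println(v, loaded) // hello true
}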

Slide 161

Slide 161 text

161 xsync.Map ● Third-party library providing concurrent data structures for Go https://github.com/puzpuzpuz/xsync ● xsync.Map is a concurrent hash-table-based map using a modified version of the Cache-Line Hash Table (CLHT) data structure. ● CLHT organizes the hash table in cache-line-sized buckets to reduce the number of cache-line transfers. Building a Highly Concurrent Cache in Go

Slide 162

Slide 162 text

162 xsync.Map ● Third-party library providing concurrent data structures for Go https://github.com/puzpuzpuz/xsync ● xsync.Map is a concurrent hash-table-based map using a modified version of the Cache-Line Hash Table (CLHT) data structure. ● CLHT organizes the hash table in cache-line-sized buckets to reduce the number of cache-line transfers. Building a Highly Concurrent Cache in Go

Slide 163

Slide 163 text

163 Symmetric Multiprocessing (SMP) [Diagram: a CPU with four cores; each core has its own L1 and L2 cache, all cores share an L3 cache, and the L3 connects to main memory (RAM)] Building a Highly Concurrent Cache in Go

Slide 164

Slide 164 text

164 Locality of Reference Building a Highly Concurrent Cache in Go ● Temporal Locality: a processor accessing a particular memory location will likely access it again in the near future. ● Spatial Locality: a processor accessing a particular memory location will likely access nearby memory locations too. ● The CPU cache is not filled one memory location at a time, but one cache line at a time. ● Cache Line (Cache Block): a contiguous chunk of memory.
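A minimal benchmark sketch (not from the talk) that makes spatial locality visible: traversing the same matrix row by row uses each loaded cache line fully, while traversing it column by column touches a new line on almost every access.

package cache

import "testing"

// A 4096 x 4096 byte matrix; rows are contiguous in memory.
const n = 4096

var matrix [n][n]byte

// Row-major traversal walks memory sequentially, so each cache line is
// fully consumed before the next one is fetched.
func BenchmarkRowMajor(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var sum int
		for row := 0; row < n; row++ {
			for col := 0; col < n; col++ {
				sum += int(matrix[row][col])
			}
		}
		_ = sum
	}
}

// Column-major traversal jumps n bytes per access, wasting most of every
// cache line it loads.
func BenchmarkColumnMajor(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var sum int
		for col := 0; col < n; col++ {
			for row := 0; row < n; row++ {
				sum += int(matrix[row][col])
			}
		}
		_ = sum
	}
}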

Slide 165

Slide 165 text

165 False Sharing [Diagram: two CPU cores, each with a private L1 cache, above a shared main memory; variables x and y sit side by side in memory on the same cache line] Building a Highly Concurrent Cache in Go

Slide 166

Slide 166 text

166 False Sharing [Diagram: Core 1 reads variable x; the entire cache line, holding both x and y, is copied into Core 1's L1] Building a Highly Concurrent Cache in Go

Slide 167

Slide 167 text

167 False Sharing [Diagram: Core 1's L1 now holds the line with x and y in Exclusive (E) state] Building a Highly Concurrent Cache in Go

Slide 168

Slide 168 text

168 False Sharing [Diagram: Core 2 reads variable y; the same cache line is copied into Core 2's L1] Building a Highly Concurrent Cache in Go

Slide 169

Slide 169 text

169 False Sharing [Diagram: both cores now hold the cache line in Shared (S) state] Building a Highly Concurrent Cache in Go

Slide 170

Slide 170 text

170 False Sharing [Diagram: Core 1 writes to variable x; its copy of the line transitions to Modified (M) while Core 2's copy is still Shared (S)] Building a Highly Concurrent Cache in Go

Slide 171

Slide 171 text

171 False Sharing [Diagram: the write forces Core 2's copy of the cache line to be invalidated] Building a Highly Concurrent Cache in Go

Slide 172

Slide 172 text

172 False Sharing [Diagram: only Core 1 holds the cache line now, in Modified (M) state] Building a Highly Concurrent Cache in Go

Slide 173

Slide 173 text

173 False Sharing [Diagram: Core 2 writes to variable y; the write results in a coherence miss and Core 1's cache line must be invalidated] Building a Highly Concurrent Cache in Go

Slide 174

Slide 174 text

174 False Sharing [Diagram: coherence write-back: Core 1's modified line is written back and transferred to Core 2, which now holds it in Modified (M) state] Building a Highly Concurrent Cache in Go

Slide 175

Slide 175 text

175 False Sharing [Diagram: Core 2 holds the cache line in Modified (M) state] Building a Highly Concurrent Cache in Go

Slide 176

Slide 176 text

176 False Sharing [Diagram: Core 1 reads variable x again; the read results in a coherence miss and Core 2's cache line must be invalidated] Building a Highly Concurrent Cache in Go

Slide 177

Slide 177 text

177 False Sharing [Diagram: coherence write-back: Core 2's modified line is written back to main memory] Building a Highly Concurrent Cache in Go

Slide 178

Slide 178 text

178 False Sharing [Diagram: both cores hold the cache line in Shared (S) state again; x and y were never actually shared, yet every access paid for coherence traffic] Building a Highly Concurrent Cache in Go

Slide 179

Slide 179 text

179 Cache Coherence Protocol Building a Highly Concurrent Cache in Go ● Ensures CPU cores have a consistent view of the same data. ● The added coordination between CPU cores impacts application performance. ● Reducing the need for cache-coherence traffic makes for faster Go applications.
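One common way to reduce that traffic is to pad hot fields onto separate cache lines. A minimal sketch, assuming a 64-byte cache line (typical on x86-64, but CPU-dependent); the paddedCounters type is hypothetical:

package cache

import "sync/atomic"

// Two counters that different goroutines update independently. Without
// padding, both atomic.Int64 values would typically share one cache line,
// so every increment on one core invalidates the other core's copy.
type paddedCounters struct {
	hits   atomic.Int64
	_      [56]byte // 8-byte counter + 56 bytes pad = one 64-byte line
	misses atomic.Int64
	_      [56]byte
}

func (c *paddedCounters) Hit()  { c.hits.Add(1) }
func (c *paddedCounters) Miss() { c.misses.Add(1) }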

Slide 180

Slide 180 text

180 xsync.Map ● Third-party library providing concurrent data structures for Go https://github.com/puzpuzpuz/xsync ● xsync.Map is a concurrent hash-table-based map using a modified version of the Cache-Line Hash Table (CLHT) data structure. ● CLHT organizes the hash table in cache-line-sized buckets to reduce the number of cache-line transfers. Building a Highly Concurrent Cache in Go

Slide 181

Slide 181 text

181 DLFU Cache: Removing Locking ● Maintaining the heap still requires a mutex. ● To fully leverage xsync.Map we want to eliminate the mutex. ● Perform cache eviction in a goroutine: collect all entries and sort them. ● Replace synchronized access to numeric and string fields with atomic operations. ● It's not really lock-free: the locking just moves closer to the CPU, into the cache-coherence hardware. Building a Highly Concurrent Cache in Go
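The score and incr fields are floats, and sync/atomic has no float type, so an atomic float has to be built from an atomic.Uint64 holding the bit pattern plus a compare-and-swap loop. A minimal sketch; the atomicFloat64 name is an assumption, not the talk's actual helper:

package cache

import (
	"math"
	"sync/atomic"
)

// atomicFloat64 stores a float64 bit pattern in an atomic.Uint64 and
// implements Add with a CAS retry loop.
type atomicFloat64 struct {
	bits atomic.Uint64
}

func (f *atomicFloat64) Load() float64 {
	return math.Float64frombits(f.bits.Load())
}

func (f *atomicFloat64) Store(v float64) {
	f.bits.Store(math.Float64bits(v))
}

func (f *atomicFloat64) Add(delta float64) float64 {
	for {
		old := f.bits.Load()
		updated := math.Float64bits(math.Float64frombits(old) + delta)
		if f.bits.CompareAndSwap(old, updated) {
			return math.Float64frombits(updated)
		}
	}
}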

Slide 182

Slide 182 text

182 Building a Highly Concurrent Cache in Go

func (c *DLFUCache[V]) trimmer(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(250 * time.Millisecond):
			if ctx.Err() != nil {
				return
			}
			c.trim()
		}
	}
}

Perform Cache Eviction Asynchronously

Slide 183

Slide 183 text

func (c *DLFUCache[V]) trim() {
	size := c.data.Size()
	if size <= c.size {
		return
	}
	items := make(items[V], 0, size)
	c.data.Range(func(key string, value *item[V]) bool {
		items = append(items, value)
		return true
	})
	sort.Sort(items)
	for i := 0; i < len(items)-c.size; i++ {
		c.data.Delete(items[i].key.Load())
	}
}

183 Building a Highly Concurrent Cache in Go

Perform Cache Eviction Asynchronously

Slide 184

Slide 184 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)
	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		c.mu.Lock()
		if item, ok := c.data[key]; ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
			c.heap.update(item, item.score)
		} else {
			missing = append(missing, key)
		}
		c.mu.Unlock()
	}
	c.mu.Lock()
	c.incr *= c.decay
	c.mu.Unlock()
	return result, missing
}

Integrate xsync.Map

Slide 185

Slide 185 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)
	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		if item, ok := c.data[key]; ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
		} else {
			missing = append(missing, key)
		}
	}
	c.incr *= c.decay
	return result, missing
}

Integrate xsync.Map

Slide 186

Slide 186 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)
	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		if item, ok := c.data[key]; ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
		} else {
			missing = append(missing, key)
		}
	}
	c.incr *= c.decay
	return result, missing
}

Integrate xsync.Map

Slide 187

Slide 187 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)
	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		if item, ok := c.data.Load(key); ok && !item.expired() {
			result[key] = item.value
			item.score += c.incr
		} else {
			missing = append(missing, key)
		}
	}
	c.incr *= c.decay
	return result, missing
}

Integrate xsync.Map

Slide 188

Slide 188 text

func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
	result := make(map[string]V)
	missing := make([]string, 0)
	incr := c.incr.Load()
	for i, key := range keys {
		if ctx.Err() != nil {
			return result, append(keys[i:], missing...)
		}
		if item, ok := c.data.Load(key); ok && !item.expired() {
			result[key] = item.value
			item.score.Add(incr)
		} else {
			missing = append(missing, key)
		}
	}
	c.incr.Store(incr * c.decay)
	return result, missing
}

Integrate xsync.Map
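The slides only show the read path, so here is a sketch of the entry type they imply. The value and expiresAt field names are assumptions; key is stored atomically so trim() can read it without a lock, matching the items[i].key.Load() call above:

package cache

import (
	"sync/atomic"
	"time"
)

// atomicString wraps atomic.Pointer[string] so Load returns a plain string,
// which is what trim() passes to c.data.Delete. A sketch, not the talk's code.
type atomicString struct {
	p atomic.Pointer[string]
}

func (s *atomicString) Store(v string) { s.p.Store(&v) }

func (s *atomicString) Load() string {
	if p := s.p.Load(); p != nil {
		return *p
	}
	return ""
}

// item is the entry type these slides imply.
type item[V any] struct {
	key       atomicString
	value     V
	score     atomicFloat64 // the CAS-based helper sketched earlier
	expiresAt time.Time
}

func (it *item[V]) expired() bool {
	return time.Now().After(it.expiresAt)
}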

Slide 189

Slide 189 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
benchstat -col /dlfu bench
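The talk's benchmark itself is not shown on these slides. A minimal sketch of how such a benchmark might look, with newCache and the Set signature as hypothetical assumptions: the sub-benchmark name dlfu=... is the key that benchstat's -col /dlfu flag groups columns by, and b.ReportMetric emits the custom hit/op metric seen in the output.

package cache

import (
	"context"
	"math/rand"
	"strconv"
	"sync/atomic"
	"testing"
)

func BenchmarkCache(b *testing.B) {
	for _, version := range []string{"V2", "V3"} {
		b.Run("dlfu="+version, func(b *testing.B) {
			cache := newCache(version, 10_000) // hypothetical constructor
			ctx := context.Background()
			var hits atomic.Int64
			// RunParallel spreads iterations across GOMAXPROCS goroutines,
			// exercising the cache under real contention.
			b.RunParallel(func(pb *testing.PB) {
				for pb.Next() {
					key := strconv.Itoa(rand.Intn(20_000)) // working set 2x cache size
					if _, missing := cache.Get(ctx, []string{key}); len(missing) == 0 {
						hits.Add(1)
					} else {
						cache.Set(ctx, key, "value") // hypothetical Set signature
					}
				}
			})
			b.ReportMetric(float64(hits.Load())/float64(b.N), "hit/op")
		})
	}
}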

Slide 190

Slide 190 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
benchstat -col /dlfu bench
goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
          │     V2      │                 V3                  │
          │   hit/op    │   hit/op     vs base                │
Cache-16    72.32% ± 5%   76.27% ± 2%  +5.45% (p=0.000 n=10)

          │      V2      │                  V3                   │
          │ read-sec/op  │ read-sec/op   vs base                 │
Cache-16     68.7ms ± 3%   616.7µ ± 49%  -99.10% (p=0.000 n=10)

          │      V2      │                   V3                     │
          │ trim-sec/op  │ trim-sec/op   vs base                    │
Cache-16    246.4µ ± 3%    454.5m ± 61%  +1843.61% (p=0.000 n=10)

          │      V2       │                  V3                   │
          │ write-sec/op  │ write-sec/op  vs base                 │
Cache-16    28.47ms ± 13%   243.0µ ± 58%  -99.15% (p=0.000 n=10)

Slide 191

Slide 191 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
benchstat -col /dlfu bench
goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
          │     V2      │                 V3                  │
          │   hit/op    │   hit/op     vs base                │
Cache-16    72.32% ± 5%   76.27% ± 2%  +5.45% (p=0.000 n=10)

          │      V2      │                  V3                   │
          │ read-sec/op  │ read-sec/op   vs base                 │
Cache-16     68.7ms ± 3%   616.7µ ± 49%  -99.10% (p=0.000 n=10)

          │      V2      │                   V3                     │
          │ trim-sec/op  │ trim-sec/op   vs base                    │
Cache-16    246.4µ ± 3%    454.5m ± 61%  +1843.61% (p=0.000 n=10)

          │      V2       │                  V3                   │
          │ write-sec/op  │ write-sec/op  vs base                 │
Cache-16    28.47ms ± 13%   243.0µ ± 58%  -99.15% (p=0.000 n=10)

Slide 192

Slide 192 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
benchstat -col /dlfu bench
goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
          │     V2      │                 V3                  │
          │   hit/op    │   hit/op     vs base                │
Cache-16    72.32% ± 5%   76.27% ± 2%  +5.45% (p=0.000 n=10)

          │      V2      │                  V3                   │
          │ read-sec/op  │ read-sec/op   vs base                 │
Cache-16     68.7ms ± 3%   616.7µ ± 49%  -99.10% (p=0.000 n=10)

          │      V2      │                   V3                     │
          │ trim-sec/op  │ trim-sec/op   vs base                    │
Cache-16    246.4µ ± 3%    454.5m ± 61%  +1843.61% (p=0.000 n=10)

          │      V2       │                  V3                   │
          │ write-sec/op  │ write-sec/op  vs base                 │
Cache-16    28.47ms ± 13%   243.0µ ± 58%  -99.15% (p=0.000 n=10)

Slide 193

Slide 193 text

go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 26, 2023 at 1:16pm (PDT)
Duration: 42.14s, Total samples = 81.68s (193.83%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)

Slide 194

Slide 194 text

go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 26, 2023 at 1:16pm (PDT)
Duration: 42.14s, Total samples = 81.68s (193.83%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list trim
Total: 81.68s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v3.(*DLFUCache[go.shape.string]).trim
      20ms     37.23s (flat, cum) 45.58% of Total
         .          .    197:func (c *DLFUCache[V]) trim() {
         .       20ms    198:	size := c.data.Size()
         .       10ms    199:	if size <= c.size {
         .          .    200:		return
         .          .    201:	}
         .          .    202:
         .       80ms    203:	items := make(items[V], 0, size)
         .      6.82s    204:	c.data.Range(func(key string, value *item[V]) bool {
         .          .    205:		items = append(items, value)
         .          .    206:		return true
         .          .    207:	})
         .     26.98s    208:	sort.Sort(items)
         .          .    209:
      10ms       10ms    210:	for i := 0; i < len(items)-c.size; i++ {
      10ms      680ms    211:		key := items[i].key.Load()
         .      2.63s    212:		c.data.Delete(key)
         .          .    213:	}
         .          .    214:}

Slide 195

Slide 195 text

go tool pprof cpu.out
File: cache.test
Type: cpu
Time: Sep 26, 2023 at 1:16pm (PDT)
Duration: 42.14s, Total samples = 81.68s (193.83%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) list trim
Total: 81.68s
ROUTINE ======================== github.com/konradreiche/cache/dlfu/v3.(*DLFUCache[go.shape.string]).trim
      20ms     37.23s (flat, cum) 45.58% of Total
         .          .    197:func (c *DLFUCache[V]) trim() {
         .       20ms    198:	size := c.data.Size()
         .       10ms    199:	if size <= c.size {
         .          .    200:		return
         .          .    201:	}
         .          .    202:
         .       80ms    203:	items := make(items[V], 0, size)
         .      6.82s    204:	c.data.Range(func(key string, value *item[V]) bool {
         .          .    205:		items = append(items, value)
         .          .    206:		return true
         .          .    207:	})
         .     26.98s    208:	sort.Sort(items)
         .          .    209:
      10ms       10ms    210:	for i := 0; i < len(items)-c.size; i++ {
      10ms      680ms    211:		key := items[i].key.Load()
         .      2.63s    212:		c.data.Delete(key)
         .          .    213:	}
         .          .    214:}

Slide 196

Slide 196 text

196 Faster Eviction Building a Highly Concurrent Cache in Go ● sort.Sort uses pattern-defeating quicksort (pdqsort). ● On the Gophers Slack in #performance, Aurélien Rainone suggested using quickselect instead. ● Quickselect is a linear-time (on average) algorithm for finding the k-th smallest element; as a side effect it partitions the k smallest elements to the front, so a full sort is unnecessary when we only need to know which entries to evict.
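A minimal quickselect sketch over eviction scores, as an illustration rather than the talk's implementation: after quickselect(scores, k), the k smallest scores occupy scores[:k] in arbitrary order, in O(n) expected time.

package cache

import "math/rand"

// quickselect partitions scores so that scores[:k] holds the k smallest
// values, without fully sorting the slice.
func quickselect(scores []float64, k int) {
	lo, hi := 0, len(scores)-1
	for lo < hi {
		p := partition(scores, lo, hi)
		switch {
		case p == k:
			return
		case p < k:
			lo = p + 1
		default:
			hi = p - 1
		}
	}
}

// partition moves a randomly chosen pivot into its final sorted position
// within scores[lo:hi+1] and returns that position (Lomuto scheme).
func partition(scores []float64, lo, hi int) int {
	pivot := lo + rand.Intn(hi-lo+1)
	scores[pivot], scores[hi] = scores[hi], scores[pivot]
	i := lo
	for j := lo; j < hi; j++ {
		if scores[j] < scores[hi] {
			scores[i], scores[j] = scores[j], scores[i]
			i++
		}
	}
	scores[i], scores[hi] = scores[hi], scores[i]
	return i
}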

Slide 197

Slide 197 text

go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
benchstat -col /dlfu bench
goos: linux
goarch: amd64
pkg: github.com/konradreiche/cache
cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
          │     V4      │                 V5                 │
          │   hit/op    │   hit/op     vs base               │
Cache-16    76.44% ± 1%   74.84% ± 1%  -2.09% (p=0.001 n=10)

          │      V4      │               V5                │
          │ read-sec/op  │ read-sec/op  vs base            │
Cache-16    477.5µ ± 40%   358.4µ ± 44%  ~ (p=0.529 n=10)

          │      V4      │                 V5                   │
          │ trim-sec/op  │ trim-sec/op  vs base                 │
Cache-16    463.3m ± 54%   129.1m ± 85%  -72.14% (p=0.002 n=10)

          │      V4       │               V5                 │
          │ write-sec/op  │ write-sec/op  vs base            │
Cache-16    193.2µ ± 53%    133.6µ ± 40%  ~ (p=0.280 n=10)

Slide 198

Slide 198 text

198 Summary ● Implementing your own cache in Go makes it possible to optimize for properties that are unique to your use case. ● Different cache replacement policies: LRU, LFU, DLFU, etc. ● DLFU (Decaying Least Frequently Used): like LFU, but with exponential decay applied to each cache entry's reference count. ● How to write benchmarks and use parallel execution to exercise concurrency. ● Using Go's profiler to find and reduce contention. Building a Highly Concurrent Cache in Go

Slide 199

Slide 199 text

199 Summary ● The cache coherence protocol can impact concurrent performance in Go applications. ● There is no such thing as lock-free when multiple processors are involved. ● Performance can be improved with lock-free data structures and atomic primitives, but your mileage will vary. Building a Highly Concurrent Cache in Go

Slide 200

Slide 200 text

“ “ Don’t generalize from the talk’s example. Write your own code, construct your own benchmarks. You will be surprised. 200 Building a Highly Concurrent Cache in Go

Slide 201

Slide 201 text

Thank you! Konrad Reiche @konradreiche