
Building a Highly Concurrent Cache in Go: A Hitchhiker's Guide

Why doesn't the Go standard library provide a concurrent cache? Because Go emphasizes building custom data structures that fit your needs. In this talk, you will learn how to design, implement, and optimize a concurrent cache in Go, combining LRU & LFU eviction policies and advanced concurrency patterns beyond sync.Mutex.

GopherCon 2023, San Diego
https://www.youtube.com/watch?v=vT5zI6-sKe8

Konrad Reiche

September 27, 2023

Transcript

  1. “In its full generality, multithreading is an incredibly

    complex and error-prone technique, not to be recommended in any but the smallest programs.” ― C. A. R. Hoare: Communicating Sequential Processes
  2. 3 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  3. 4 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  4. 5 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  5. 6 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  6. 7 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  7. 8 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  8. 9 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  9. 10 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  10. 11 Building a Highly Concurrent Cache in Go Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  11. Building a Highly Concurrent Cache in Go 12 Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  12. Building a Highly Concurrent Cache in Go 13 Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  13. Building a Highly Concurrent Cache in Go 14 Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  14. Building a Highly Concurrent Cache in Go 15 Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  15. Building a Highly Concurrent Cache in Go 16 Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  16. Building a Highly Concurrent Cache in Go 17 Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  17. Building a Highly Concurrent Cache in Go 18 Adapted from

    images in Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC 4.0 BY
  18. “Don’t panic.” 22 Building a Highly Concurrent Cache

    in Go ― Effective Go ― Douglas Adams: The Hitchhiker's Guide to the Galaxy
  19. 24 What is a Cache? Software component that stores data

    so that future requests for the data can be served faster. Building a Highly Concurrent Cache in Go
  20. 25 What is a Cache? Software component that stores data

    so that future requests for the data can be served faster. Remember the result of an expensive operation, to speed up reads. Building a Highly Concurrent Cache in Go
  21. 26 What is a Cache? Software component that stores data

    so that future requests for the data can be served faster. Remember the result of an expensive operation, to speed up reads. Building a Highly Concurrent Cache in Go data, ok := cache.Get() if !ok { data = doSomething() cache.Set(data) }
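
    The Get/Set snippet above is the cache-aside pattern in miniature. Below is a self-contained sketch of the same idea under my own assumptions: keyed lookups and a hypothetical doSomething standing in for the expensive operation.

    package main

    import "fmt"

    // cache is a minimal in-memory key/value store (not yet safe for concurrent use).
    type cache struct {
        data map[string]string
    }

    func (c *cache) Get(key string) (string, bool) {
        value, ok := c.data[key]
        return value, ok
    }

    func (c *cache) Set(key, value string) {
        c.data[key] = value
    }

    // doSomething stands in for the expensive operation whose result we want to remember.
    func doSomething(key string) string {
        return "expensive result for " + key
    }

    func main() {
        c := &cache{data: make(map[string]string)}

        // Cache-aside: consult the cache first, compute and store on a miss.
        data, ok := c.Get("post:42")
        if !ok {
            data = doSomething("post:42")
            c.Set("post:42", data)
        }
        fmt.Println(data)
    }
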
  22. 30 Caching Post Data for Ranking Building a Highly Concurrent

    Cache in Go Kubernetes Pods for Ranking Service
  23. 31 Building a Highly Concurrent Cache in Go Kubernetes Pods

    for Ranking Service Thing Service Get data for filtering posts Caching Post Data for Ranking
  24. 32 Building a Highly Concurrent Cache in Go Kubernetes Pods

    for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking
  25. 33 Building a Highly Concurrent Cache in Go Kubernetes Pods

    for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Cache Lookups Caching Post Data for Ranking Update on Miss
  26. 34 Building a Highly Concurrent Cache in Go Kubernetes Pods

    for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking Cache Lookups 5-20m op/sec Update on Miss
  27. 35 Building a Highly Concurrent Cache in Go Kubernetes Pods

    for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking Local In-Memory Cache with ~90k op/sec per pod Cache Lookups 5-20m op/sec Update on Miss
  28. 37 type cache[K comparable, V any] struct { data map[K]V

    } Building a Highly Concurrent Cache in Go
  29. 38 type cache struct { data map[string]string } Building a

    Highly Concurrent Cache in Go To keep it simple, we can introduce generics later, once they are needed.
  30. 39 type cache struct { data map[string]string } func (c

    *cache) Set(key, value string) { c.data[key] = value } func (c *cache) Get(key string) (string, bool) { value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go
  31. 40 type cache struct { mu sync.Mutex data map[string]string }

    func (c *cache) Set(key, value string) { c.data[key] = value } func (c *cache) Get(key string) (string, bool) { value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go
  32. 41 type cache struct { mu sync.Mutex data map[string]string }

    func (c *cache) Set(key, value string) { c.data[key] = value } func (c *cache) Get(key string) (string, bool) { value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go
  33. 42 type cache struct { mu sync.Mutex data map[string]string }

    func (c *cache) Set(key, value string) { c.mu.Lock() defer c.mu.Unlock() c.data[key] = value } func (c *cache) Get(key string) (string, bool) { c.mu.Lock() defer c.mu.Unlock() value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go
  34. 43 type cache struct { mu sync.RWMutex data map[string]string }

    func (c *cache) Set(key, value string) { c.mu.Lock() defer c.mu.Unlock() c.data[key] = value } func (c *cache) Get(key string) (string, bool) { c.mu.RLock() defer c.mu.RUnlock() value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go
  35. 44 type cache struct { mu sync.RWMutex data map[string]string }

    func (c *cache) Set(key, value string) { c.mu.Lock() defer c.mu.Unlock() c.data[key] = value } func (c *cache) Get(key string) (string, bool) { c.mu.RLock() defer c.mu.RUnlock() value, ok := c.data[key] return value, ok } Building a Highly Concurrent Cache in Go Don’t generalize the cache from the start; pick an API that maximizes your usage pattern.
  36. 45 type cache struct { mu sync.RWMutex data map[string]string }

    func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { c.data[key] = value } } func (c *cache) Get(keys []string) map[string]string { result := make(map[string]string) c.mu.RLock() defer c.mu.RUnlock() for _, key := range keys { if value, ok := c.data[key]; ok { result[key] = value } } return result } Building a Highly Concurrent Cache in Go
  37. 48 Once the cache exceeds its maximum capacity, which data

    should be evicted to make space for new data? Building a Highly Concurrent Cache in Go Cache Replacement Policies Cache Entry 1 Entry 2 Entry 3 ⋮ Entry n
  38. 49 Once the cache exceeds its maximum capacity, which data

    should be evicted to make space for new data? Bélády's Optimal Replacement Algorithm Remove the entry whose next use will occur farthest in the future. Building a Highly Concurrent Cache in Go Cache Replacement Policies Cache Entry 1 Entry 2 Entry 3 ⋮ Entry n
  39. 50 Once the cache exceeds its maximum capacity, which data

    should be evicted to make space for new data? Bélády's Optimal Replacement Algorithm Remove the entry whose next use will occur farthest in the future. Building a Highly Concurrent Cache in Go Cache Replacement Policies Because we cannot predict the future, we can only try to approximate this behavior. Cache Entry 1 Entry 2 Entry 3 ⋮ Entry n
  40. 51 Taxonomy of Cache Replacement Policies Building a Highly Concurrent

    Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification Longer History More Access Patterns [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
  41. 52 Taxonomy of Cache Replacement Policies Building a Highly Concurrent

    Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification Longer History More Access Patterns [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU LFU
  42. 53 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently.
  43. 54 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. type cache struct { size int mu sync.RWMutex data map[string]string }
  44. 55 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. type cache struct { size int mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string frequency int index int }
  45. 56 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } }
  46. 57 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } } a frequency: 1
  47. 58 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } } a frequency: 1 b frequency: 1
  48. 59 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. a frequency: 1 b frequency: 1 c frequency: 1 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } }
  49. 60 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Get(keys []string) ( map[string]string, []string, ) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } a frequency: 1 b frequency: 1 c frequency: 1
  50. 61 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Get(keys []string) ( map[string]string, []string, ) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } a frequency: 1 b frequency: 1 c frequency: 1 cache.Get([]string{"a", "b"})
  51. 62 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Get(keys []string) ( map[string]string, []string, ) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } c frequency: 1 a frequency: 2 b frequency: 2
  52. 63 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } } c frequency: 1 a frequency: 2 b frequency: 2 cache.Set(map[string]string{ "d": "⌯Go", })
  53. 64 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. c frequency: 1 d frequency: 1 b frequency: 2 a frequency: 2 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } }
  54. 65 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. c frequency: 1 d frequency: 1 b frequency: 2 a frequency: 2 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } }
  55. 66 Building a Highly Concurrent Cache in Go LFU Least

    Frequently Used (LFU) Favor entries that are used frequently. d frequency: 1 a frequency: 2 b frequency: 2 func (c *cache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } }
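
    The slides use c.heap and c.heap.update without showing the heap itself. One way to back it with the standard container/heap package is sketched below; the field names mirror the item struct from the slides, but the concrete implementation is my assumption, not code from the talk.

    import "container/heap"

    // MinHeap orders items by ascending frequency, so the least frequently
    // used entry is always at the root and is the one evicted first.
    type MinHeap []*item

    func (h MinHeap) Len() int           { return len(h) }
    func (h MinHeap) Less(i, j int) bool { return h[i].frequency < h[j].frequency }
    func (h MinHeap) Swap(i, j int) {
        h[i], h[j] = h[j], h[i]
        h[i].index = i
        h[j].index = j
    }

    func (h *MinHeap) Push(x any) {
        it := x.(*item)
        it.index = len(*h)
        *h = append(*h, it)
    }

    func (h *MinHeap) Pop() any {
        old := *h
        n := len(old)
        it := old[n-1]
        old[n-1] = nil
        *h = old[:n-1]
        return it
    }

    // update changes an item's frequency and restores the heap invariant in O(log n).
    func (h *MinHeap) update(it *item, frequency int) {
        it.frequency = frequency
        heap.Fix(h, it.index)
    }
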
  56. 67 Taxonomy of Cache Replacement Policies Building a Highly Concurrent

    Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification Longer History More Access Patterns [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU LFU
  57. 68 Taxonomy of Cache Replacement Policies Building a Highly Concurrent

    Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP EVA Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
  58. 69 Building a Highly Concurrent Cache in Go Kubernetes Pods

    for Ranking Service Thing Service Get data for filtering posts Redis Cache Cluster Caching Post Data for Ranking Local In-Memory Cache with ~90k op/sec per pod Cache Lookups 5-20m op/sec Update on Miss
  59. Benchmarks Functions of the form func BenchmarkXxx(*testing.B) are considered benchmarks,

    and are executed by the go test command when its -bench flag is provided. B is a type passed to Benchmark functions to manage benchmark timing and to specify the number of iterations to run.
  60. Benchmarks • Before improving the performance of code, we should

    measure its current performance • Create a stable environment ◦ Idle machine ◦ No shared hardware ◦ Don’t browse the web ◦ Power saving, thermal scaling • The testing package has built-in support for writing benchmarks
  61. Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys :=

    make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } for i := 0; i < b.N; i++ { } }
  62. Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys :=

    make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } cache.Set(items) b.ResetTimer() for i := 0; i < b.N; i++ { } }
  63. Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys :=

    make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } cache.Set(items) b.ResetTimer() for i := 0; i < b.N; i++ { cache.Get(keys) } }
  64. go test -run=^$ -bench=BenchmarkGet -count=5 goos: linux goarch: amd64 pkg:

    github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz BenchmarkGet-16 117642 9994 ns/op BenchmarkGet-16 116402 10018 ns/op BenchmarkGet-16 121834 9817 ns/op BenchmarkGet-16 123241 9942 ns/op BenchmarkGet-16 109621 10022 ns/op PASS ok github.com/konradreiche/cache 6.520s Benchmarks
  65. Benchmarks func BenchmarkGet(b *testing.B) { cache := NewLFUCache(100) keys :=

    make([]string, 100) items := make(map[string]string) for i := 0; i < 100; i++ { kv := fmt.Sprint(i) keys[i] = kv items[kv] = kv } cache.Set(items) b.ResetTimer() for i := 0; i < b.N; i++ { cache.Get(keys) } }
  66. Limitations • We want to analyze and optimize all cache

    operations: Get, Set, Eviction • Not all code paths are covered • What cache hit and miss ratio to benchmark for? • No concurrency, benchmark executes with one goroutine • How do different operations behave when interleaving concurrently?
  67. Limitations • We want to analyze and optimize all cache

    operations: Get, Set, Eviction • Not all code paths are covered • What cache hit and miss ratio to benchmark for? • No concurrency, benchmark executes with one goroutine • How do different operations behave when interleaving concurrently?
  68. Real Sample Data Event log of cache access over 30

    minutes including: • timestamp • posts keys 107907533,SA,Lw,OA,Iw,aA,RA,KA,CQ,Ow,Aw,Hg,Kg 111956832,upgb 121807061,upgb 134028958,l3Ir,iPMq,PcUn,T5Ej,ZQs,kTM,/98F,BFwJ,Oik,uYIB,gv8F 137975373,crgb,SCMU,NXUd,EyQI,244Z,DB4H,Tp0H,Kh8b,gREH,g9kG,o34E,wSYI,u+wF,h40M 142509895,iwM,hgM,CQQ,YQI 154850130,jTE,ciU,2U4,GQkB,4xo,U2QC,/7oB,dRIC,M0gB,bwYk ...
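
    A sketch of how such a log could be read back for benchmarking; the eventLog type and parseEventLog are hypothetical names of my own, only the line format (a timestamp followed by comma-separated keys) comes from the slide.

    import (
        "bufio"
        "io"
        "strconv"
        "strings"
    )

    // eventLog is one recorded cache access: a timestamp and the post keys requested.
    type eventLog struct {
        timestamp int64
        keys      []string
    }

    // parseEventLog turns lines like "107907533,SA,Lw,OA" into eventLog values.
    func parseEventLog(r io.Reader) ([]eventLog, error) {
        var logs []eventLog
        scanner := bufio.NewScanner(r)
        for scanner.Scan() {
            fields := strings.Split(scanner.Text(), ",")
            if len(fields) < 2 {
                continue
            }
            ts, err := strconv.ParseInt(fields[0], 10, 64)
            if err != nil {
                return nil, err
            }
            logs = append(logs, eventLog{timestamp: ts, keys: fields[1:]})
        }
        return logs, scanner.Err()
    }
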
  69. Limitations • We want to analyze and optimize all cache

    operations: Get, Set, Eviction • Not all code paths are covered • What cache hit and miss ratio to benchmark for? • No concurrency, benchmark executes with one goroutine • How do different operations behave when interleaving concurrently?
  70. b.RunParallel RunParallel runs a benchmark in parallel. It creates multiple

    goroutines and distributes b.N iterations among them. The number of goroutines defaults to GOMAXPROCS.
  71. b.RunParallel RunParallel runs a benchmark in parallel. It creates multiple

    goroutines and distributes b.N iterations among them. The number of goroutines defaults to GOMAXPROCS. func BenchmarkCache(b *testing.B) { b.RunParallel(func(pb *testing.PB) { // set up goroutine local state for pb.Next() { // execute one iteration of the benchmark } }) }
  72. func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer()

    b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") }
  73. func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer()

    b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } Custom benchmark case type to manage benchmark and collect data
  74. func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer()

    b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } Collect per-goroutine benchmark measurements
  75. func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer()

    b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } Reproduce production behavior: lookup & update.
  76. func BenchmarkCache(b *testing.B) { cb := newBenchmarkCase(b, config{size: 400_000}) b.ResetTimer()

    b.RunParallel(func(pb *testing.PB) { var br benchmarkResult for pb.Next() { log := cb.nextEventLog() start := time.Now() cached, missing := cb.cache.Get(log.keys) br.observeGet(start, cached, missing) if len(missing) > 0 { data := lookupData(missing) start := time.Now() cb.cache.Set(data) br.observeSetDuration(start) } } cb.addLocalReports(br) }) b.ReportMetric(cb.getHitRate(), "hit/op") b.ReportMetric(cb.getTimePerGet(b), "read-ns/op") b.ReportMetric(cb.getTimePerSet(b), "write-ns/op") } We can measure duration of individual operations manually Use b.ReportMetric to report custom metrics
  77. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x Ensure that each benchmark

    processes exactly 5,000 event logs to improve comparability of hit rate metric
  78. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x goos: linux goarch: amd64

    pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz BenchmarkCache/policy=lfu-16 5000 0.6141 hit/op 4795215 read-ns/op 2964262 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6082 hit/op 4686778 read-ns/op 2270200 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6159 hit/op 4332358 read-ns/op 1765885 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6153 hit/op 4089562 read-ns/op 2504176 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6152 hit/op 3472677 read-ns/op 1686928 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6107 hit/op 4464410 read-ns/op 2695443 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6155 hit/op 3624802 read-ns/op 1837148 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6133 hit/op 3931610 read-ns/op 2154571 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6151 hit/op 2440746 read-ns/op 1260662 write-ns/op BenchmarkCache/policy=lfu-16 5000 0.6138 hit/op 3491091 read-ns/op 1944350 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2320270 read-ns/op 1127495 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2212118 read-ns/op 1019305 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6705 hit/op 2150089 read-ns/op 1037654 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2512224 read-ns/op 1134282 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6710 hit/op 2377883 read-ns/op 1079198 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6711 hit/op 2313210 read-ns/op 1120761 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2071632 read-ns/op 980912 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2410096 read-ns/op 1127907 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2226160 read-ns/op 1071007 write-ns/op BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2383321 read-ns/op 1165734 write-ns/op PASS ok github.com/konradreiche/cache 846.442s
  79. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col

    /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LFU │ LRU │ │ hit/op │ hit/op vs base │ Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10) │ LFU │ LRU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10) │ LFU │ LRU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)
  80. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col

    /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LFU │ LRU │ │ hit/op │ hit/op vs base │ Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10) │ LFU │ LRU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10) │ LFU │ LRU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)
  81. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col

    /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LFU │ LRU │ │ hit/op │ hit/op vs base │ Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10) │ LFU │ LRU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10) │ LFU │ LRU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)
  82. 100 Taxonomy of Cache Replacement Policies Building a Highly Concurrent

    Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye EVA
  83. 101 Taxonomy of Cache Replacement Policies Building a Highly Concurrent

    Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye EVA
  84. 102 Taxonomy of Cache Replacement Policies Building a Highly Concurrent

    Cache in Go Coarse-Grained Policies Fine-Grained Policies Recency Frequency Hybrid Economic Value Reuse Distance Classification [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019. LRU EELRU SegLRU LIP SRRIP LFU FBR 2Q ARC LRFU DIP DRRIP Timekeeping AIP ETA Leeway DBCP EAF SDBP SHiP Hawkeye EVA
  85. 104 Building a Highly Concurrent Cache in Go LRFU (Least

    Recently/Least Frequently) A paper [2] published in 2001 suggests combining LRU and LFU into a policy named LRFU. • Similar to LFU: each item holds a value • CRF: Combined Recency and Frequency • A parameter λ determines how much weight is given to recent entries type cache struct { size int mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int frequency int } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.
  86. A paper [2] published in 2001 suggests to combine LRU

    and LFU named LRFU. • Similar to LFU: each item holds a value • CRF: Combined Recency and Frequency • A parameter λ determines how much weight is given to recent entries λ = 1.0 (LRU) λ = 0.0 (LFU) 105 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Least Frequently) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.
  87. A paper [2] published in 2001 suggests to combine LRU

    and LFU named LRFU. • Similar to LFU: each item holds a value • CRF: Combined Recency and Frequency • A parameter λ determines how much weight is given to recent entries λ = 1.0 (LRU) λ = 0.0 (LFU) λ = 0.001 106 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Least Frequently) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.
  88. A paper [2] published in 2001 suggests to combine LRU

    and LFU named LRFU. • Similar to LFU: each item holds a value • CRF: Combined Recency and Frequency • A parameter λ determines how much weight is given to recent entries λ = 1.0 (LRU) λ = 0.0 (LFU) λ = 0.001 (LFU with a pinch of LRU) 107 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Least Frequently) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.
  89. A paper [2] published in 2001 suggests combining LRU

    and LFU into a policy named LRFU. • Calculate CRF for every entry whenever they need to be compared • math.Pow is not a cheap operation • 0.5^(λx) is prone to floating-point overflow • New items are likely to be evicted, starting with CRF = 1.0 108 Building a Highly Concurrent Cache in Go LRFU (Least Recently/Least Frequently) type cache struct { size int weight float64 mu sync.Mutex data map[string]*item heap *MinHeap } type item struct { key string value string index int crf float64 } [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies. IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.
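
    The CRF bookkeeping can be maintained incrementally: each entry stores its CRF and last access time, and on every hit the old value is decayed by the weighing function F(x) = 0.5^(λx) and incremented by F(0) = 1. A rough sketch of that update follows; lastAccess is an extra field the slides do not show, and the names are my own.

    import "math"

    // weighingFunction implements F(x) = 0.5^(λx) from the LRFU paper [2].
    func weighingFunction(lambda, x float64) float64 {
        return math.Pow(0.5, lambda*x)
    }

    // touch folds the current access into the item's combined recency and frequency value.
    func (it *item) touch(lambda, now float64) {
        delta := now - it.lastAccess
        // Decay the accumulated CRF by the time since the last access,
        // then add F(0) = 1 for this access.
        it.crf = weighingFunction(lambda, delta)*it.crf + 1.0
        it.lastAccess = now
    }
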
  90. 110 Building a Highly Concurrent Cache in Go DLFU (Decaying

    LFU Cache Expiry) Donovan Baarda, a Google SRE from Australia, came up with an improved algorithm [3] and a Python reference implementation [4], realizing: 1. LRFU decay is a simple exponential decay 2. Exponential decay can be approximated which eliminates math.Pow 3. Exponentially grow the reference increment instead of exponentially decaying all entries, thus requiring fewer fields per entry and fewer comparisons [3] https://github.com/dbaarda/DLFUCache [4] https://minkirri.apana.org.au/wiki/DecayingLFUCacheExpiry
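
    Point 3 is the essential trick: instead of decaying every entry's score on each access, the per-access increment grows by the same factor, which preserves the relative ordering while touching only a single float. A back-of-the-envelope sketch with made-up numbers:

    // With p = size * weight, each batch of lookups grows the increment by (p+1)/p
    // instead of multiplying every stored score by p/(p+1).
    p := 400_000 * 0.5
    decay := (p + 1.0) / p // ≈ 1.000005
    incr := 1.0

    // On a cache hit:      item.score += incr
    // After each lookup:   incr *= decay
    // Comparing item.score values still ranks entries exactly as if all
    // scores had been decayed, because all scores share the same global scale.
    incr *= decay
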
  91. 111 Building a Highly Concurrent Cache in Go func NewDLFUCache[V

    any](ctx context.Context, config config.Config) *DLFUCache[V] { cache := &DLFUCache[V]{ data: make(map[string]*Item[V], config.Size), heap: &MinHeap[V]{}, weight: config.Weight, size: config.Size, incr: 1.0, } if config.Weight == 0.0 { // there is no decay for LFU policy cache.decay = 1 return cache } p := float64(config.Size) * config.Weight cache.decay = (p + 1.0) / p return cache } DLFU Cache
  92. 112 Building a Highly Concurrent Cache in Go func (c

    *lfuCache) Set(items map[string]string) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } for len(c.data) > c.size { item := heap.Pop(&c.heap).(*Item) delete(c.data, item.key) } } DLFU Cache
  93. 113 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } c.trim() } DLFU Cache
  94. 114 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, frequency: 1, } c.data[key] = item heap.Push(&c.heap, item) } c.trim() } DLFU Cache
  95. 115 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() for key, value := range items { item := &item{ key: key, value: value, score: c.incr, } c.data[key] = item heap.Push(&c.heap, item) } c.trim() } DLFU Cache
  96. 116 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { c.mu.Lock() defer c.mu.Unlock() expiresAt := time.Now().Add(expiry) for key, value := range items { if item, ok := c.data[key]; ok { item.value = value item.expiresAt = expiresAt continue } item := &item{ key: key, value: value, score: c.incr, expiresAt: expiresAt, } c.data[key] = item heap.Push(&c.heap, item) } c.trim() } DLFU Cache
  97. 117 Building a Highly Concurrent Cache in Go func (c

    *lfuCache) Get(keys []string) (map[string]string, []string) { result := make(map[string]string) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } DLFU Cache
  98. 118 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } DLFU Cache
  99. 119 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value frequency := item.frequency+1 c.heap.update(item, frequency) } else { missing = append(missing, key) } } return result, missing } DLFU Cache
  100. 120 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } return result, missing } DLFU Cache
  101. 121 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing } DLFU Cache
  102. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col

    /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LRU │ DLFU │ │ hit/op │ hit/op vs base │ Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10) │ LRU │ DLFU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10) │ LRU │ DLFU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)
  103. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col

    /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LRU │ DLFU │ │ hit/op │ hit/op vs base │ Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10) │ LRU │ DLFU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10) │ LRU │ DLFU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)
  104. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x -cpuprofile=cpu.out > bench benchstat

    -col /policy bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ LRU │ DLFU │ │ hit/op │ hit/op vs base │ Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10) │ LRU │ DLFU │ │ read-sec/op │ read-sec/op vs base │ Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10) │ LRU │ DLFU │ │ write-sec/op │ write-sec/op vs base │ Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)
  105. 126 Building a Highly Concurrent Cache in Go Profiling go

    tool pprof cpu.out File: cache.test Type: cpu Time: Sep 24, 2023 at 3:04pm (PDT) Duration: 850.60s, Total samples = 1092.33s (128.42%) Entering interactive mode (type "help" for commands, "o" for options) (pprof)
  106. 127 Building a Highly Concurrent Cache in Go Profiling go

    tool pprof cpu.out File: cache.test Type: cpu Time: Sep 24, 2023 at 3:04pm (PDT) Duration: 850.60s, Total samples = 1092.33s (128.42%) Entering interactive mode (type "help" for commands, "o" for options) (pprof) top Showing nodes accounting for 567.44s, 51.95% of 1092.33s total Dropped 470 nodes (cum <= 5.46s) Showing top 10 nodes out of 104 flat flat% sum% cum cum% 118.69s 10.87% 10.87% 169.36s 15.50% runtime.findObject 88.04s 8.06% 18.93% 88.04s 8.06% github.com/konradreiche/cache/dlfu/v1.MinHeap[go.shape.string].Less 72.38s 6.63% 25.55% 319.24s 29.23% runtime.scanobject 60.74s 5.56% 31.11% 106.72s 9.77% runtime.mapaccess2_faststr 45.03s 4.12% 35.23% 126.25s 11.56% runtime.mapassign_faststr 40.60s 3.72% 38.95% 40.60s 3.72% time.Now 40.20s 3.68% 42.63% 41.13s 3.77% container/list.(*List).move 35.47s 3.25% 45.88% 35.47s 3.25% memeqbody 34.25s 3.14% 49.01% 34.25s 3.14% runtime.memclrNoHeapPointers 32.04s 2.93% 51.95% 44.34s 4.06% runtime.mapdelete_faststr
  107. 128 Building a Highly Concurrent Cache in Go Profiling go

    tool pprof cpu.out File: cache.test Type: cpu Time: Sep 24, 2023 at 3:04pm (PDT) Duration: 850.60s, Total samples = 1092.33s (128.42%) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Get
  108. 129 Building a Highly Concurrent Cache in Go Total: 1092.33s

    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 6.96s 218.27s (flat, cum) 19.98% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . 490ms 76: result := make(map[string]V) . . 77: 40ms 2.56s 78: c.mu.Lock() 30ms 30ms 79: defer c.mu.Unlock() . . 80: 1.04s 1.04s 81: for _, key := range keys { 1.34s 43.77s 82: item, ok := c.data[key] 290ms 53.66s 83: if ok && !item.expired() { 1.68s 44.35s 84: result[key] = item.value 1.72s 1.72s 85: item.score += c.incr 130ms 65.82s 86: c.heap.update(item, item.score) . . 87: } else { 530ms 3.55s 88: missingKeys = append(missingKeys, key) . . 89: } . . 90: } 20ms 20ms 91: c.incr *= c.decay 140ms 1.26s 92: return result, missingKeys . . 93:} . . 94:
  109. 130 Building a Highly Concurrent Cache in Go Total: 1092.33s

    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 6.96s 218.27s (flat, cum) 19.98% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . 490ms 76: result := make(map[string]V) . . 77: 40ms 2.56s 78: c.mu.Lock() 30ms 30ms 79: defer c.mu.Unlock() . . 80: 1.04s 1.04s 81: for _, key := range keys { 1.34s 43.77s 82: item, ok := c.data[key] 290ms 53.66s 83: if ok && !item.expired() { 1.68s 44.35s 84: result[key] = item.value 1.72s 1.72s 85: item.score += c.incr 130ms 65.82s 86: c.heap.update(item, item.score) . . 87: } else { 530ms 3.55s 88: missingKeys = append(missingKeys, key) . . 89: } . . 90: } 20ms 20ms 91: c.incr *= c.decay 140ms 1.26s 92: return result, missingKeys . . 93:} . . 94: Maintaining a heap is more expensive than LRU, which only requires a doubly linked list.
  110. 131 Building a Highly Concurrent Cache in Go Total: 1092.33s

    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 6.96s 218.27s (flat, cum) 19.98% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . 490ms 76: result := make(map[string]V) . . 77: 40ms 2.56s 78: c.mu.Lock() 30ms 30ms 79: defer c.mu.Unlock() . . 80: 1.04s 1.04s 81: for _, key := range keys { 1.34s 43.77s 82: item, ok := c.data[key] 290ms 53.66s 83: if ok && !item.expired() { 1.68s 44.35s 84: result[key] = item.value 1.72s 1.72s 85: item.score += c.incr 130ms 65.82s 86: c.heap.update(item, item.score) . . 87: } else { 530ms 3.55s 88: missingKeys = append(missingKeys, key) . . 89: } . . 90: } 20ms 20ms 91: c.incr *= c.decay 140ms 1.26s 92: return result, missingKeys . . 93:} . . 94: The CPU profile does not capture the time spent waiting to acquire a lock.
  111. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay

    Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof)
  112. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay

    Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Get Total: 615.48s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 0 297.55s (flat, cum) 48.34% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . . 76: result := make(map[string]V) . . 77: . 297.55s 78: c.mu.Lock() . . 79: defer c.mu.Unlock() . . 80: . . 81: for _, key := range keys { . . 82: item, ok := c.data[key] . . 83: if ok && !item.expired() {
  113. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay

    Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Get Total: 615.48s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get 0 297.55s (flat, cum) 48.34% of Total . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) { . . 75: var missingKeys []string . . 76: result := make(map[string]V) . . 77: . 297.55s 78: c.mu.Lock() . . 79: defer c.mu.Unlock() . . 80: . . 81: for _, key := range keys { . . 82: item, ok := c.data[key] . . 83: if ok && !item.expired() {
  114. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out File: cache.test Type: delay

    Time: Sep 24, 2023 at 3:48pm (PDT) Entering interactive mode (type "help" for commands, "o" for options) (pprof) list DLFU.*Set Total: 615.48s ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Set 0 193.89s (flat, cum) 31.50% of Total . . 99:func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) { . 193.89s 100: c.mu.Lock() . . 101: defer c.mu.Unlock() . . 102: . . 103: now := time.Now() . . 104: for key, value := range items { . . 105: if ctx.Err() != nil {
  115. 137 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing } Critical Section Critical Section
  116. 138 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) c.mu.Lock() defer c.mu.Unlock() for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing }
  117. 139 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } } c.incr *= c.decay return result, missing }
  118. 140 Building a Highly Concurrent Cache in Go func (c

    *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) { result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing }
  119. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench benchstat -col

    /dlfu bench goos: linux goarch: amd64 pkg: github.com/konradreiche/cache cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz │ V1 │ V2 │ │ hit/op │ hit/op vs base │ Cache-16 70.94% ± 0% 70.30% ± 0% -0.89% (p=0.000 n=10) │ V1 │ V2 │ │ read-sec/op │ read-sec/op vs base │ Cache-16 4.34ms ± 54% 1.43ms ± 26% -67.13% (p=0.001 n=10) │ V1 │ V2 │ │ write-sec/op │ write-sec/op vs base │ Cache-16 2.43ms ± 62% 574.3µs ± 25% -76.36% (p=0.000 n=10)
  120. • Spike in number of goroutines, memory usage & timeouts

    • Latency added to call paths integrating the local in-memory cache In Production
  121. • Spike in number of goroutines, memory usage & timeouts

    • Latency added to call paths integrating the local in-memory cache • For incremental progress: ◦ Feature Flags with Sampling ◦ Timeout for Cache Operations In Production
  122. if !liveconfig.Sample("cache.read_rate") { return } go func() { cached, missingIDs

    = localCache.Get(keys) }() In Production: Timeout for Cache Operations
  123. if !liveconfig.Sample("cache.read_rate") { return } // perform cache-lookup in goroutine

    to avoid blocking for too long ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond) go func() { cached, missingIDs = localCache.Get(keys) cancel() }() <-ctx.Done() // timeout: return all keys as missing and let remote cache handle it if ctx.Err() == context.DeadlineExceeded { return map[string]T{}, keys } In Production: Timeout for Cache Operations
  124. if !liveconfig.Sample("cache.read_rate") { return } // perform cache-lookup in goroutine

    to avoid blocking for too long ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond) go func() { cached, missingIDs = localCache.Get(keys) cancel() }() <-ctx.Done() // timeout: return all keys as missing and let remote cache handle it if ctx.Err() == context.DeadlineExceeded { return map[string]T{}, keys } In Production: Timeout for Cache Operations Pass context into the cache operations too.
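
    Put together as a self-contained helper, the timeout wrapper might look like the following; getWithTimeout and the done channel are my own naming (assuming context and time imports), while the short budget and the "treat everything as missing" fallback come from the slide.

    // getWithTimeout consults the local cache but gives up after a short budget,
    // reporting every key as missing so the remote cache can serve them instead.
    func getWithTimeout[V any](c *DLFUCache[V], keys []string, budget time.Duration) (map[string]V, []string) {
        ctx, cancel := context.WithTimeout(context.Background(), budget)
        defer cancel()

        var (
            cached  map[string]V
            missing []string
        )
        done := make(chan struct{})
        go func() {
            cached, missing = c.Get(ctx, keys)
            close(done)
        }()

        select {
        case <-done:
            return cached, missing
        case <-ctx.Done():
            // Timeout: the goroutine is abandoned; fall back to the remote cache.
            return map[string]V{}, keys
        }
    }
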
  125. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {

    result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing } In Production: Timeout for Cache Operations
  126. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {

    result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing } In Production: Timeout for Cache Operations
  127. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {

    result := make(map[string]V) missing := make([]string, 0) for _, key := range keys { c.mu.Lock() if item, ok := c.data[key]; ok && !item.expired() { result[key] = item.value item.score += c.incr c.heap.update(item, item.score) } else { missing = append(missing, key) } c.mu.Unlock() } c.mu.Lock() c.incr *= c.decay c.mu.Unlock() return result, missing } In Production: Timeout for Cache Operations The goroutine gets abandoned after the timeout. Checking for context cancellation to stop iteration helps to reduce lock contention.
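
    The cancellation check that annotation refers to is not shown on the slide; inside the loop it could look roughly like this (a sketch, not the exact production code):

    for _, key := range keys {
        // Stop early once the caller has given up on this lookup, so an
        // abandoned goroutine does not keep contending for the lock.
        if ctx.Err() != nil {
            return result, missing
        }
        c.mu.Lock()
        if item, ok := c.data[key]; ok && !item.expired() {
            result[key] = item.value
            item.score += c.incr
            c.heap.update(item, item.score)
        } else {
            missing = append(missing, key)
        }
        c.mu.Unlock()
    }
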
  128. 157 sync.Map • I wrongly assumed sync.Map is an untyped

    map protected by a sync.RWMutex. • I recommend frequently diving into the standard library source code; for sync.Map, for example, we can see that the implementation is much more intricate: Building a Highly Concurrent Cache in Go func (m *Map) Load(key any) (value any, ok bool) { read := m.loadReadOnly() e, ok := read.m[key] if !ok && read.amended { m.mu.Lock() // Avoid reporting a spurious miss if m.dirty got promoted while we were // blocked on m.mu. (If further loads of the same key will not miss, it's // not worth copying the dirty map for this key.) read = m.loadReadOnly() e, ok = read.m[key]
  129. 158 sync.Map Map is like a Go map[interface{}]interface{} but is

    safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time. The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content. The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex. Building a Highly Concurrent Cache in Go
  130. 159 sync.Map Map is like a Go map[interface{}]interface{} but is

    safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time. The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content. The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex. Building a Highly Concurrent Cache in Go
  131. 160 sync.Map Map is like a Go map[interface{}]interface{} but is

    safe for concurrent use by multiple goroutines without additional locking or coordination. Loads, stores, and deletes run in amortized constant time. The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination, for better type safety and to make it easier to maintain other invariants along with the map content. The Map type is optimized for two common use cases: (1) when the entry for a given key is only ever written once but read many times, as in caches that only grow, or (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In these two cases, use of a Map may significantly reduce lock contention compared to a Go map paired with a separate Mutex or RWMutex. Building a Highly Concurrent Cache in Go
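
    For use case (1), a cache that only grows, plain sync.Map usage looks like this; the keys and values are only illustrative:

    var m sync.Map

    // Store and Load are safe for concurrent use without extra locking.
    m.Store("a", "alpha")
    if v, ok := m.Load("a"); ok {
        fmt.Println(v.(string)) // values come back as any and need a type assertion
    }

    // LoadOrStore suits grow-only caches: the first writer wins, later callers read.
    actual, loaded := m.LoadOrStore("b", "beta")
    fmt.Println(actual, loaded) // beta false
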
  132. 161 xsync.Map • Third-party library providing concurrent data structures for

    Go https://github.com/puzpuzpuz/xsync • xsync.Map is a concurrent hash table based map using a modified version of Cache-Line Hash Table (CLHT) data structure. • CLHT organizes the hash table in cache-line-sized buckets to reduce the number of cache-line transfers. Building a Highly Concurrent Cache in Go
  133. 162 xsync.Map • Third-party library providing concurrent data structures for

    Go https://github.com/puzpuzpuz/xsync • xsync.Map is a concurrent hash table based map using a modified version of Cache-Line Hash Table (CLHT) data structure. • CLHT organizes the hash table in cache-line-sized buckets to reduce the number of cache-line transfers. Building a Highly Concurrent Cache in Go
  134. 163 Symmetric Multiprocessing (SMP) CPU Core L1 Core L1 Core

    L1 Core L1 L3 Main Memory (RAM) L2 L2 L2 L2 Building a Highly Concurrent Cache in Go
  135. 164 Locality of Reference Building a Highly Concurrent Cache in

    Go • Temporal Locality: a processor accessing a particular memory location will likely access it again in the near future. • Spatial Locality: a processor accessing a particular memory location will likely access nearby memory locations as well. • It is not a single memory location that is copied into the CPU cache, but an entire cache line. • Cache Line (Cache Block): an adjacent chunk of memory.
  136.-149. 165-178 False Sharing Building a Highly Concurrent Cache in Go

    [Diagram sequence: two CPU cores, each with its own L1 cache, and main memory holding var x and var y in the same cache line. Step by step: Core 1 reads variable x into its cache and holds the line in Exclusive state (E); Core 2 reads variable y, so both cores hold the line in Shared state (S); Core 1 writes to variable x, its copy becomes Modified (M) and the cache line in Core 2 is invalidated; Core 2's next write results in a coherence miss, invalidates Core 1's copy, and forces a coherence write-back; Core 1's next read results in another coherence miss and write-back, until both copies are Shared again. The two cores never access the same variable, yet the cache line keeps bouncing between them.]
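    A minimal sketch (not from the talk) that makes false sharing observable: two goroutines increment two different counters that share a cache line, then the same experiment with the counters padded onto separate lines; the padded version typically runs considerably faster. The 64-byte pad is an assumption about the cache-line size; golang.org/x/sys/cpu exposes CacheLinePad for portable padding.

    package main

    import (
        "fmt"
        "sync"
        "sync/atomic"
        "time"
    )

    // Both counters land on the same cache line.
    type sharedLine struct {
        x atomic.Int64
        y atomic.Int64
    }

    // Padding pushes y onto its own cache line (assuming 64-byte lines).
    type paddedLine struct {
        x atomic.Int64
        _ [56]byte
        y atomic.Int64
    }

    func run(incX, incY func()) time.Duration {
        const iters = 20_000_000
        start := time.Now()
        var wg sync.WaitGroup
        wg.Add(2)
        go func() {
            defer wg.Done()
            for i := 0; i < iters; i++ {
                incX()
            }
        }()
        go func() {
            defer wg.Done()
            for i := 0; i < iters; i++ {
                incY()
            }
        }()
        wg.Wait()
        return time.Since(start)
    }

    func main() {
        var s sharedLine
        var p paddedLine
        fmt.Println("shared cache line:", run(func() { s.x.Add(1) }, func() { s.y.Add(1) }))
        fmt.Println("padded:           ", run(func() { p.x.Add(1) }, func() { p.y.Add(1) }))
    }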
  150. 179 Cache Coherence Protocol Building a Highly Concurrent Cache in

    Go • Ensures CPU cores have a consistent view of the same data. • The added coordination between CPU cores impacts application performance. • Reducing the need for cache-coherence traffic makes for faster Go applications.
  151. 180 xsync.Map • Third-party library providing concurrent data structures for

    Go https://github.com/puzpuzpuz/xsync • xsync.Map is a concurrent, hash-table-based map using a modified version of the Cache-Line Hash Table (CLHT) data structure. • CLHT organizes the hash table in cache-line-sized buckets to reduce the number of cache-line transfers. Building a Highly Concurrent Cache in Go
  152. 181 DLFU Cache: Removing Locking • Maintaining the heap still

    requires a mutex. • To fully leverage xsync.Map we would want to eliminate the mutex. • Perform cache eviction in a background goroutine: collect all entries and sort them. • Replace synchronized access to numeric or string types in Go with atomic operations (see the sketch below). • It's not really lock-free: it moves the locking closer to the CPU. Building a Highly Concurrent Cache in Go
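    The later versions of Get shown below call Load, Store, and Add on incr and score, which are floating-point values in a DLFU cache. sync/atomic has no atomic float64 type, so one way to get there, an assumption about the implementation rather than code from the talk, is a small compare-and-swap wrapper over the float's bit pattern:

    package cache

    import (
        "math"
        "sync/atomic"
    )

    // atomicFloat64 is a hypothetical helper: a float64 with atomic Load,
    // Store, and Add built on CompareAndSwap over the float's bit pattern.
    type atomicFloat64 struct {
        bits atomic.Uint64
    }

    func (f *atomicFloat64) Load() float64 {
        return math.Float64frombits(f.bits.Load())
    }

    func (f *atomicFloat64) Store(v float64) {
        f.bits.Store(math.Float64bits(v))
    }

    func (f *atomicFloat64) Add(delta float64) float64 {
        for {
            old := f.bits.Load()
            next := math.Float64frombits(old) + delta
            if f.bits.CompareAndSwap(old, math.Float64bits(next)) {
                return next
            }
        }
    }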
  153. 182 Perform Cache Eviction Asynchronously Building a Highly Concurrent Cache in Go

    func (c *DLFUCache[V]) trimmer(ctx context.Context) {
        for {
            select {
            case <-ctx.Done():
                return
            case <-time.After(250 * time.Millisecond):
                if ctx.Err() != nil {
                    return
                }
                c.trim()
            }
        }
    }
  154. 183 Perform Cache Eviction Asynchronously Building a Highly Concurrent Cache in Go

    func (c *DLFUCache[V]) trim() {
        size := c.data.Size()
        if size <= c.size {
            return
        }
        items := make(items[V], 0, size)
        c.data.Range(func(key string, value *item[V]) bool {
            items = append(items, value)
            return true
        })
        sort.Sort(items)
        for i := 0; i < len(items)-c.size; i++ {
            c.data.Delete(items[i].key.Load())
        }
    }
  155. Integrate xsync.Map

    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
        result := make(map[string]V)
        missing := make([]string, 0)
        for i, key := range keys {
            if ctx.Err() != nil {
                return result, append(keys[i:], missing...)
            }
            c.mu.Lock()
            if item, ok := c.data[key]; ok && !item.expired() {
                result[key] = item.value
                item.score += c.incr
                c.heap.update(item, item.score)
            } else {
                missing = append(missing, key)
            }
            c.mu.Unlock()
        }
        c.mu.Lock()
        c.incr *= c.decay
        c.mu.Unlock()
        return result, missing
    }
  156. Integrate xsync.Map

    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
        result := make(map[string]V)
        missing := make([]string, 0)
        for i, key := range keys {
            if ctx.Err() != nil {
                return result, append(keys[i:], missing...)
            }
            if item, ok := c.data[key]; ok && !item.expired() {
                result[key] = item.value
                item.score += c.incr
            } else {
                missing = append(missing, key)
            }
        }
        c.incr *= c.decay
        return result, missing
    }
  158. Integrate xsync.Map

    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
        result := make(map[string]V)
        missing := make([]string, 0)
        for i, key := range keys {
            if ctx.Err() != nil {
                return result, append(keys[i:], missing...)
            }
            if item, ok := c.data.Load(key); ok && !item.expired() {
                result[key] = item.value
                item.score += c.incr
            } else {
                missing = append(missing, key)
            }
        }
        c.incr *= c.decay
        return result, missing
    }
  159. Integrate xsync.Map

    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
        result := make(map[string]V)
        missing := make([]string, 0)
        incr := c.incr.Load()
        for i, key := range keys {
            if ctx.Err() != nil {
                return result, append(keys[i:], missing...)
            }
            if item, ok := c.data.Load(key); ok && !item.expired() {
                result[key] = item.value
                item.score.Add(incr)
            } else {
                missing = append(missing, key)
            }
        }
        c.incr.Store(incr * c.decay)
        return result, missing
    }
  160. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

    benchstat -col /dlfu bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
               │      V2       │                  V3                   │
               │    hit/op     │    hit/op      vs base                │
    Cache-16     72.32% ± 5%     76.27% ± 2%   +5.45% (p=0.000 n=10)
               │      V2       │                  V3                   │
               │  read-sec/op  │  read-sec/op   vs base                │
    Cache-16      68.7ms ± 3%     616.7µ ± 49%  -99.10% (p=0.000 n=10)
               │      V2       │                   V3                     │
               │  trim-sec/op  │  trim-sec/op     vs base                 │
    Cache-16      246.4µ ± 3%    454.5 ms ± 61%  +1843.61% (p=0.000 n=10)
               │      V2       │                  V3                   │
               │ write-sec/op  │  write-sec/op  vs base                │
    Cache-16    28.47ms ± 13%     243.0µ ± 58%  -99.15% (p=0.000 n=10)
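    How the talk's benchmark computes the hit/op, read-sec/op, trim-sec/op, and write-sec/op columns is not shown in these slides. testing.B.ReportMetric is the standard way to emit such custom units so that benchstat picks them up as extra columns; a hypothetical sketch (names and structure are assumptions, not the talk's code):

    package cache

    import (
        "sync/atomic"
        "testing"
    )

    // BenchmarkCacheSketch shows the mechanism only: count hits and lookups
    // across parallel goroutines, then report the ratio as a custom metric.
    func BenchmarkCacheSketch(b *testing.B) {
        var hits, lookups atomic.Int64
        cache := map[string]int{"a": 1} // stand-in for the real cache, read-only here

        b.RunParallel(func(pb *testing.PB) {
            for pb.Next() {
                lookups.Add(1)
                if _, ok := cache["a"]; ok {
                    hits.Add(1)
                }
            }
        })

        // Custom units reported this way show up as extra columns in benchstat.
        b.ReportMetric(float64(hits.Load())/float64(lookups.Load()), "hit/op")
    }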
  163. go tool pprof cpu.out

    File: cache.test
    Type: cpu
    Time: Sep 26, 2023 at 1:16pm (PDT)
    Duration: 42.14s, Total samples = 81.68s (193.83%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof)
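    The cpu.out profile loaded here can be captured from the same benchmark run. The exact invocation is not shown in the talk, but -cpuprofile is a standard go test flag, for example:

    go test -run=^$ -bench=BenchmarkCache -benchtime=5000x -cpuprofile=cpu.out
    go tool pprof cpu.out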
  164. go tool pprof cpu.out

    File: cache.test
    Type: cpu
    Time: Sep 26, 2023 at 1:16pm (PDT)
    Duration: 42.14s, Total samples = 81.68s (193.83%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) list trim
    Total: 81.68s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v3.(*DLFUCache[go.shape.string]).trim
        20ms     37.23s (flat, cum) 45.58% of Total
           .          .    197:func (c *DLFUCache[V]) trim() {
           .       20ms    198:        size := c.data.Size()
           .       10ms    199:        if c.data.Size() <= c.size {
           .          .    200:                return
           .          .    201:        }
           .          .    202:
           .       80ms    203:        items := make(items[V], 0, size)
           .      6.82s    204:        c.data.Range(func(key string, value *item[V]) bool {
           .          .    205:                items = append(items, value)
           .          .    206:                return true
           .          .    207:        })
           .     26.98s    208:        sort.Sort(items)
           .          .    209:
        10ms       10ms    210:        for i := 0; i < len(items)-c.size; i++ {
        10ms      680ms    211:                key := items[i].key.Load()
           .      2.63s    212:                c.data.Delete(key)
           .          .    213:        }
           .          .    214:}
  166. 196 Faster Eviction Building a Highly Concurrent Cache in Go

    • sort.Sort uses pattern-defeating quicksort (pdqsort). • On the Gophers Slack, in #performance, Aurélien Rainone suggested using quickselect instead. • Quickselect finds the k smallest elements in linear time on average, without sorting the entire slice (see the sketch below).
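    An illustrative, generic quickselect sketch (not the talk's implementation): it partially partitions the collected entries so the k lowest-scoring items end up at the front in linear average time, after which trim can delete items[:k] without ever fully sorting the slice.

    package cache

    import "math/rand"

    // quickselect partially orders items so that the k smallest elements
    // (according to less) occupy items[:k]. Average time is O(n); no full sort.
    func quickselect[T any](items []T, k int, less func(a, b T) bool) {
        lo, hi := 0, len(items)-1
        for lo < hi {
            // Choose a random pivot to avoid quadratic behavior on sorted input.
            p := lo + rand.Intn(hi-lo+1)
            items[p], items[hi] = items[hi], items[p]
            pivot := items[hi]

            // Lomuto partition: everything less than the pivot moves to the left.
            i := lo
            for j := lo; j < hi; j++ {
                if less(items[j], pivot) {
                    items[i], items[j] = items[j], items[i]
                    i++
                }
            }
            items[i], items[hi] = items[hi], items[i]

            switch {
            case i == k:
                return
            case i < k:
                lo = i + 1
            default:
                hi = i - 1
            }
        }
    }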
  167. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench

    benchstat -col /dlfu bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
               │      V4       │                 V5                  │
               │    hit/op     │    hit/op     vs base               │
    Cache-16     76.44% ± 1%     74.84% ± 1%  -2.09% (p=0.001 n=10)
               │      V4       │                 V5                  │
               │  read-sec/op  │  read-sec/op  vs base               │
    Cache-16     477.5µ ± 40%    358.4µ ± 44%  ~ (p=0.529 n=10)
               │      V4       │                 V5                  │
               │  trim-sec/op  │  trim-sec/op  vs base               │
    Cache-16     463.3m ± 54%    129.1m ± 85%  -72.14% (p=0.002 n=10)
               │      V4       │                 V5                  │
               │ write-sec/op  │ write-sec/op  vs base               │
    Cache-16     193.2µ ± 53%    133.6µ ± 40%  ~ (p=0.280 n=10)
  168. 198 Summary • Implementing your own cache in Go makes

    it possible to optimize by leveraging properties that are unique to your use case. • Different cache replacement policies: LRU, LFU, DLFU, etc. • DLFU (Decaying Least Frequently Used): like LFU but with exponential decay on the cache entry's reference count. • How to write benchmarks and use parallel execution to exercise concurrency. • Using Go's profiler to find and reduce concurrency contention. Building a Highly Concurrent Cache in Go
  169. 199 Summary • Cache coherency protocol can impact concurrent performance

    in Go applications. • There is no such thing as lock-free when multiple processors are involved. • Performance can be improved with lock-free data structures and atomic primitives, but your mileage may vary. Building a Highly Concurrent Cache in Go
  170. “ “ Don’t generalize from the talk’s example. Write your

    own code, construct your own benchmarks. You will be surprised. 200 Building a Highly Concurrent Cache in Go