Building a Highly Concurrent Cache in Go: A Hitchhiker's Guide

Konrad Reiche
GopherCon 2023, San Diego
September 27, 2023

Why doesn't the Go standard library provide a concurrent cache? Because Go emphasizes building custom data structures that fit your needs. In this talk, you will learn how to design, implement, and optimize a concurrent cache in Go, combining LRU and LFU eviction policies, and advanced concurrency patterns beyond sync.Mutex.

Transcript

  1. Building a Highly
    Concurrent Cache in Go
    A Hitchhiker’s Guide
    Konrad Reiche




  2. In its full generality, multithreading is
    an incredibly complex and error-prone
    technique, not to be recommended in
    any but the smallest programs.
    ― C. A. R. Hoare:
    Communicating Sequential Processes


  3.–18. Illustration slides. Gopher images adapted from Rob Pike’s talk “Concurrency is not Parallelism” by Renée French, used under CC BY 4.0.

  19. [7932676.760082] Out of memory: kill process 23936
    [7932676.761528] Killed process 23936 (go)




  20. Don’t panic.
    20 Building a Highly Concurrent Cache in Go




  21. Don’t panic.
    21 Building a Highly Concurrent Cache in Go
    ― Effective Go




  22. Don’t panic.
    22 Building a Highly Concurrent Cache in Go
    ― Effective Go
    ― Douglas Adams:
    The Hitchhiker's
    Guide to the Galaxy


  23. 24
    What is a Cache?
    Software component that stores data so that future
    requests for the data can be served faster.
    Building a Highly Concurrent Cache in Go


  24. 25
    What is a Cache?
    Software component that stores data so that future
    requests for the data can be served faster.
    Remember the result of an expensive operation,
    to speed up reads.
    Building a Highly Concurrent Cache in Go


  25. 26
    What is a Cache?
    Software component that stores data so that future
    requests for the data can be served faster.
    Remember the result of an expensive operation,
    to speed up reads.
    Building a Highly Concurrent Cache in Go
    data, ok := cache.Get()
    if !ok {
    data = doSomething()
    cache.Set(data)
    }


  26. 27
    Me
    ca. 2019


  27. 28
    My Manager
    ca. 2019


  28. 29
    My Manager
    and Me
    Where is the cache Lebowski Reiche?


  29. 30
    Caching Post Data for Ranking
    Building a Highly Concurrent Cache in Go
    Kubernetes Pods for
    Ranking Service


  30. 31 Building a Highly Concurrent Cache in Go
    Kubernetes Pods for
    Ranking Service
    Thing Service
    Get data for
    filtering posts
    Caching Post Data for Ranking


  31. 32 Building a Highly Concurrent Cache in Go
    Kubernetes Pods for
    Ranking Service
    Thing Service
    Get data for
    filtering posts
    Redis Cache
    Cluster
    Caching Post Data for Ranking


  32. 33 Building a Highly Concurrent Cache in Go
    Kubernetes Pods for
    Ranking Service
    Thing Service
    Get data for
    filtering posts
    Redis Cache
    Cluster
    Cache Lookups
    Caching Post Data for Ranking
    Update on Miss


  33. 34 Building a Highly Concurrent Cache in Go
    Kubernetes Pods for
    Ranking Service
    Thing Service
    Get data for
    filtering posts
    Redis Cache
    Cluster
    Caching Post Data for Ranking
    Cache Lookups
    5-20m op/sec
    Update on Miss


  34. 35 Building a Highly Concurrent Cache in Go
    Kubernetes Pods for
    Ranking Service
    Thing Service
    Get data for
    filtering posts
    Redis Cache
    Cluster
    Caching Post Data for Ranking
    Local In-Memory Cache with
    ~90k op/sec per pod
    Cache Lookups
    5-20m op/sec
    Update on Miss


  35. 36
    type cache struct {
    data map[string]string
    }
    Building a Highly Concurrent Cache in Go


  36. 37
    type cache[K comparable, V any] struct {
    data map[K]V
    }
    Building a Highly Concurrent Cache in Go


  37. 38
    type cache struct {
    data map[string]string
    }
    Building a Highly Concurrent Cache in Go
    We can introduce generics once
    needed to keep it simple.


  38. 39
    type cache struct {
    data map[string]string
    }
    func (c *cache) Set(key, value string) {
    c.data[key] = value
    }
    func (c *cache) Get(key string) (string, bool) {
    value, ok := c.data[key]
    return value, ok
    }
    Building a Highly Concurrent Cache in Go
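    As written, this version is not safe for concurrent use: two goroutines calling Set at
    the same time corrupt the map. A minimal test sketch (hypothetical, not from the talk)
    that the race detector flags when run with go test -race, assuming the cache type above
    and the imports fmt, sync, and testing:

    func TestCacheConcurrentAccess(t *testing.T) {
        c := &cache{data: make(map[string]string)}
        var wg sync.WaitGroup
        for i := 0; i < 10; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                key := fmt.Sprint(i)
                c.Set(key, key) // unsynchronized concurrent map writes
                c.Get(key)
            }(i)
        }
        wg.Wait()
    }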


  39. 40
    type cache struct {
    mu sync.Mutex
    data map[string]string
    }
    func (c *cache) Set(key, value string) {
    c.data[key] = value
    }
    func (c *cache) Get(key string) (string, bool) {
    value, ok := c.data[key]
    return value, ok
    }
    Building a Highly Concurrent Cache in Go


  40. 41
    type cache struct {
    mu sync.Mutex
    data map[string]string
    }
    func (c *cache) Set(key, value string) {
    c.data[key] = value
    }
    func (c *cache) Get(key string) (string, bool) {
    value, ok := c.data[key]
    return value, ok
    }
    Building a Highly Concurrent Cache in Go


  41. 42
    type cache struct {
    mu sync.Mutex
    data map[string]string
    }
    func (c *cache) Set(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = value
    }
    func (c *cache) Get(key string) (string, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    value, ok := c.data[key]
    return value, ok
    }
    Building a Highly Concurrent Cache in Go


  42. 43
    type cache struct {
    mu sync.RWMutex
    data map[string]string
    }
    func (c *cache) Set(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = value
    }
    func (c *cache) Get(key string) (string, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    value, ok := c.data[key]
    return value, ok
    }
    Building a Highly Concurrent Cache in Go


  43. 44
    type cache struct {
    mu sync.RWMutex
    data map[string]string
    }
    func (c *cache) Set(key, value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.data[key] = value
    }
    func (c *cache) Get(key string) (string, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    value, ok := c.data[key]
    return value, ok
    }
    Building a Highly Concurrent Cache in Go
    Don’t generalize the cache from the
    start; pick an API that maximizes
    your usage pattern.


  44. 45
    type cache struct {
    mu sync.RWMutex
    data map[string]string
    }
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    c.data[key] = value
    }
    }
    func (c *cache) Get(keys []string) map[string]string {
    result := make(map[string]string)
    c.mu.RLock()
    defer c.mu.RUnlock()
    for _, key := range keys {
    if value, ok := c.data[key]; ok {
    result[key] = value
    }
    }
    return result
    }
    Building a Highly Concurrent Cache in Go


  45. [7932676.760082] Out of memory: kill process 23936
    [7932676.761528] Killed process 23936 (go)


  46. Cache
    Replacement
    Policies


  47. 48
    Once the cache exceeds its maximum capacity,
    which data should be evicted to make space for
    new data?
    Building a Highly Concurrent Cache in Go
    Cache Replacement Policies
    Cache
    Entry
    1
    Entry
    2
    Entry
    3

    Entry
    n


  48. 49
    Once the cache exceeds its maximum capacity,
    which data should be evicted to make space for
    new data?
    Bélády's Optimal Replacement Algorithm
    Remove the entry whose next use will occur
    farthest in the future.
    Building a Highly Concurrent Cache in Go
    Cache Replacement Policies
    Cache
    Entry
    1
    Entry
    2
    Entry
    3

    Entry
    n


  49. 50
    Once the cache exceeds its maximum capacity,
    which data should be evicted to make space for
    new data?
    Bélády's Optimal Replacement Algorithm
    Remove the entry whose next use will occur
    farthest in the future.
    Building a Highly Concurrent Cache in Go
    Cache Replacement Policies
    Because we cannot predict the future, we can
    only try to approximate this behavior.
    Cache
    Entry
    1
    Entry
    2
    Entry
    3

    Entry
    n
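    For intuition, a small illustrative sketch (not part of the talk) of Bélády's rule:
    given the remainder of the access trace, evict the cached key whose next use lies
    farthest in the future, or that is never used again.

    // beladyVictim returns the cached key whose next occurrence in
    // trace[pos:] is farthest away; keys never seen again win outright.
    func beladyVictim(cached map[string]bool, trace []string, pos int) string {
        victim, farthest := "", -1
        for key := range cached {
            next := len(trace) // treat "never used again" as infinitely far
            for i := pos; i < len(trace); i++ {
                if trace[i] == key {
                    next = i
                    break
                }
            }
            if next > farthest {
                victim, farthest = key, next
            }
        }
        return victim
    }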


  50. 51
    Taxonomy of Cache Replacement Policies
    Building a Highly Concurrent Cache in Go
    Coarse-Grained
    Policies
    Fine-Grained
    Policies
    Recency Frequency Hybrid Economic Value Reuse Distance Classification
    Longer History
    More Access Patterns
    [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.


  51. 52
    Taxonomy of Cache Replacement Policies
    Building a Highly Concurrent Cache in Go
    Coarse-Grained
    Policies
    Fine-Grained
    Policies
    Recency Frequency Hybrid Economic Value Reuse Distance Classification
    Longer History
    More Access Patterns
    [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
    LRU LFU


  52. 53 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.


  53. 54 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    type cache struct {
    size int
    mu sync.RWMutex
    data map[string]string
    }


  54. 55 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    type cache struct {
    size int
    mu sync.Mutex
    data map[string]*item
    heap *MinHeap
    }
    type item struct {
    key string
    value string
    frequency int
    index int
    }
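    The MinHeap itself is not shown on the slides; a minimal sketch of what it could look
    like on top of container/heap follows (field and method names are assumptions). The
    index field on item is what lets update restore heap order with heap.Fix instead of
    rebuilding the heap.

    // MinHeap orders items by frequency so that Pop returns the least
    // frequently used entry first.
    type MinHeap []*item

    func (h MinHeap) Len() int           { return len(h) }
    func (h MinHeap) Less(i, j int) bool { return h[i].frequency < h[j].frequency }
    func (h MinHeap) Swap(i, j int) {
        h[i], h[j] = h[j], h[i]
        h[i].index, h[j].index = i, j
    }

    func (h *MinHeap) Push(x any) {
        it := x.(*item)
        it.index = len(*h)
        *h = append(*h, it)
    }

    func (h *MinHeap) Pop() any {
        old := *h
        it := old[len(old)-1]
        *h = old[:len(old)-1]
        return it
    }

    // update changes an item's frequency and restores the heap invariant.
    func (h *MinHeap) update(it *item, frequency int) {
        it.frequency = frequency
        heap.Fix(h, it.index)
    }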


  55. 56 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }


  56. 57 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }
    a
    frequency: 1


  57. 58 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }
    a
    frequency: 1
    b
    frequency: 1


  58. 59 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    a
    frequency: 1
    b
    frequency: 1
    c
    frequency: 1
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }


  59. 60 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    func (c *cache) Get(keys []string) (
    map[string]string,
    []string,
    ) {
    result := make(map[string]string)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok {
    result[key] = item.value
    frequency := item.frequency+1
    c.heap.update(item, frequency)
    } else {
    missing = append(missing, key)
    }
    }
    return result, missing
    }
    a
    frequency: 1
    b
    frequency: 1
    c
    frequency: 1


  60. 61 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    func (c *cache) Get(keys []string) (
    map[string]string,
    []string,
    ) {
    result := make(map[string]string)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok {
    result[key] = item.value
    frequency := item.frequency+1
    c.heap.update(item, frequency)
    } else {
    missing = append(missing, key)
    }
    }
    return result, missing
    }
    a
    frequency: 1
    b
    frequency: 1
    c
    frequency: 1
    cache.Get([]string{"a", "b"})


  61. 62 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    func (c *cache) Get(keys []string) (
    map[string]string,
    []string,
    ) {
    result := make(map[string]string)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok {
    result[key] = item.value
    frequency := item.frequency+1
    c.heap.update(item, frequency)
    } else {
    missing = append(missing, key)
    }
    }
    return result, missing
    }
    c
    frequency: 1
    a
    frequency: 2
    b
    frequency: 2


  62. 63 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }
    c
    frequency: 1
    a
    frequency: 2
    b
    frequency: 2
    cache.Set(map[string]string{
    "d": "⌯Go",
    })


  63. 64 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    c
    frequency: 1
    d
    frequency: 1
    b
    frequency: 2
    a
    frequency: 2
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }


  64. 65 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    c
    frequency: 1
    d
    frequency: 1
    b
    frequency: 2
    a
    frequency: 2
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }


  65. 66 Building a Highly Concurrent Cache in Go
    LFU
    Least Frequently Used (LFU)
    Favor entries that are used frequently.
    d
    frequency: 1
    a
    frequency: 2
    b
    frequency: 2
    func (c *cache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }
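    Putting the walkthrough above into a usage sketch (NewLFUCache is assumed here; it only
    appears later in the benchmark slides): with capacity 3, reading "a" and "b" raises
    their frequency, so inserting "d" evicts "c".

    cache := NewLFUCache(3)
    cache.Set(map[string]string{"a": "1", "b": "2", "c": "3"})
    cache.Get([]string{"a", "b"}) // a and b now have frequency 2
    cache.Set(map[string]string{"d": "⌯Go"}) // over capacity: c (frequency 1) is evicted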


  66. 67
    Taxonomy of Cache Replacement Policies
    Building a Highly Concurrent Cache in Go
    Coarse-Grained
    Policies
    Fine-Grained
    Policies
    Recency Frequency Hybrid Economic Value Reuse Distance Classification
    Longer History
    More Access Patterns
    [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
    LRU LFU


  67. 68
    Taxonomy of Cache Replacement Policies
    Building a Highly Concurrent Cache in Go
    Coarse-Grained
    Policies
    Fine-Grained
    Policies
    Recency Frequency Hybrid Economic Value Reuse Distance Classification
    LRU
    EELRU
    SegLRU
    LIP
    SRRIP
    LFU
    FBR
    2Q
    ARC
    LRFU
    DIP
    DRRIP
    EVA Timekeeping
    AIP
    ETA
    Leeway
    DBCP
    EAF
    SDBP
    SHiP
    Hawkeye
    [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.


  68. 69 Building a Highly Concurrent Cache in Go
    Kubernetes Pods for
    Ranking Service
    Thing Service
    Get data for
    filtering posts
    Redis Cache
    Cluster
    Caching Post Data for Ranking
    Local In-Memory Cache with
    ~90k op/sec per pod
    Cache Lookups
    5-20m op/sec
    Update on Miss


  69. Benchmarks
    Functions of the form
    func BenchmarkXxx(*testing.B)
    are considered benchmarks, and are executed by the go test command when its
    -bench flag is provided.
    B is a type passed to Benchmark functions to manage benchmark timing and to
    specify the number of iterations to run.


  70. Benchmarks
    ● Before improving the performance of code, we should measure its current
    performance
    ● Create a stable environment
    ○ Idle machine
    ○ No shared hardware
    ○ Don’t browse the web
    ○ Watch out for power saving and thermal scaling
    ● The testing package has built-in support for writing benchmarks


  71. Benchmarks
    func BenchmarkGet(b *testing.B) {
    for i := 0; i < b.N; i++ {
    }
    }


  72. Benchmarks
    func BenchmarkGet(b *testing.B) {
    cache := NewLFUCache(100)
    keys := make([]string, 100)
    items := make(map[string]string)
    for i := 0; i < 100; i++ {
    kv := fmt.Sprint(i)
    keys[i] = kv
    items[kv] = kv
    }
    for i := 0; i < b.N; i++ {
    }
    }


  73. Benchmarks
    func BenchmarkGet(b *testing.B) {
    cache := NewLFUCache(100)
    keys := make([]string, 100)
    items := make(map[string]string)
    for i := 0; i < 100; i++ {
    kv := fmt.Sprint(i)
    keys[i] = kv
    items[kv] = kv
    }
    cache.Set(items)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
    }
    }


  74. Benchmarks
    func BenchmarkGet(b *testing.B) {
    cache := NewLFUCache(100)
    keys := make([]string, 100)
    items := make(map[string]string)
    for i := 0; i < 100; i++ {
    kv := fmt.Sprint(i)
    keys[i] = kv
    items[kv] = kv
    }
    cache.Set(items)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
    cache.Get(keys)
    }
    }
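    Later slides show sub-benchmark names like BenchmarkCache/policy=lfu; one way to
    express such variants is b.Run. A sketch under assumptions (newCache is a hypothetical
    constructor, the rest mirrors the setup above):

    func BenchmarkGetByPolicy(b *testing.B) {
        for _, policy := range []string{"lfu", "lru"} {
            b.Run("policy="+policy, func(b *testing.B) {
                cache := newCache(policy, 100) // hypothetical constructor
                keys := make([]string, 100)
                items := make(map[string]string)
                for i := 0; i < 100; i++ {
                    kv := fmt.Sprint(i)
                    keys[i] = kv
                    items[kv] = kv
                }
                cache.Set(items)
                b.ResetTimer()
                for i := 0; i < b.N; i++ {
                    cache.Get(keys)
                }
            })
        }
    }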


  75. go test -run=^$ -bench=BenchmarkGet
    Benchmarks


  76. go test -run=^$ -bench=BenchmarkGet -count=5
    Benchmarks


  77. go test -run=^$ -bench=BenchmarkGet -count=5
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    BenchmarkGet-16 117642 9994 ns/op
    BenchmarkGet-16 116402 10018 ns/op
    BenchmarkGet-16 121834 9817 ns/op
    BenchmarkGet-16 123241 9942 ns/op
    BenchmarkGet-16 109621 10022 ns/op
    PASS
    ok github.com/konradreiche/cache 6.520s
    Benchmarks


  78. Benchmarks
    func BenchmarkGet(b *testing.B) {
    cache := NewLFUCache(100)
    keys := make([]string, 100)
    items := make(map[string]string)
    for i := 0; i < 100; i++ {
    kv := fmt.Sprint(i)
    keys[i] = kv
    items[kv] = kv
    }
    cache.Set(items)
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
    cache.Get(keys)
    }
    }


  79. Limitations
    ● We want to analyze and optimize all cache operations: Get, Set, Eviction
    ● Not all code paths are covered
    ● What cache hit and miss ratio to benchmark for?
    ● No concurrency, benchmark executes with one goroutine
    ● How do different operations behave when interleaving concurrently?


  80. Limitations
    ● We want to analyze and optimize all cache operations: Get, Set, Eviction
    ● Not all code paths are covered
    ● What cache hit and miss ratio to benchmark for?
    ● No concurrency, benchmark executes with one goroutine
    ● How do different operations behave when interleaving concurrently?


  81. Real Sample Data
    Event log of cache access over 30 minutes including:
    ● timestamp
    ● posts keys
    107907533,SA,Lw,OA,Iw,aA,RA,KA,CQ,Ow,Aw,Hg,Kg
    111956832,upgb
    121807061,upgb
    134028958,l3Ir,iPMq,PcUn,T5Ej,ZQs,kTM,/98F,BFwJ,Oik,uYIB,gv8F
    137975373,crgb,SCMU,NXUd,EyQI,244Z,DB4H,Tp0H,Kh8b,gREH,g9kG,o34E,wSYI,u+wF,h40M
    142509895,iwM,hgM,CQQ,YQI
    154850130,jTE,ciU,2U4,GQkB,4xo,U2QC,/7oB,dRIC,M0gB,bwYk
    ...
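    A minimal parsing sketch (an assumption; the talk does not show this code) for turning
    one log line into a timestamp and the post keys requested together, using strings and
    strconv:

    // parseEventLog splits a line like "107907533,SA,Lw,OA" into the request
    // timestamp and the post keys accessed in that request.
    func parseEventLog(line string) (int64, []string, error) {
        fields := strings.Split(line, ",")
        ts, err := strconv.ParseInt(fields[0], 10, 64)
        if err != nil {
            return 0, nil, err
        }
        return ts, fields[1:], nil
    }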


  82. Limitations
    ● We want to analyze and optimize all cache operations: Get, Set, Eviction
    ● Not all code paths are covered
    ● What cache hit and miss ratio to benchmark for?
    ● No concurrency, benchmark executes with one goroutine
    ● How do different operations behave when interleaving concurrently?


  83. b.RunParallel
    RunParallel runs a benchmark in parallel. It creates multiple goroutines and
    distributes b.N iterations among them. The number of goroutines defaults to
    GOMAXPROCS.


  84. b.RunParallel
    RunParallel runs a benchmark in parallel. It creates multiple goroutines and
    distributes b.N iterations among them. The number of goroutines defaults to
    GOMAXPROCS.
    func BenchmarkCache(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
    // set up goroutine local state
    for pb.Next() {
    // execute one iteration of the benchmark
    }
    })
    }


  85. func BenchmarkCache(b *testing.B) {
    cb := newBenchmarkCase(b, config{size: 400_000})
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
    var br benchmarkResult
    for pb.Next() {
    log := cb.nextEventLog()
    start := time.Now()
    cached, missing := cb.cache.Get(log.keys)
    br.observeGet(start, cached, missing)
    if len(missing) > 0 {
    data := lookupData(missing)
    start := time.Now()
    cb.cache.Set(data)
    br.observeSetDuration(start)
    }
    }
    cb.addLocalReports(br)
    })
    b.ReportMetric(cb.getHitRate(), "hit/op")
    b.ReportMetric(cb.getTimePerGet(b), "read-ns/op")
    b.ReportMetric(cb.getTimePerSet(b), "write-ns/op")
    }


  86. func BenchmarkCache(b *testing.B) {
    cb := newBenchmarkCase(b, config{size: 400_000})
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
    var br benchmarkResult
    for pb.Next() {
    log := cb.nextEventLog()
    start := time.Now()
    cached, missing := cb.cache.Get(log.keys)
    br.observeGet(start, cached, missing)
    if len(missing) > 0 {
    data := lookupData(missing)
    start := time.Now()
    cb.cache.Set(data)
    br.observeSetDuration(start)
    }
    }
    cb.addLocalReports(br)
    })
    b.ReportMetric(cb.getHitRate(), "hit/op")
    b.ReportMetric(cb.getTimePerGet(b), "read-ns/op")
    b.ReportMetric(cb.getTimePerSet(b), "write-ns/op")
    }
    Custom benchmark case
    type to manage benchmark
    and collect data


  87. func BenchmarkCache(b *testing.B) {
    cb := newBenchmarkCase(b, config{size: 400_000})
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
    var br benchmarkResult
    for pb.Next() {
    log := cb.nextEventLog()
    start := time.Now()
    cached, missing := cb.cache.Get(log.keys)
    br.observeGet(start, cached, missing)
    if len(missing) > 0 {
    data := lookupData(missing)
    start := time.Now()
    cb.cache.Set(data)
    br.observeSetDuration(start)
    }
    }
    cb.addLocalReports(br)
    })
    b.ReportMetric(cb.getHitRate(), "hit/op")
    b.ReportMetric(cb.getTimePerGet(b), "read-ns/op")
    b.ReportMetric(cb.getTimePerSet(b), "write-ns/op")
    }
    Collect per-goroutine
    benchmark measurements


  88. func BenchmarkCache(b *testing.B) {
    cb := newBenchmarkCase(b, config{size: 400_000})
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
    var br benchmarkResult
    for pb.Next() {
    log := cb.nextEventLog()
    start := time.Now()
    cached, missing := cb.cache.Get(log.keys)
    br.observeGet(start, cached, missing)
    if len(missing) > 0 {
    data := lookupData(missing)
    start := time.Now()
    cb.cache.Set(data)
    br.observeSetDuration(start)
    }
    }
    cb.addLocalReports(br)
    })
    b.ReportMetric(cb.getHitRate(), "hit/op")
    b.ReportMetric(cb.getTimePerGet(b), "read-ns/op")
    b.ReportMetric(cb.getTimePerSet(b), "write-ns/op")
    }
    Reproduce production behavior:
    lookup & update.


  89. func BenchmarkCache(b *testing.B) {
    cb := newBenchmarkCase(b, config{size: 400_000})
    b.ResetTimer()
    b.RunParallel(func(pb *testing.PB) {
    var br benchmarkResult
    for pb.Next() {
    log := cb.nextEventLog()
    start := time.Now()
    cached, missing := cb.cache.Get(log.keys)
    br.observeGet(start, cached, missing)
    if len(missing) > 0 {
    data := lookupData(missing)
    start := time.Now()
    cb.cache.Set(data)
    br.observeSetDuration(start)
    }
    }
    cb.addLocalReports(br)
    })
    b.ReportMetric(cb.getHitRate(), "hit/op")
    b.ReportMetric(cb.getTimePerGet(b), "read-ns/op")
    b.ReportMetric(cb.getTimePerSet(b), "write-ns/op")
    }
    We can measure duration of
    individual operations
    manually
    Use b.ReportMetric to
    report custom metrics
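    benchmarkResult and addLocalReports are not shown; a minimal sketch of what they could
    look like (all names here are assumptions): each goroutine accumulates its own counters
    and merges them into the shared benchmark case under a mutex once it is done, so the
    hot loop itself stays uncontended.

    type benchmarkResult struct {
        hits, misses int
        gets, sets   int
        getDuration  time.Duration
        setDuration  time.Duration
    }

    func (br *benchmarkResult) observeGet(start time.Time, cached map[string]string, missing []string) {
        br.gets++
        br.hits += len(cached)
        br.misses += len(missing)
        br.getDuration += time.Since(start)
    }

    func (br *benchmarkResult) observeSetDuration(start time.Time) {
        br.sets++
        br.setDuration += time.Since(start)
    }

    // addLocalReports merges one goroutine's counters into the benchmark case.
    func (cb *benchmarkCase) addLocalReports(br benchmarkResult) {
        cb.mu.Lock()
        defer cb.mu.Unlock()
        cb.total.hits += br.hits
        cb.total.misses += br.misses
        cb.total.gets += br.gets
        cb.total.sets += br.sets
        cb.total.getDuration += br.getDuration
        cb.total.setDuration += br.setDuration
    }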


  90. go test -run=^$ -bench=BenchmarkCache -count=10


  91. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x
    Ensure that each benchmark
    processes exactly 5,000 event
    logs to improve comparability
    of hit rate metric


  92. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    BenchmarkCache/policy=lfu-16 5000 0.6141 hit/op 4795215 read-ns/op 2964262 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6082 hit/op 4686778 read-ns/op 2270200 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6159 hit/op 4332358 read-ns/op 1765885 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6153 hit/op 4089562 read-ns/op 2504176 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6152 hit/op 3472677 read-ns/op 1686928 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6107 hit/op 4464410 read-ns/op 2695443 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6155 hit/op 3624802 read-ns/op 1837148 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6133 hit/op 3931610 read-ns/op 2154571 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6151 hit/op 2440746 read-ns/op 1260662 write-ns/op
    BenchmarkCache/policy=lfu-16 5000 0.6138 hit/op 3491091 read-ns/op 1944350 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2320270 read-ns/op 1127495 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2212118 read-ns/op 1019305 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6705 hit/op 2150089 read-ns/op 1037654 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6703 hit/op 2512224 read-ns/op 1134282 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6710 hit/op 2377883 read-ns/op 1079198 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6711 hit/op 2313210 read-ns/op 1120761 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6712 hit/op 2071632 read-ns/op 980912 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2410096 read-ns/op 1127907 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2226160 read-ns/op 1071007 write-ns/op
    BenchmarkCache/policy=lru-16 5000 0.6709 hit/op 2383321 read-ns/op 1165734 write-ns/op
    PASS
    ok github.com/konradreiche/cache 846.442s


  93. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench


  94. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /policy bench


  95. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /policy bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ LFU │ LRU │
    │ hit/op │ hit/op vs base │
    Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10)
    │ LFU │ LRU │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10)
    │ LFU │ LRU │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)


  96. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /policy bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ LFU │ LRU │
    │ hit/op │ hit/op vs base │
    Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10)
    │ LFU │ LRU │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10)
    │ LFU │ LRU │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)


  97. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /policy bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ LFU │ LRU │
    │ hit/op │ hit/op vs base │
    Cache-16 61.46% ± 1% 67.09% ± 0% +9.16% (p=0.000 n=10)
    │ LFU │ LRU │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 4.01ms ± 17% 2.32ms ± 7% -42.23% (p=0.000 n=10)
    │ LFU │ LRU │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 2.05ms ± 32% 1.10ms ± 7% -46.33% (p=0.000 n=10)


  98. 100
    Taxonomy of Cache Replacement Policies
    Building a Highly Concurrent Cache in Go
    Coarse-Grained
    Policies
    Fine-Grained
    Policies
    Recency Frequency Hybrid Economic Value Reuse Distance Classification
    [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
    LRU
    EELRU
    SegLRU
    LIP
    SRRIP
    LFU
    FBR
    2Q
    ARC
    LRFU
    DIP
    DRRIP
    Timekeeping
    AIP
    ETA
    Leeway
    DBCP
    EAF
    SDBP
    SHiP
    Hawkeye
    EVA


  99. 101
    Taxonomy of Cache Replacement Policies
    Building a Highly Concurrent Cache in Go
    Coarse-Grained
    Policies
    Fine-Grained
    Policies
    Recency Frequency Hybrid Economic Value Reuse Distance Classification
    [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
    LRU
    EELRU
    SegLRU
    LIP
    SRRIP
    LFU
    FBR
    2Q
    ARC
    LRFU
    DIP
    DRRIP
    Timekeeping
    AIP
    ETA
    Leeway
    DBCP
    EAF
    SDBP
    SHiP
    Hawkeye
    EVA


  100. 102
    Taxonomy of Cache Replacement Policies
    Building a Highly Concurrent Cache in Go
    Coarse-Grained
    Policies
    Fine-Grained
    Policies
    Recency Frequency Hybrid Economic Value Reuse Distance Classification
    [1] Akanksha Jain, Calvin Lin. Cache Replacement Policies. Morgan & Claypool Publishers, July 2019.
    LRU
    EELRU
    SegLRU
    LIP
    SRRIP
    LFU
    FBR
    2Q
    ARC
    LRFU
    DIP
    DRRIP
    Timekeeping
    AIP
    ETA
    Leeway
    DBCP
    EAF
    SDBP
    SHiP
    Hawkeye
    EVA


  101. Combining
    LFU & LRU


  102. 104 Building a Highly Concurrent Cache in Go
    LRFU (Least Recently/Least Frequently)
    A paper [2] published in 2001 proposes LRFU, a policy
    that combines LRU and LFU.
    ● Similar to LFU: each item holds a value
    ● CRF: Combined Recency and Frequency
    ● A parameter λ determines how much weight is
    given to recent entries
    type cache struct {
    size int
    mu sync.Mutex
    data map[string]*item
    heap *MinHeap
    }
    type item struct {
    key string
    value string
    index int
    frequency int
    }
    [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies.
    IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.


  103. A paper [2] published in 2001 proposes LRFU, a policy
    that combines LRU and LFU.
    ● Similar to LFU: each item holds a value
    ● CRF: Combined Recency and Frequency
    ● A parameter λ determines how much weight is
    given to recent entries
    λ = 1.0 (LRU)
    λ = 0.0 (LFU)
    105 Building a Highly Concurrent Cache in Go
    LRFU (Least Recently/Least Frequently)
    type cache struct {
    size int
    weight float64
    mu sync.Mutex
    data map[string]*item
    heap *MinHeap
    }
    type item struct {
    key string
    value string
    index int
    crf float64
    }
    [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies.
    IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.


  104. A paper [2] published in 2001 proposes LRFU, a policy
    that combines LRU and LFU.
    ● Similar to LFU: each item holds a value
    ● CRF: Combined Recency and Frequency
    ● A parameter λ determines how much weight is
    given to recent entries
    λ = 1.0 (LRU)
    λ = 0.0 (LFU)
    λ = 0.001
    106 Building a Highly Concurrent Cache in Go
    LRFU (Least Recently/Least Frequently)
    type cache struct {
    size int
    weight float64
    mu sync.Mutex
    data map[string]*item
    heap *MinHeap
    }
    type item struct {
    key string
    value string
    index int
    crf float64
    }
    [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies.
    IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.


  105. A paper [2] published in 2001 proposes LRFU, a policy
    that combines LRU and LFU.
    ● Similar to LFU: each item holds a value
    ● CRF: Combined Recency and Frequency
    ● A parameter λ determines how much weight is
    given to recent entries
    λ = 1.0 (LRU)
    λ = 0.0 (LFU)
    λ = 0.001 (LFU with a pinch of LRU)
    107 Building a Highly Concurrent Cache in Go
    LRFU (Least Recently/Least Frequently)
    type cache struct {
    size int
    weight float64
    mu sync.Mutex
    data map[string]*item
    heap *MinHeap
    }
    type item struct {
    key string
    value string
    index int
    crf float64
    }
    [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies.
    IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.


  106. A paper [2] published in 2001 proposes LRFU, a policy
    that combines LRU and LFU.
    ● Calculate CRF for every entry whenever they need to
    be compared
    ● math.Pow not a cheap operation
    ● 0.5^(λx) prone to floating-point overflow
    ● New items likely to be evicted starting with CRF = 1.0
    108 Building a Highly Concurrent Cache in Go
    LRFU (Least Recently/Least Frequently)
    type cache struct {
    size int
    weight float64
    mu sync.Mutex
    data map[string]*item
    heap *MinHeap
    }
    type item struct {
    key string
    value string
    index int
    crf float64
    }
    [2] Donghee Lee et al. LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies.
    IEEE Transactions on Computers, 50:12, 1352–1361, December 2001.
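    For reference, the paper's combined recency and frequency value is a weighted sum of
    past references with F(x) = 0.5^(λ·x), and it can be maintained incrementally on each
    access; a small illustrative helper (not from the slides, imports math) makes the
    math.Pow on the hot path explicit:

    // crfOnAccess folds a new reference into an entry's CRF:
    // CRF_new = F(0) + F(now-last) * CRF_old, with F(x) = 0.5^(λ·x) and F(0) = 1.
    func crfOnAccess(oldCRF, lambda, now, last float64) float64 {
        return 1 + math.Pow(0.5, lambda*(now-last))*oldCRF
    }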


  107. 110 Building a Highly Concurrent Cache in Go
    DLFU (Decaying LFU Cache Expiry)
    Donovan Baarda, a Google SRE from Australia, came up with an improved
    algorithm [3] and a Python reference implementation [4], realizing:
    1. LRFU decay is a simple exponential decay
    2. Exponential decay can be approximated which eliminates math.Pow
    3. Exponentially grow the reference increment instead of exponentially decaying
    all entries, thus requiring fewer fields per entry and fewer comparisons
    [3] https://github.com/dbaarda/DLFUCache
    [4] https://minkirri.apana.org.au/wiki/DecayingLFUCacheExpiry
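    To make the decay factor concrete (illustrative numbers, not from the talk): with
    size = 400,000 and weight = 0.5, p = 200,000 and decay = (p + 1) / p ≈ 1.000005. After
    p reads the reference increment has grown by (1 + 1/p)^p ≈ e, so older scores lose
    weight exponentially without ever touching the existing entries.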


  108. 111 Building a Highly Concurrent Cache in Go
    func NewDLFUCache[V any](ctx context.Context, config config.Config) *DLFUCache[V] {
    cache := &DLFUCache[V]{
    data: make(map[string]*Item[V], config.Size),
    heap: &MinHeap[V]{},
    weight: config.Weight,
    size: config.Size,
    incr: 1.0,
    }
    if config.Weight == 0.0 { // there is no decay for LFU policy
    cache.decay = 1
    return cache
    }
    p := float64(config.Size) * config.Weight
    cache.decay = (p + 1.0) / p
    return cache
    }
    DLFU Cache


  109. 112 Building a Highly Concurrent Cache in Go
    func (c *lfuCache) Set(items map[string]string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    for len(c.data) > c.size {
    item := heap.Pop(&c.heap).(*item)
    delete(c.data, item.key)
    }
    }
    DLFU Cache


  110. 113 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    c.trim()
    }
    DLFU Cache


  111. 114 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    frequency: 1,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    c.trim()
    }
    DLFU Cache


  112. 115 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for key, value := range items {
    item := &item{
    key: key,
    value: value,
    score: c.incr,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    c.trim()
    }
    DLFU Cache


  113. 116 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    expiresAt := time.Now().Add(expiry)
    for key, value := range items {
    if item, ok := c.data[key]; ok {
    item.value = value
    item.expiresAt = expiresAt
    continue
    }
    item := &item{
    key: key,
    value: value,
    score: c.incr,
    expiresAt: expiresAt,
    }
    c.data[key] = item
    heap.Push(&c.heap, item)
    }
    c.trim()
    }
    DLFU Cache
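    trim is referenced but not shown; a minimal sketch (an assumption, mirroring the LFU
    eviction loop earlier and the type names from the constructor slide): pop the
    lowest-scored entries until the cache fits its size limit again. Callers already hold
    c.mu.

    func (c *DLFUCache[V]) trim() {
        for len(c.data) > c.size {
            item := heap.Pop(c.heap).(*Item[V])
            delete(c.data, item.key)
        }
    }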


  114. 117 Building a Highly Concurrent Cache in Go
    func (c *lfuCache) Get(keys []string) (map[string]string, []string) {
    result := make(map[string]string)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok {
    result[key] = item.value
    frequency := item.frequency+1
    c.heap.update(item, frequency)
    } else {
    missing = append(missing, key)
    }
    }
    return result, missing
    }
    DLFU Cache


  115. 118 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
    result := make(map[string]V)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok {
    result[key] = item.value
    frequency := item.frequency+1
    c.heap.update(item, frequency)
    } else {
    missing = append(missing, key)
    }
    }
    return result, missing
    }
    DLFU Cache


  116. 119 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
    result := make(map[string]V)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok && !item.expired() {
    result[key] = item.value
    frequency := item.frequency+1
    c.heap.update(item, frequency)
    } else {
    missing = append(missing, key)
    }
    }
    return result, missing
    }
    DLFU Cache


  117. 120 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
    result := make(map[string]V)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok && !item.expired() {
    result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    }
    return result, missing
    }
    DLFU Cache


  118. 121 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
    result := make(map[string]V)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok && !item.expired() {
    result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    }
    c.incr *= c.decay
    return result, missing
    }
    DLFU Cache


  119. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /policy bench


  120. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /policy bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ LRU │ DLFU │
    │ hit/op │ hit/op vs base │
    Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10)
    │ LRU │ DLFU │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10)
    │ LRU │ DLFU │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)


  121. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /policy bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ LRU │ DLFU │
    │ hit/op │ hit/op vs base │
    Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10)
    │ LRU │ DLFU │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10)
    │ LRU │ DLFU │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)


  122. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x -cpuprofile=cpu.out > bench
    benchstat -col /policy bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ LRU │ DLFU │
    │ hit/op │ hit/op vs base │
    Cache-16 67.11% ± 0% 70.89% ± 0% +5.63% (p=0.000 n=10)
    │ LRU │ DLFU │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 2.39ms ± 12% 3.74ms ± 11% +56.40% (p=0.000 n=10)
    │ LRU │ DLFU │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 1.19ms ± 16% 2.08ms ± 14% +75.69% (p=0.000 n=10)


  123. 126 Building a Highly Concurrent Cache in Go
    Profiling
    go tool pprof cpu.out
    File: cache.test
    Type: cpu
    Time: Sep 24, 2023 at 3:04pm (PDT)
    Duration: 850.60s, Total samples = 1092.33s (128.42%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof)


  124. 127 Building a Highly Concurrent Cache in Go
    Profiling
    go tool pprof cpu.out
    File: cache.test
    Type: cpu
    Time: Sep 24, 2023 at 3:04pm (PDT)
    Duration: 850.60s, Total samples = 1092.33s (128.42%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) top
    Showing nodes accounting for 567.44s, 51.95% of 1092.33s total
    Dropped 470 nodes (cum <= 5.46s)
    Showing top 10 nodes out of 104
    flat flat% sum% cum cum%
    118.69s 10.87% 10.87% 169.36s 15.50% runtime.findObject
    88.04s 8.06% 18.93% 88.04s 8.06% github.com/konradreiche/cache/dlfu/v1.MinHeap[go.shape.string].Less
    72.38s 6.63% 25.55% 319.24s 29.23% runtime.scanobject
    60.74s 5.56% 31.11% 106.72s 9.77% runtime.mapaccess2_faststr
    45.03s 4.12% 35.23% 126.25s 11.56% runtime.mapassign_faststr
    40.60s 3.72% 38.95% 40.60s 3.72% time.Now
    40.20s 3.68% 42.63% 41.13s 3.77% container/list.(*List).move
    35.47s 3.25% 45.88% 35.47s 3.25% memeqbody
    34.25s 3.14% 49.01% 34.25s 3.14% runtime.memclrNoHeapPointers
    32.04s 2.93% 51.95% 44.34s 4.06% runtime.mapdelete_faststr


  125. 128 Building a Highly Concurrent Cache in Go
    Profiling
    go tool pprof cpu.out
    File: cache.test
    Type: cpu
    Time: Sep 24, 2023 at 3:04pm (PDT)
    Duration: 850.60s, Total samples = 1092.33s (128.42%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) list DLFU.*Get


  126. 129 Building a Highly Concurrent Cache in Go
    Total: 1092.33s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
    6.96s 218.27s (flat, cum) 19.98% of Total
    . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
    . . 75: var missingKeys []string
    . 490ms 76: result := make(map[string]V)
    . . 77:
    40ms 2.56s 78: c.mu.Lock()
    30ms 30ms 79: defer c.mu.Unlock()
    . . 80:
    1.04s 1.04s 81: for _, key := range keys {
    1.34s 43.77s 82: item, ok := c.data[key]
    290ms 53.66s 83: if ok && !item.expired() {
    1.68s 44.35s 84: result[key] = item.value
    1.72s 1.72s 85: item.score += c.incr
    130ms 65.82s 86: c.heap.update(item, item.score)
    . . 87: } else {
    530ms 3.55s 88: missingKeys = append(missingKeys, key)
    . . 89: }
    . . 90: }
    20ms 20ms 91: c.incr *= c.decay
    140ms 1.26s 92: return result, missingKeys
    . . 93:}
    . . 94:


  127. 130 Building a Highly Concurrent Cache in Go
    Total: 1092.33s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
    6.96s 218.27s (flat, cum) 19.98% of Total
    . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
    . . 75: var missingKeys []string
    . 490ms 76: result := make(map[string]V)
    . . 77:
    40ms 2.56s 78: c.mu.Lock()
    30ms 30ms 79: defer c.mu.Unlock()
    . . 80:
    1.04s 1.04s 81: for _, key := range keys {
    1.34s 43.77s 82: item, ok := c.data[key]
    290ms 53.66s 83: if ok && !item.expired() {
    1.68s 44.35s 84: result[key] = item.value
    1.72s 1.72s 85: item.score += c.incr
    130ms 65.82s 86: c.heap.update(item, item.score)
    . . 87: } else {
    530ms 3.55s 88: missingKeys = append(missingKeys, key)
    . . 89: }
    . . 90: }
    20ms 20ms 91: c.incr *= c.decay
    140ms 1.26s 92: return result, missingKeys
    . . 93:}
    . . 94:
    Maintaining a heap is more
    expensive than LRU, which only
    requires a doubly linked list.


  128. 131 Building a Highly Concurrent Cache in Go
    Total: 1092.33s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
    6.96s 218.27s (flat, cum) 19.98% of Total
    . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
    . . 75: var missingKeys []string
    . 490ms 76: result := make(map[string]V)
    . . 77:
    40ms 2.56s 78: c.mu.Lock()
    30ms 30ms 79: defer c.mu.Unlock()
    . . 80:
    1.04s 1.04s 81: for _, key := range keys {
    1.34s 43.77s 82: item, ok := c.data[key]
    290ms 53.66s 83: if ok && !item.expired() {
    1.68s 44.35s 84: result[key] = item.value
    1.72s 1.72s 85: item.score += c.incr
    130ms 65.82s 86: c.heap.update(item, item.score)
    . . 87: } else {
    530ms 3.55s 88: missingKeys = append(missingKeys, key)
    . . 89: }
    . . 90: }
    20ms 20ms 91: c.incr *= c.decay
    140ms 1.26s 92: return result, missingKeys
    . . 93:}
    . . 94:
    The CPU profile does not
    capture the time spent waiting
    to acquire a lock.


  129. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out


  130. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out
    File: cache.test
    Type: delay
    Time: Sep 24, 2023 at 3:48pm (PDT)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof)


  131. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out
    File: cache.test
    Type: delay
    Time: Sep 24, 2023 at 3:48pm (PDT)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) list DLFU.*Get
    Total: 615.48s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
    0 297.55s (flat, cum) 48.34% of Total
    . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
    . . 75: var missingKeys []string
    . . 76: result := make(map[string]V)
    . . 77:
    . 297.55s 78: c.mu.Lock()
    . . 79: defer c.mu.Unlock()
    . . 80:
    . . 81: for _, key := range keys {
    . . 82: item, ok := c.data[key]
    . . 83: if ok && !item.expired() {


  132. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out
    File: cache.test
    Type: delay
    Time: Sep 24, 2023 at 3:48pm (PDT)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) list DLFU.*Get
    Total: 615.48s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Get
    0 297.55s (flat, cum) 48.34% of Total
    . . 74:func (c *DLFUCache[V]) Get(keys []string) (map[string]V, []string) {
    . . 75: var missingKeys []string
    . . 76: result := make(map[string]V)
    . . 77:
    . 297.55s 78: c.mu.Lock()
    . . 79: defer c.mu.Unlock()
    . . 80:
    . . 81: for _, key := range keys {
    . . 82: item, ok := c.data[key]
    . . 83: if ok && !item.expired() {


  133. go test -run=^$ -bench=BenchmarkCache/policy=dlfu -benchtime=5000x -blockprofile=block.out
    File: cache.test
    Type: delay
    Time: Sep 24, 2023 at 3:48pm (PDT)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) list DLFU.*Set
    Total: 615.48s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v1.(*DLFUCache[go.shape.string]).Set
    0 193.89s (flat, cum) 31.50% of Total
    . . 99:func (c *DLFUCache[V]) Set(items map[string]V, expiry time.Duration) {
    . 193.89s 100: c.mu.Lock()
    . . 101: defer c.mu.Unlock()
    . . 102:
    . . 103: now := time.Now()
    . . 104: for key, value := range items {
    . . 105: if ctx.Err() != nil {

    View full-size slide

  134. 137 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    }
    c.incr *= c.decay
    return result, missing
    }
    Critical Section
    Critical Section

    View full-size slide

  135. 138 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
    missing := make([]string, 0)
    c.mu.Lock()
    defer c.mu.Unlock()
    for _, key := range keys {
    if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    }
    c.incr *= c.decay
    return result, missing
    }

    View full-size slide

  136. 139 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
    missing := make([]string, 0)
    for _, key := range keys {
    if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    }
    c.incr *= c.decay
    return result, missing
    }

    View full-size slide

  137. 140 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
    missing := make([]string, 0)
    for _, key := range keys {
    c.mu.Lock()
    if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    c.mu.Unlock()
    }
    c.mu.Lock()
    c.incr *= c.decay
    c.mu.Unlock()
    return result, missing
    }

    View full-size slide

  138. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /dlfu bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ V1 │ V2 │
    │ hit/op │ hit/op vs base │
    Cache-16 70.94% ± 0% 70.30% ± 0% -0.89% (p=0.000 n=10)
    │ V1 │ V2 │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 4.34ms ± 54% 1.43ms ± 26% -67.13% (p=0.001 n=10)
    │ V1 │ V2 │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 2.43ms ± 62% 574.3µs ± 25% -76.36% (p=0.000 n=10)

    View full-size slide

  139. In Production

    View full-size slide

  140. In Production

    View full-size slide

  141. ● Spike in number of goroutines, memory usage & timeouts
    In Production

    View full-size slide

  142. ● Spike in number of goroutines, memory usage & timeouts
    ● Latency added to call paths integrating the local in-memory
    cache
    In Production

    View full-size slide

  143. ● Spike in number of goroutines, memory usage & timeouts
    ● Latency added to call paths integrating the local in-memory
    cache
    ● For incremental progress:
    ○ Feature Flags with Sampling
    ○ Timeout for Cache Operations
    In Production

    View full-size slide

  144. cached, missingIDs := cache.Get(keys)
    In Production

    View full-size slide

  145. if !liveconfig.Sample("cache.read_rate") {
    return
    }
    cached, missingIDs := cache.Get(keys)
    In Production: Feature Flags with Sampling

    View full-size slide

  146. if !liveconfig.Sample("cache.read_rate") {
    return
    }
    go func() {
    cached, missingIDs = localCache.Get(keys)
    }()
    In Production: Timeout for Cache Operations

    View full-size slide

  147. if !liveconfig.Sample("cache.read_rate") {
    return
    }
    // perform cache-lookup in goroutine to avoid blocking for too long
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
    go func() {
    cached, missingIDs = localCache.Get(keys)
    cancel()
    }()
    <-ctx.Done()
    // timeout: return all keys as missing and let remote cache handle it
if ctx.Err() == context.DeadlineExceeded {
    return map[string]T{}, keys
    }
    In Production: Timeout for Cache Operations

    View full-size slide

  148. if !liveconfig.Sample("cache.read_rate") {
    return
    }
    // perform cache-lookup in goroutine to avoid blocking for too long
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
    go func() {
    cached, missingIDs = localCache.Get(keys)
    cancel()
    }()
    <-ctx.Done()
    // timeout: return all keys as missing and let remote cache handle it
if ctx.Err() == context.DeadlineExceeded {
    return map[string]T{}, keys
    }
    In Production: Timeout for Cache Operations
    Pass context into the cache
    operations too.

    View full-size slide

  149. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
    missing := make([]string, 0)
    for _, key := range keys {
    c.mu.Lock()
    if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    c.mu.Unlock()
    }
    c.mu.Lock()
    c.incr *= c.decay
    c.mu.Unlock()
    return result, missing
    }
    In Production: Timeout for Cache Operations

    View full-size slide

  150. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
    missing := make([]string, 0)
    for _, key := range keys {
    c.mu.Lock()
    if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    c.mu.Unlock()
    }
    c.mu.Lock()
    c.incr *= c.decay
    c.mu.Unlock()
    return result, missing
    }
    In Production: Timeout for Cache Operations

    View full-size slide

  151. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
    missing := make([]string, 0)
    for _, key := range keys {
    c.mu.Lock()
    if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    c.mu.Unlock()
    }
    c.mu.Lock()
    c.incr *= c.decay
    c.mu.Unlock()
    return result, missing
    }
    In Production: Timeout for Cache Operations
The goroutine gets abandoned after the timeout. Checking for context
cancellation inside the loop stops the iteration early and reduces lock
contention; a sketch follows.
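A minimal sketch of that early-exit check (the complete version appears a few
slides later); the keys that were not looked up yet are reported as missing so
the remote cache can serve them:

for i, key := range keys {
    // Stop early once the caller's deadline has passed.
    if ctx.Err() != nil {
        return result, append(missing, keys[i:]...)
    }
    c.mu.Lock()
    // ... lookup and score update as before ...
    c.mu.Unlock()
}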

    View full-size slide

  152. Beyond
    sync.Mutex

    View full-size slide

  153. 156
    sync.Map
    Building a Highly Concurrent Cache in Go

    View full-size slide

  154. 157
    sync.Map
● I wrongly assumed sync.Map is an untyped map protected by a sync.RWMutex.
● I recommend diving into the standard library source code regularly; for
sync.Map, for example, the implementation is much more intricate:
    Building a Highly Concurrent Cache in Go
    func (m *Map) Load(key any) (value any, ok bool) {
    read := m.loadReadOnly()
    e, ok := read.m[key]
    if !ok && read.amended {
    m.mu.Lock()
    // Avoid reporting a spurious miss if m.dirty got promoted while we were
    // blocked on m.mu. (If further loads of the same key will not miss, it's
    // not worth copying the dirty map for this key.)
    read = m.loadReadOnly()
    e, ok = read.m[key]

    View full-size slide

  155. 158
    sync.Map
    Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple
    goroutines without additional locking or coordination. Loads, stores, and deletes run in
    amortized constant time.
    The Map type is specialized. Most code should use a plain Go map instead, with separate
    locking or coordination, for better type safety and to make it easier to maintain other
    invariants along with the map content.
    The Map type is optimized for two common use cases:
    (1) when the entry for a given key is only ever written once but read many times, as in caches
    that only grow, or
    (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In
    these two cases, use of a Map may significantly reduce lock contention compared to a Go
    map paired with a separate Mutex or RWMutex.
    Building a Highly Concurrent Cache in Go

    View full-size slide

  156. 159
    sync.Map
    Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple
    goroutines without additional locking or coordination. Loads, stores, and deletes run in
    amortized constant time.
    The Map type is specialized. Most code should use a plain Go map instead, with separate
    locking or coordination, for better type safety and to make it easier to maintain other
    invariants along with the map content.
    The Map type is optimized for two common use cases:
    (1) when the entry for a given key is only ever written once but read many times, as in caches
    that only grow, or
    (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In
    these two cases, use of a Map may significantly reduce lock contention compared to a Go
    map paired with a separate Mutex or RWMutex.
    Building a Highly Concurrent Cache in Go

    View full-size slide

  157. 160
    sync.Map
    Map is like a Go map[interface{}]interface{} but is safe for concurrent use by multiple
    goroutines without additional locking or coordination. Loads, stores, and deletes run in
    amortized constant time.
    The Map type is specialized. Most code should use a plain Go map instead, with separate
    locking or coordination, for better type safety and to make it easier to maintain other
    invariants along with the map content.
    The Map type is optimized for two common use cases:
    (1) when the entry for a given key is only ever written once but read many times, as in caches
    that only grow, or
    (2) when multiple goroutines read, write, and overwrite entries for disjoint sets of keys. In
    these two cases, use of a Map may significantly reduce lock contention compared to a Go
    map paired with a separate Mutex or RWMutex.
    Building a Highly Concurrent Cache in Go
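For illustration, a minimal, hypothetical use of sync.Map as a string cache;
note the type assertion on Load that a typed map guarded by a mutex would not
need:

package main

import (
    "fmt"
    "sync"
)

func main() {
    var m sync.Map // the zero value is ready to use

    m.Store("gopher", "cache me if you can")

    if v, ok := m.Load("gopher"); ok {
        fmt.Println(v.(string)) // values come back as any
    }

    m.Delete("gopher")
}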

    View full-size slide

  158. 161
    xsync.Map
    ● Third-party library providing concurrent data structures for Go
    https://github.com/puzpuzpuz/xsync
● xsync.Map is a concurrent, hash-table-based map using a
modified version of the Cache-Line Hash Table (CLHT) data
structure.
    ● CLHT organizes the hash table in cache-line-sized buckets to
    reduce the number of cache-line transfers.
    Building a Highly Concurrent Cache in Go

    View full-size slide

  159. 162
    xsync.Map
    ● Third-party library providing concurrent data structures for Go
    https://github.com/puzpuzpuz/xsync
● xsync.Map is a concurrent, hash-table-based map using a
modified version of the Cache-Line Hash Table (CLHT) data
structure.
    ● CLHT organizes the hash table in cache-line-sized buckets to
    reduce the number of cache-line transfers.
    Building a Highly Concurrent Cache in Go

    View full-size slide

  160. 163
    Symmetric Multiprocessing (SMP)
    CPU
    Core
    L1
    Core
    L1
    Core
    L1
    Core
    L1
    L3
    Main Memory (RAM)
    L2 L2 L2 L2
    Building a Highly Concurrent Cache in Go

    View full-size slide

  161. 164
    Locality of Reference
    Building a Highly Concurrent Cache in Go
● Temporal Locality: a processor accessing a particular memory location will
likely access it again in the near future.
● Spatial Locality: a processor accessing a particular memory location will
likely access nearby memory locations as well.
● It is not a single memory location that gets copied into the CPU cache, but
an entire cache line.
● Cache Line (Cache Block): a contiguous chunk of memory.

    View full-size slide

  162. 165
    False Sharing
    CPU Core 1
    L1
    CPU Core 2
    L1
    Main Memory
    var x var y
    Building a Highly Concurrent Cache in Go

    View full-size slide

  163. 166
    False Sharing
    CPU Core 1
    L1
    CPU Core 2
    L1
    Main Memory
    var x var y
    Building a Highly Concurrent Cache in Go
    Read variable x into cache

    View full-size slide

  164. 167
    False Sharing
    Main Memory
    CPU Core 1
    L1
    CPU Core 2
    L1
    var x var y
    var x var y
    Building a Highly Concurrent Cache in Go
    E

    View full-size slide

  165. 168
    False Sharing
    Main Memory
    CPU Core 2
    L1
    var x var y
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    var x var y
    E
    Read variable y into cache

    View full-size slide

  166. 169
    False Sharing
    Building a Highly Concurrent Cache in Go
    CPU Core 2
    L1
    Main Memory
    var x var y
    CPU Core 1
    L1
    var y
    var x
    S
    var x var y
    S

    View full-size slide

  167. 170
    False Sharing
    Main Memory
    CPU Core 2
    L1
    var x var y
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    var x var y
    M
    var y
    var x
    S
    Write to variable x

    View full-size slide

  168. 171
    False Sharing
    Main Memory
    CPU Core 2
    L1
    var x var y
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    var x var y
    M
    var y
    var x
    S
    Invalidate cache line

    View full-size slide

  169. 172
    False Sharing
    Building a Highly Concurrent Cache in Go
    Main Memory
    var x var y
    CPU Core 1
    L1
    var x var y
    M
    CPU Core 2
    L1

    View full-size slide

  170. 173
    False Sharing
    Main Memory
    var x var y
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    var x var y
    M
    CPU Core 2
    L1
    Write results in coherence miss
    Invalidate cache line

    View full-size slide

  171. 174
    False Sharing
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    Main Memory
    var x var y
    CPU Core 2
    L1
    var y
    var x
    M
    Coherence Write-Back

    View full-size slide

  172. 175
    False Sharing
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    Main Memory
    var x var y
    CPU Core 2
    L1
    var y
    var x
    M

    View full-size slide

  173. 176
    False Sharing
    Building a Highly Concurrent Cache in Go
    Read results in coherence miss
    CPU Core 1
    L1
    Main Memory
    var x var y
    CPU Core 2
    L1
    var y
    var x
    M
    Invalidate cache line

    View full-size slide

  174. 177
    False Sharing
    CPU Core 2
    L1
    Main Memory
    var x var y
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    var y
    var x
    M
    Coherence Write-Back

    View full-size slide

  175. 178
    False Sharing
    CPU Core 2
    L1
    Main Memory
    var x var y
    Building a Highly Concurrent Cache in Go
    CPU Core 1
    L1
    var y
    var x
    S
    var x var y
    S

    View full-size slide

  176. 179
    Cache Coherence Protocol
    Building a Highly Concurrent Cache in Go
● Ensures CPU cores have a consistent view of the same data.
● The added coordination between CPU cores impacts application performance.
● Reducing the need for cache coherence makes for faster Go applications;
a short padding sketch follows.
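In Go, a common mitigation is to pad hot, independently updated fields so they
end up on separate cache lines. A minimal sketch, assuming a 64-byte cache line
(typical on x86-64; golang.org/x/sys/cpu exposes CacheLinePad if you prefer not
to hard-code the size):

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

// Without the padding, x and y would likely share one cache line, and every
// write on one core would invalidate the line cached by the other core.
type counters struct {
    x atomic.Int64
    _ [56]byte // pad so y starts on the next 64-byte cache line
    y atomic.Int64
}

func main() {
    var c counters
    var wg sync.WaitGroup
    wg.Add(2)
    go func() {
        defer wg.Done()
        for i := 0; i < 1000000; i++ {
            c.x.Add(1)
        }
    }()
    go func() {
        defer wg.Done()
        for i := 0; i < 1000000; i++ {
            c.y.Add(1)
        }
    }()
    wg.Wait()
    fmt.Println(c.x.Load(), c.y.Load())
}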

    View full-size slide

  177. 180
    xsync.Map
    ● Third-party library providing concurrent data structures for Go
    https://github.com/puzpuzpuz/xsync
● xsync.Map is a concurrent, hash-table-based map using a
modified version of the Cache-Line Hash Table (CLHT) data
structure.
● CLHT organizes the hash table in cache-line-sized buckets to
reduce the number of cache-line transfers; a usage sketch follows.
    Building a Highly Concurrent Cache in Go
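Roughly what using it looks like, assuming the generic MapOf API from v3 of the
library (a sketch, not code from the talk):

package main

import (
    "fmt"

    "github.com/puzpuzpuz/xsync/v3"
)

func main() {
    // A typed, concurrent map; no external mutex required.
    m := xsync.NewMapOf[string, int]()
    m.Store("a", 1)

    if v, ok := m.Load("a"); ok {
        fmt.Println(v) // v is an int, no type assertion needed
    }

    m.Range(func(key string, value int) bool {
        return true // continue iterating
    })
    fmt.Println(m.Size())
}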

    View full-size slide

  178. 181
    DLFU Cache: Removing Locking
    ● Maintaining the heap still requires a mutex.
    ● To fully leverage xsync.Map we would want to eliminate the mutex.
    ● Perform cache eviction in a goroutine: collect all entries and sort them.
● Replace synchronized access to numeric or string fields with atomic
operations (see the item sketch after this slide).
● It is not truly lock-free: atomics move the locking closer to the CPU.
    Building a Highly Concurrent Cache in Go
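A rough sketch of what the entry type could look like once its shared fields
are atomics; field names are assumptions, and atomicFloat64 is a hypothetical
helper sketched after the xsync.Map integration:

package cache

import (
    "sync/atomic"
    "time"
)

// item is stored in the concurrent map; its mutable fields are atomic so Get
// and the trimmer can touch them without a cache-wide mutex.
type item[V any] struct {
    value     V
    score     atomicFloat64 // hypothetical float64 atomic, see the CAS sketch later
    expiresAt atomic.Int64  // unix nanoseconds
}

func (it *item[V]) expired() bool {
    return time.Now().UnixNano() > it.expiresAt.Load()
}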

    View full-size slide

  179. 182 Building a Highly Concurrent Cache in Go
    func (c *DLFUCache[V]) trimmer(ctx context.Context) {
    for {
    select {
    case <-ctx.Done():
    return
    case <-time.After(250 * time.Millisecond):
    if ctx.Err() != nil {
    return
    }
    c.trim()
    }
    }
    }
    Perform Cache Eviction Asynchronously
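A small variant: a time.Ticker reuses a single timer instead of allocating a
new one on every loop iteration. A sketch of the same loop with a ticker:

func (c *DLFUCache[V]) trimmer(ctx context.Context) {
    ticker := time.NewTicker(250 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            c.trim()
        }
    }
}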

    View full-size slide

  180. func (c *DLFUCache[V]) trim() {
    size := c.data.Size()
    if size <= c.size {
    return
    }
    items := make(items[V], 0, size)
    c.data.Range(func(key string, value *item[V]) bool {
    items = append(items, value)
    return true
    })
    sort.Sort(items)
    for i := 0; i < len(items)-c.size; i++ {
    c.data.Delete(items[i].key.Load())
    }
    }
    183 Building a Highly Concurrent Cache in Go
    Perform Cache Eviction Asynchronously
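The trimmer has to be started somewhere; a minimal sketch of a constructor that
launches it and a Close that stops it (field and function names here are
assumptions, not the exact code from the talk):

func NewDLFUCache[V any](ctx context.Context, size int) *DLFUCache[V] {
    ctx, cancel := context.WithCancel(ctx)
    c := &DLFUCache[V]{
        data:   xsync.NewMapOf[string, *item[V]](),
        size:   size,
        cancel: cancel,
    }
    // Evict in the background instead of on every Set.
    go c.trimmer(ctx)
    return c
}

func (c *DLFUCache[V]) Close() {
    c.cancel()
}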

    View full-size slide

  181. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)
for i, key := range keys {
if ctx.Err() != nil {
return result, append(missing, keys[i:]...)
}
c.mu.Lock()
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    c.heap.update(item, item.score)
    } else {
    missing = append(missing, key)
    }
    c.mu.Unlock()
    }
    c.mu.Lock()
    c.incr *= c.decay
c.mu.Unlock()
    return result, missing
    }
    Integrate xsync.Map

    View full-size slide

  182. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)
for i, key := range keys {
if ctx.Err() != nil {
return result, append(missing, keys[i:]...)
}
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    } else {
    missing = append(missing, key)
    }
    }
    c.incr *= c.decay
    return result, missing
    }
    Integrate xsync.Map

    View full-size slide

  183. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)
for i, key := range keys {
if ctx.Err() != nil {
return result, append(missing, keys[i:]...)
}
if item, ok := c.data[key]; ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    } else {
    missing = append(missing, key)
    }
    }
    c.incr *= c.decay
    return result, missing
    }
    Integrate xsync.Map

    View full-size slide

  184. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)
for i, key := range keys {
if ctx.Err() != nil {
return result, append(missing, keys[i:]...)
}
if item, ok := c.data.Load(key); ok && !item.expired() {
result[key] = item.value
    item.score += c.incr
    } else {
    missing = append(missing, key)
    }
    }
    c.incr *= c.decay
    return result, missing
    }
    Integrate xsync.Map

    View full-size slide

  185. func (c *DLFUCache[V]) Get(ctx context.Context, keys []string) (map[string]V, []string) {
result := make(map[string]V)
missing := make([]string, 0)
incr := c.incr.Load()
for i, key := range keys {
if ctx.Err() != nil {
return result, append(missing, keys[i:]...)
}
if item, ok := c.data.Load(key); ok && !item.expired() {
result[key] = item.value
    item.score.Add(incr)
    } else {
    missing = append(missing, key)
    }
    }
    c.incr.Store(incr * c.decay)
    return result, missing
    }
    Integrate xsync.Map
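item.score.Add(incr) implies a float64 atomic, which sync/atomic does not
provide out of the box. One common approach (an assumption, not the talk's
code) is a small wrapper that retries a compare-and-swap over the float's bit
pattern:

package cache

import (
    "math"
    "sync/atomic"
)

// atomicFloat64 stores a float64 as its IEEE-754 bit pattern in a uint64.
type atomicFloat64 struct {
    bits atomic.Uint64
}

func (f *atomicFloat64) Load() float64 {
    return math.Float64frombits(f.bits.Load())
}

func (f *atomicFloat64) Store(v float64) {
    f.bits.Store(math.Float64bits(v))
}

// Add retries until the CAS succeeds; under contention this spins, which is
// the "locking moved closer to the CPU" mentioned earlier.
func (f *atomicFloat64) Add(delta float64) float64 {
    for {
        old := f.bits.Load()
        next := math.Float64frombits(old) + delta
        if f.bits.CompareAndSwap(old, math.Float64bits(next)) {
            return next
        }
    }
}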

    View full-size slide

  186. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /dlfu bench

    View full-size slide

  187. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /dlfu bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ V2 │ V3 │
    │ hit/op │ hit/op vs base │
    Cache-16 72.32% ± 5% 76.27% ± 2% +5.45% (p=0.000 n=10)
    │ V2 │ V3 │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 68.7ms ± 3% 616.7µ ± 49% -99.10% (p=0.000 n=10)
    │ V2 │ V3 │
    │ trim-sec/op │ trim-sec/op vs base │
    Cache-16 246.4µ ± 3% 454.5 ms ± 61% +1843.61% (p=0.000 n=10)
    │ V2 │ V3 │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 28.47ms ± 13% 243.0µ ± 58% -99.15% (p=0.000 n=10)

    View full-size slide

  188. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /dlfu bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ V2 │ V3 │
    │ hit/op │ hit/op vs base │
    Cache-16 72.32% ± 5% 76.27% ± 2% +5.45% (p=0.000 n=10)
    │ V2 │ V3 │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 68.7ms ± 3% 616.7µ ± 49% -99.10% (p=0.000 n=10)
    │ V2 │ V3 │
    │ trim-sec/op │ trim-sec/op vs base │
    Cache-16 246.4µ ± 3% 454.5 ms ± 61% +1843.61% (p=0.000 n=10)
    │ V2 │ V3 │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 28.47ms ± 13% 243.0µ ± 58% -99.15% (p=0.000 n=10)

    View full-size slide

  189. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /dlfu bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ V2 │ V3 │
    │ hit/op │ hit/op vs base │
    Cache-16 72.32% ± 5% 76.27% ± 2% +5.45% (p=0.000 n=10)
    │ V2 │ V3 │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 68.7ms ± 3% 616.7µ ± 49% -99.10% (p=0.000 n=10)
    │ V2 │ V3 │
    │ trim-sec/op │ trim-sec/op vs base │
    Cache-16 246.4µ ± 3% 454.5 ms ± 61% +1843.61% (p=0.000 n=10)
    │ V2 │ V3 │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 28.47ms ± 13% 243.0µ ± 58% -99.15% (p=0.000 n=10)

    View full-size slide

  190. go tool pprof cpu.out
    File: cache.test
    Type: cpu
    Time: Sep 26, 2023 at 1:16pm (PDT)
    Duration: 42.14s, Total samples = 81.68s (193.83%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof)

    View full-size slide

  191. go tool pprof cpu.out
    File: cache.test
    Type: cpu
    Time: Sep 26, 2023 at 1:16pm (PDT)
    Duration: 42.14s, Total samples = 81.68s (193.83%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) list trim
    Total: 81.68s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v3.(*DLFUCache[go.shape.string]).trim
    20ms 37.23s (flat, cum) 45.58% of Total
. . 197:func (c *DLFUCache[V]) trim() {
. 20ms 198: size := c.data.Size()
. 10ms 199: if c.data.Size() <= c.size {
. . 200: return
. . 201: }
. . 202:
. 80ms 203: items := make(items[V], 0, size)
    . 6.82s 204: c.data.Range(func(key string, value *item[V]) bool {
    . . 205: items = append(items, value)
    . . 206: return true
    . . 207: })
    . 26.98s 208: sort.Sort(items)
    . . 209:
    10ms 10ms 210: for i := 0; i < len(items)-c.size; i++ {
    10ms 680ms 211: key := items[i].key.Load()
    . 2.63s 212: c.data.Delete(key)
    . . 213: }
    . . 214:}

    View full-size slide

  192. go tool pprof cpu.out
    File: cache.test
    Type: cpu
    Time: Sep 26, 2023 at 1:16pm (PDT)
    Duration: 42.14s, Total samples = 81.68s (193.83%)
    Entering interactive mode (type "help" for commands, "o" for options)
    (pprof) list trim
    Total: 81.68s
    ROUTINE ======================== github.com/konradreiche/cache/dlfu/v3.(*DLFUCache[go.shape.string]).trim
    20ms 37.23s (flat, cum) 45.58% of Total
. . 197:func (c *DLFUCache[V]) trim() {
. 20ms 198: size := c.data.Size()
. 10ms 199: if c.data.Size() <= c.size {
. . 200: return
. . 201: }
. . 202:
. 80ms 203: items := make(items[V], 0, size)
    . 6.82s 204: c.data.Range(func(key string, value *item[V]) bool {
    . . 205: items = append(items, value)
    . . 206: return true
    . . 207: })
    . 26.98s 208: sort.Sort(items)
    . . 209:
    10ms 10ms 210: for i := 0; i < len(items)-c.size; i++ {
    10ms 680ms 211: key := items[i].key.Load()
    . 2.63s 212: c.data.Delete(key)
    . . 213: }
    . . 214:}

    View full-size slide

  193. 196
    Faster Eviction
    Building a Highly Concurrent Cache in Go
● sort.Sort uses pattern-defeating quicksort (pdqsort).
● On the Gophers Slack in #performance, Aurélien Rainone suggested using
quickselect.
● Quickselect is a linear-time algorithm for selecting the k smallest elements;
a short sketch follows.
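A minimal quickselect sketch over the item scores (Lomuto partitioning with the
last element as pivot; a random pivot would guard against adversarial inputs).
It assumes the atomicFloat64 score field from the earlier sketches. After the
call, the k lowest-scored entries sit in items[:k] in arbitrary order, so trim
can delete exactly those without a full sort:

// quickselectByScore rearranges items so the k lowest-scored entries end up
// in items[:k]. Expected linear time; no full sort required.
func quickselectByScore[V any](items []*item[V], k int) {
    lo, hi := 0, len(items)-1
    for lo < hi {
        p := partitionByScore(items, lo, hi)
        switch {
        case p == k:
            return
        case p < k:
            lo = p + 1
        default:
            hi = p - 1
        }
    }
}

func partitionByScore[V any](items []*item[V], lo, hi int) int {
    pivot := items[hi].score.Load()
    i := lo
    for j := lo; j < hi; j++ {
        if items[j].score.Load() < pivot {
            items[i], items[j] = items[j], items[i]
            i++
        }
    }
    items[i], items[hi] = items[hi], items[i]
    return i
}

// In trim: quickselectByScore(items, len(items)-c.size), then delete items[:len(items)-c.size].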

    View full-size slide

  194. go test -run=^$ -bench=BenchmarkCache -count=10 -benchtime=5000x > bench
    benchstat -col /dlfu bench
    goos: linux
    goarch: amd64
    pkg: github.com/konradreiche/cache
    cpu: 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz
    │ V4 │ V5 │
    │ hit/op │ hit/op vs base │
    Cache-16 76.44% ± 1% 74.84% ± 1% -2.09% (p=0.001 n=10)
    │ V4 │ V5 │
    │ read-sec/op │ read-sec/op vs base │
    Cache-16 477.5µ ± 40% 358.4µ ± 44% ~ (p=0.529 n=10)
    │ V4 │ V5 │
    │ trim-sec/op │ trim-sec/op vs base │
    Cache-16 463.3m ± 54% 129.1m ± 85% -72.14% (p=0.002 n=10)
    │ V4 │ V5 │
    │ write-sec/op │ write-sec/op vs base │
    Cache-16 193.2µ ± 53% 133.6µ ± 40% ~ (p=0.280 n=10)

    View full-size slide

  195. 198
    Summary
    ● Implementing your own cache in Go makes it possible to optimize by leveraging
    properties that are unique to your use case.
    ● Different cache replacement policies: LRU, LFU, DLFU, etc.
    ● DLFU (Decaying Least Frequently Used): like LFU but with exponential decay on the
    cache entry’s reference count.
● How to write benchmarks and use parallel execution to exercise concurrency.
● Using Go's profilers to find and reduce lock contention.
    Building a Highly Concurrent Cache in Go

    View full-size slide

  196. 199
    Summary
● The cache coherence protocol can impact concurrent performance in Go
applications.
    ● There is no such thing as lock-free when multiple processors are involved.
● Performance can be improved with lock-free data structures and atomic
primitives, but your mileage may vary.
    Building a Highly Concurrent Cache in Go

    View full-size slide



  197. Don’t generalize from the talk’s
    example. Write your own code,
    construct your own benchmarks.
    You will be surprised.
    200 Building a Highly Concurrent Cache in Go

    View full-size slide

  198. Thank you!
    Konrad Reiche
    @konradreiche

    View full-size slide