Slide 1

Slide 1 text

Building Ristretto: A High Performance, Concurrent, Memory-Bound Go Cache

Slide 2

Slide 2 text

Caffeine
Caffeine is a high performance, near optimal caching library in Java
Used by many Java databases
Papers by its author, Ben Manes
https://github.com/ben-manes/caffeine

Slide 3

Slide 3 text

Today’s Talk
Why build Ristretto
How we got performance out of cache design and Go

Slide 4

Slide 4 text

The Cache Conundrum
Needed a fast Go cache in Dgraph
Was using the Go groupcache LRU cache
Improved query latency 10x by removing the cache!

Slide 5

Slide 5 text

Smart Caches
Maintain global metadata for all Get and Set ops
Even Gets write to the metadata
Locks must be acquired, causing contention

Slide 6

Slide 6 text

Cache could be slowing down your system

Slide 7

Slide 7 text

Requirements
Build a cache which degrades, but never causes contention
Concurrent
Memory-bound
Scales to cores
Scales to non-random key access
High cache hit ratio

Slide 8

Slide 8 text

Hit Ratio
Hit = served out of cache
Miss = cache fails to serve the request
Hit Ratio = Hits / Total Requests

Slide 9

Slide 9 text

State of Caching in Go https://blog.dgraph.io/post/caching-in-go/

Slide 10

Slide 10 text

What is Ristretto

Slide 11

Slide 11 text

Ristretto: My favorite coffee Normal amount of ground coffee + Half the water + Finer grind = Ristretto

Slide 12

Slide 12 text

Go Ristretto
Ristretto is a high performance, concurrent, memory-bound Go cache
High hit ratio
High read-write throughput
Scales well as cores increase
Contention proof

Slide 13

Slide 13 text

Show the code

package main

import (
	"fmt"
	"log"
	"time"

	"github.com/dgraph-io/ristretto"
)

func main() {
	// create a cache instance
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 10 << 20, // 10M
		MaxCost:     1 << 30,  // 1GB
		BufferItems: 64,
	})
	if err != nil {
		log.Fatal(err)
	}
	cache.Set("key", "value", 5) // set a value with cost 5
	time.Sleep(time.Millisecond) // wait for the value to pass through the buffers
	value, found := cache.Get("key")
	if !found {
		panic("missing value")
	}
	fmt.Println(value)
	cache.Del("key")
}

Slide 14

Slide 14 text

Mechanisms: Storage
Java: concurrent lock-free map
Go: mutex + map; sharded mutex + map; sync.Map
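As a point of reference for the options above, here is a minimal sketch of the sharded mutex + map approach; the shard count, uint64 keys, and names are assumptions of the sketch, not Ristretto's internals:

package storage

import "sync"

const numShards = 256

// shardedMap spreads keys across shards so that writers touching
// different shards never contend on the same mutex.
type shardedMap struct {
	shards [numShards]struct {
		sync.RWMutex
		m map[uint64]interface{}
	}
}

func newShardedMap() *shardedMap {
	sm := &shardedMap{}
	for i := range sm.shards {
		sm.shards[i].m = make(map[uint64]interface{})
	}
	return sm
}

func (sm *shardedMap) Get(key uint64) (interface{}, bool) {
	s := &sm.shards[key%numShards]
	s.RLock()
	v, ok := s.m[key]
	s.RUnlock()
	return v, ok
}

func (sm *shardedMap) Set(key uint64, value interface{}) {
	s := &sm.shards[key%numShards]
	s.Lock()
	s.m[key] = value
	s.Unlock()
}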

Slide 15

Slide 15 text

Storage: Map performance

Gets (Zipfian)
syncMap       15.5 ns/op   64.62 M/s
lockSharded   26.1 ns/op   38.38 M/s
lock          42.9 ns/op   23.32 M/s

Sets (Zipfian)
lockSharded   44.7 ns/op   22.38 M/s
lock          79.6 ns/op   12.56 M/s
syncMap       218 ns/op     4.58 M/s

Slide 16

Slide 16 text

Storage: Hashing
Store hashes instead of keys
Use hashes for load distribution

BenchmarkFarm-32   100000000   17.5 ns/op
BenchmarkSip-32     30000000   40.7 ns/op
BenchmarkFnv-32     20000000   69.1 ns/op

Slide 17

Slide 17 text

Go Runtime Memhash

// stringStruct mirrors the header layout the runtime hands to memhash;
// a []byte's data pointer and length line up with these two fields.
type stringStruct struct {
	str unsafe.Pointer
	len int
}

//go:noescape
//go:linkname memhash runtime.memhash
func memhash(p unsafe.Pointer, h, s uintptr) uintptr

func MemHash(data []byte) uint64 {
	ss := (*stringStruct)(unsafe.Pointer(&data))
	return uint64(memhash(ss.str, 0, uintptr(ss.len)))
}

Slide 18

Slide 18 text

Storage: Hashing with Memhash
BenchmarkMemHash-32   200000000    9.69 ns/op
BenchmarkFarm-32      100000000   17.5 ns/op
BenchmarkSip-32        30000000   40.7 ns/op
BenchmarkFnv-32        20000000   69.1 ns/op

Slide 19

Slide 19 text

Collisions
Prevent collisions using two uint64 hashes internally
With a 128-bit hash and 820 billion keys in the cache, collision probability ≈ 10^-15
Fun fact: 10^-15 is the uncorrectable bit error rate for HDDs
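A minimal sketch of the idea: the store keeps a (hash, conflict hash) pair instead of the raw key, so a false hit requires both 64-bit hashes to collide. FNV from the standard library keeps the sketch self-contained; Ristretto itself hashes with the runtime's memhash.

package storage

import "hash/fnv"

// keyToHash returns two 64-bit hashes: the first picks the map slot, the
// second is stored alongside the value and re-checked on Get.
func keyToHash(key []byte) (uint64, uint64) {
	h1 := fnv.New64a()
	h1.Write(key)
	h2 := fnv.New64()
	h2.Write(key)
	return h1.Sum64(), h2.Sum64()
}

type entry struct {
	conflict uint64 // second hash, detects slot collisions
	value    interface{}
}

type hashedStore struct{ m map[uint64]entry }

func (s *hashedStore) Set(key []byte, value interface{}) {
	slot, conflict := keyToHash(key)
	s.m[slot] = entry{conflict: conflict, value: value}
}

func (s *hashedStore) Get(key []byte) (interface{}, bool) {
	slot, conflict := keyToHash(key)
	e, ok := s.m[slot]
	if !ok || e.conflict != conflict {
		return nil, false
	}
	return e.value, true
}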

Slide 20

Slide 20 text

Mechanism: Concurrency Aim to be Contention Proof

Slide 21

Slide 21 text

BP-Wrapper for Metadata Writes

Slide 22

Slide 22 text

How To: Ring Buffer

Slide 23

Slide 23 text

Fast Lossy Ring Buffer
Lossy to avoid contention
Drop metadata updates

Slide 24

Slide 24 text

Fast Lossy Ring Buffer: sync.Pool
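A minimal sketch in the spirit of this design, not Ristretto's actual ring buffer: each goroutine grabs a small stripe from a sync.Pool, appends key hashes locally, and hands full batches to a consumer. Nothing in the hot path takes a shared lock, and batches the consumer cannot accept are simply lost.

package ring

import "sync"

// Consumer receives batches of key hashes; it may drop a batch if busy,
// and must copy the slice if it keeps it, since stripes are reused.
type Consumer interface {
	Push(keys []uint64) bool
}

// stripe is a small buffer of key hashes owned by one goroutine at a time.
type stripe struct {
	cons Consumer
	data []uint64
	cap  int
}

func (s *stripe) push(key uint64) {
	s.data = append(s.data, key)
	if len(s.data) >= s.cap {
		s.cons.Push(s.data) // lossy: the return value is ignored
		s.data = s.data[:0]
	}
}

// Buffer hands out stripes from a sync.Pool so that concurrent Pushes
// rarely touch the same stripe and never block on a shared mutex.
type Buffer struct {
	pool *sync.Pool
}

func NewBuffer(cons Consumer, capacity int) *Buffer {
	return &Buffer{pool: &sync.Pool{New: func() interface{} {
		return &stripe{cons: cons, data: make([]uint64, 0, capacity), cap: capacity}
	}}}
}

func (b *Buffer) Push(key uint64) {
	s := b.pool.Get().(*stripe)
	s.push(key)
	b.pool.Put(s)
}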

Slide 25

Slide 25 text

Ring Buffer Benchmarks
BenchmarkSyncPool-32    30000000    70.4 ns/op   14.20 M/s
BenchmarkMutexLock-32   10000000   210 ns/op      4.75 M/s
BenchmarkChan-32         2000000   853 ns/op      1.17 M/s
Almost 100% pick ratio, i.e. almost zero drops.

Slide 26

Slide 26 text

Handling Gets
Buffer reaches capacity
Pick all keys, push to a channel
Drop if the channel is full
Channel updates counters
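A minimal sketch of that hand-off, with illustrative names: the send is wrapped in a select with a default case, so a full channel drops the batch instead of blocking the caller.

package ring

// accessBuf carries batches of Get key hashes from the ring buffer to
// the policy goroutine.
var accessBuf = make(chan []uint64, 32)

// pushKeys forwards a full batch; if the channel is full, the metadata
// update is dropped rather than slowing down the Get path.
func pushKeys(keys []uint64) bool {
	select {
	case accessBuf <- keys:
		return true
	default:
		return false // dropped under load
	}
}

// drainAccesses runs in the background and feeds the frequency counters.
func drainAccesses(increment func(key uint64)) {
	for keys := range accessBuf {
		for _, k := range keys {
			increment(k)
		}
	}
}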

Slide 27

Slide 27 text

Handling Sets
Sets are put into a lossy channel
The channel is drained by the admission policy
A Set can be lost or rejected
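The Set path uses the same non-blocking pattern; a sketch with a hypothetical item type and admission callback:

package ring

// item bundles what the policy needs to decide on a Set (illustrative).
type item struct {
	key   uint64
	value interface{}
	cost  int64
}

var setBuf = make(chan item, 32*1024)

// trySet queues an item for the policy goroutine; a full buffer means
// the Set is silently lost, which callers must tolerate.
func trySet(it item) bool {
	select {
	case setBuf <- it:
		return true
	default:
		return false
	}
}

// drainSets applies the admission policy; rejected items never reach
// the store.
func drainSets(admit func(item) bool, store func(item)) {
	for it := range setBuf {
		if admit(it) {
			store(it)
		}
	}
}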

Slide 28

Slide 28 text

Contention Proof
Drop metadata updates under heavy load
Only a slight effect on hit ratios

Slide 29

Slide 29 text

Cost of Keys
Naive assumption: every key-value has cost 1
Ristretto: a distinct cost for each key-value
Total capacity is based on cost
Adding one KV can cause many KVs to be removed
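For example, with the Ristretto API from the earlier code slide, MaxCost acts as a budget and each Set charges its own cost against it (sizing costs in bytes is just one possible convention):

package main

import (
	"fmt"
	"log"

	"github.com/dgraph-io/ristretto"
)

func main() {
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1e7,
		MaxCost:     100 << 20, // total budget: ~100MB worth of cost
		BufferItems: 64,
	})
	if err != nil {
		log.Fatal(err)
	}
	// charge the value by its size instead of a flat cost of 1; admitting
	// one large entry may evict several smaller ones to stay under MaxCost
	val := make([]byte, 1<<20) // 1MB value
	ok := cache.Set("big-key", val, int64(len(val)))
	fmt.Println("queued for admission:", ok)
}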

Slide 30

Slide 30 text

Mechanism: Counters & Policy

Slide 31

Slide 31 text

LFU-based Eviction & Admission
LRU admits every key
By using an admission policy, Ristretto achieves high hit ratios

Slide 32

Slide 32 text

TinyLFU
LFU depends upon a (key -> frequency of access) map
TinyLFU = frequency counter, 4 bits per key
Counters are halved every N updates to maintain recency
Increment(key uint64)
Estimate(key uint64) int
Reset()
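A minimal single-row sketch of such a counter (Ristretto's real implementation is a count-min sketch with several rows): two 4-bit counters are packed per byte, increments saturate at 15, and Reset halves every counter so old accesses fade out.

package policy

// tinyCounter keeps 4-bit frequency counters, two per byte.
type tinyCounter struct {
	bits []byte
	mask uint64 // numCounters - 1; numCounters must be a power of two
}

func newTinyCounter(numCounters uint64) *tinyCounter {
	return &tinyCounter{
		bits: make([]byte, numCounters/2),
		mask: numCounters - 1,
	}
}

func (t *tinyCounter) Increment(key uint64) {
	idx := key & t.mask
	shift := (idx & 1) * 4 // low or high nibble of the byte
	if (t.bits[idx/2]>>shift)&0x0f < 15 {
		t.bits[idx/2] += 1 << shift // saturate at the 4-bit maximum
	}
}

func (t *tinyCounter) Estimate(key uint64) int {
	idx := key & t.mask
	shift := (idx & 1) * 4
	return int((t.bits[idx/2] >> shift) & 0x0f)
}

// Reset halves every counter; masking with 0x77 stops the high nibble
// from bleeding into the low one when the byte is shifted.
func (t *tinyCounter) Reset() {
	for i := range t.bits {
		t.bits[i] = (t.bits[i] >> 1) & 0x77
	}
}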

Slide 33

Slide 33 text

Doorkeeper Bloom Filter
If the key is not present, add it and stop
If already present, push it to TinyLFU
Stops keys which typically won't occur more than once
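A toy sketch of the doorkeeper: a single-probe bitset stands in for the real bloom filter, but the control flow is the same, so one-hit wonders never reach the counters. The names, and the tinyCounter type from the previous sketch, are illustrative.

package policy

// doorkeeper remembers whether a key hash has been seen since the last
// reset; a real implementation would use a bloom filter.
type doorkeeper struct {
	bits []uint64
	mask uint64 // numBits - 1; numBits must be a power of two
}

func newDoorkeeper(numBits uint64) *doorkeeper {
	return &doorkeeper{bits: make([]uint64, numBits/64), mask: numBits - 1}
}

// addIfMissing returns true if this is the first sighting of the key.
func (d *doorkeeper) addIfMissing(key uint64) bool {
	pos := key & d.mask
	word, bit := pos/64, pos%64
	if d.bits[word]&(1<<bit) != 0 {
		return false // already seen
	}
	d.bits[word] |= 1 << bit
	return true
}

// record applies the doorkeeper rule: the first occurrence of a key only
// marks the filter; repeats are pushed through to the TinyLFU counters.
func record(d *doorkeeper, counters *tinyCounter, key uint64) {
	if d.addIfMissing(key) {
		return
	}
	counters.Increment(key)
}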

Slide 34

Slide 34 text

Architecture

Slide 35

Slide 35 text

Eviction
Ideally: evict the key with the global minimum Estimate
Practically:
Pick a random sample of keys via map iteration
Pick the key with the minimum Estimate from the sample
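A minimal sketch of sampled eviction, under the same illustrative assumptions as the earlier policy sketches: Go's randomized map iteration order supplies the random sample.

package policy

// evictSample walks a handful of keys from the cost map (map iteration
// order in Go is already randomized) and returns the key with the lowest
// frequency estimate.
func evictSample(costs map[uint64]int64, estimate func(uint64) int, sampleSize int) (victim uint64, found bool) {
	minEst := 0
	n := 0
	for key := range costs {
		est := estimate(key)
		if !found || est < minEst {
			victim, minEst, found = key, est, true
		}
		n++
		if n >= sampleSize {
			break
		}
	}
	return victim, found
}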

Slide 36

Slide 36 text

Admission
Under capacity: admit everything
At capacity:
Admit if the incoming key can evict a key with a lower estimate
Otherwise, reject
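Putting sampling and admission together, a hypothetical admit path that reuses evictSample from the previous sketch:

package policy

// admit decides whether an incoming key deserves room in a full cache:
// sampled victims are evicted only while their estimates are lower than
// the newcomer's; if no cheaper victim can be found, the Set is rejected.
func admit(incoming uint64, incomingCost, used, maxCost int64,
	costs map[uint64]int64, estimate func(uint64) int) bool {
	if used+incomingCost <= maxCost {
		return true // under capacity: admit everything
	}
	incEst := estimate(incoming)
	for used+incomingCost > maxCost {
		victim, ok := evictSample(costs, estimate, 5)
		if !ok || estimate(victim) >= incEst {
			return false // no victim with a lower estimate: reject
		}
		used -= costs[victim]
		delete(costs, victim)
	}
	return true
}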

Slide 37

Slide 37 text

Metrics
Atomic counters for:
Hits
Misses
Drops
Rejects
etc.

Slide 38

Slide 38 text

False Sharing

Slide 39

Slide 39 text

With Padding
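A sketch of the padded layout, assuming 64-byte cache lines: each counter gets its own line so cores bumping different counters do not invalidate each other's caches (names are illustrative, not Ristretto's Metrics type).

package metrics

import "sync/atomic"

// paddedCounter places each uint64 on its own 64-byte cache line to
// avoid false sharing between neighbouring counters.
type paddedCounter struct {
	value uint64
	_     [56]byte // pad the 8-byte counter out to a full cache line
}

type Metrics struct {
	hits   paddedCounter
	misses paddedCounter
	drops  paddedCounter
}

func (m *Metrics) Hit()  { atomic.AddUint64(&m.hits.value, 1) }
func (m *Metrics) Miss() { atomic.AddUint64(&m.misses.value, 1) }
func (m *Metrics) Drop() { atomic.AddUint64(&m.drops.value, 1) }

func (m *Metrics) Hits() uint64 { return atomic.LoadUint64(&m.hits.value) }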

Slide 40

Slide 40 text

Atomic Counter Benchmarks
BenchmarkShared-32   30000000   49.4 ns/op
BenchmarkPadded-32   50000000   26.9 ns/op

Slide 41

Slide 41 text

Benchmarks and Code https://github.com/dgraph-io/ristretto

Slide 42

Slide 42 text

Dgraph Labs is Hiring https://dgraph.io/careers

Slide 43

Slide 43 text

Happy 10th Anniversary, Go!
Manish R Jain ([email protected])
Karl McGuire ([email protected])
https://github.com/dgraph-io/ristretto
Special thanks to: Ben Manes, Damian Gryski