
Building Ristretto: A High Performance, Concurrent, Memory Bound Go cache

Manish R Jain, founder and CEO of Dgraph Labs, goes into all the details of how Ristretto was built.

Ristretto is a modern cache written natively in Go that outperforms all previously available Go caches. Manish explains not only the final architecture of the system but also the design decisions that led there.

You can learn more about Ristretto:
- https://github.com/dgraph-io/ristretto
- https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/

Dgraph Labs

November 14, 2019


Transcript

  1. Caffeine. Caffeine is a high performance, near optimal caching library in Java. Used by many Java databases. Papers by its author, Ben Manes. https://github.com/ben-manes/caffeine
  2. The Cache Conundrum. Needed a fast Go cache in Dgraph. Was using the Go groupcache LRU cache. Improved query latency 10x by removing the cache!
  3. Smart Caches. Maintain global metadata for all Get and Set ops. Even Gets write to the metadata. Locks must be acquired, which causes contention.
  4. Requirements. Build a cache which degrades, but never causes contention. Concurrent. Memory-bound. Scales to cores. Scales to non-random key access. High cache hit ratio.
  5. Hit Ratio. Hit = served out of cache. Miss = cache fails to serve the request. Hit Ratio = Hits / Total Requests.
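     For example, serving 80 of 100 requests out of the cache gives a hit ratio of 80 / 100 = 0.8.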
  6. Ristretto: My favorite coffee. Normal amount of ground coffee + half the water + finer grind = ristretto.
  7. Go Ristretto. Ristretto is a high performance, concurrent, memory-bound Go cache. High hit ratio. High read-write throughput. Scales well as cores increase. Contention-proof.
  8. Show the code.

     package main

     import (
         "fmt"
         "log"
         "time"

         "github.com/dgraph-io/ristretto"
     )

     func main() {
         // create a cache instance
         cache, err := ristretto.NewCache(&ristretto.Config{
             NumCounters: 10 << 20, // 10M
             MaxCost:     1 << 30,  // 1GB
             BufferItems: 64,
         })
         if err != nil {
             log.Fatal(err)
         }
         cache.Set("key", "value", 5) // set a value with cost 5
         time.Sleep(time.Millisecond) // wait a bit for the Set to pass through the buffers
         value, found := cache.Get("key")
         if !found {
             panic("missing value")
         }
         fmt.Println(value)
         cache.Del("key")
     }
  9. Storage: Map performance.

     Gets (Zipfian):
       syncMap       15.5 ns/op   64.62 M/s
       lockSharded   26.1 ns/op   38.38 M/s
       lock          42.9 ns/op   23.32 M/s

     Sets (Zipfian):
       lockSharded   44.7 ns/op   22.38 M/s
       lock          79.6 ns/op   12.56 M/s
       syncMap        218 ns/op    4.58 M/s
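     A minimal sketch of the lockSharded approach benchmarked above; the shard count, the use of pre-hashed uint64 keys, and all names are illustrative assumptions rather than Ristretto's exact implementation:

     package main

     import (
         "fmt"
         "sync"
     )

     const numShards = 256 // illustrative shard count

     // shard is one locked map; keys are assumed to be pre-hashed uint64s.
     type shard struct {
         sync.RWMutex
         data map[uint64]interface{}
     }

     // shardedMap spreads keys across many small locked maps so that
     // concurrent goroutines rarely contend on the same lock.
     type shardedMap struct {
         shards [numShards]*shard
     }

     func newShardedMap() *shardedMap {
         m := &shardedMap{}
         for i := range m.shards {
             m.shards[i] = &shard{data: make(map[uint64]interface{})}
         }
         return m
     }

     func (m *shardedMap) Get(key uint64) (interface{}, bool) {
         s := m.shards[key%numShards]
         s.RLock()
         v, ok := s.data[key]
         s.RUnlock()
         return v, ok
     }

     func (m *shardedMap) Set(key uint64, value interface{}) {
         s := m.shards[key%numShards]
         s.Lock()
         s.data[key] = value
         s.Unlock()
     }

     func main() {
         m := newShardedMap()
         m.Set(42, "value")
         if v, ok := m.Get(42); ok {
             fmt.Println(v)
         }
     }

     Sharding trades a little indexing work on every call for far less time spent waiting on a single global mutex, which is why it beats the plain locked map in the numbers above.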
  10. Storage: Hashing. Store hashes instead of keys. Use hashes for load distribution.

     BenchmarkFarm-32   100000000   17.5 ns/op
     BenchmarkSip-32     30000000   40.7 ns/op
     BenchmarkFnv-32     20000000   69.1 ns/op
  11. Go Runtime Memhash.

     //go:noescape
     //go:linkname memhash runtime.memhash
     func memhash(p unsafe.Pointer, h, s uintptr) uintptr

     func MemHash(data []byte) uint64 {
         ss := (*stringStruct)(unsafe.Pointer(&data))
         return uint64(memhash(ss.str, 0, uintptr(ss.len)))
     }
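     The snippet above assumes a stringStruct type mirroring the Go runtime's string header; a minimal sketch of that missing piece (the field layout follows what Ristretto's z package uses, but treat the exact details here as an assumption):

     import "unsafe" // needed both for unsafe.Pointer and for //go:linkname

     // stringStruct mirrors the runtime's internal string representation:
     // a pointer to the backing bytes plus a length.
     type stringStruct struct {
         str unsafe.Pointer
         len int
     }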
  12. Storage: Hashing with Memhash.

     BenchmarkMemHash-32   200000000    9.69 ns/op
     BenchmarkFarm-32      100000000   17.5 ns/op
     BenchmarkSip-32        30000000   40.7 ns/op
     BenchmarkFnv-32        20000000   69.1 ns/op
  13. Collisions. Prevent collisions by using two uint64 hashes internally. With a 128-bit hash, 820 billion keys in the cache give a collision probability of ~10^-15. Fun fact: 10^-15 is the uncorrectable bit error rate for HDDs.
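     As a rough check (my own arithmetic, not from the slides), the birthday bound gives P ≈ n^2 / (2 * 2^128); with n = 820 * 10^9, n^2 ≈ 6.7 * 10^23 and 2 * 2^128 ≈ 6.8 * 10^38, so P ≈ 10^-15.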
  14. Ring Buffer Benchmarks.

     BenchmarkSyncPool-32    30000000   70.4 ns/op   14.20 M/s
     BenchmarkMutexLock-32   10000000   210 ns/op     4.75 M/s
     BenchmarkChan-32         2000000   853 ns/op     1.17 M/s

     Almost 100% pick ratio, i.e. almost zero drops.
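     A minimal sketch of a sync.Pool-based striped buffer in the spirit of the winning benchmark above; the stripe size, types, and drain callback are illustrative assumptions, not Ristretto's exact implementation:

     package main

     import (
         "fmt"
         "sync"
     )

     const stripeCap = 64 // illustrative stripe size

     // stripe is one small batch of hashed keys.
     type stripe struct {
         keys []uint64
     }

     // ringBuffer collects Get keys with very little contention: each
     // goroutine grabs a stripe from the sync.Pool, appends to it, and
     // only a full stripe is handed off to the drain function.
     type ringBuffer struct {
         pool  *sync.Pool
         drain func(keys []uint64)
     }

     func newRingBuffer(drain func([]uint64)) *ringBuffer {
         return &ringBuffer{
             pool: &sync.Pool{
                 New: func() interface{} { return &stripe{keys: make([]uint64, 0, stripeCap)} },
             },
             drain: drain,
         }
     }

     // Push records one key; when the local stripe fills up, the whole
     // batch is drained at once and the stripe is reused.
     func (b *ringBuffer) Push(key uint64) {
         s := b.pool.Get().(*stripe)
         s.keys = append(s.keys, key)
         if len(s.keys) >= stripeCap {
             b.drain(s.keys)
             s.keys = s.keys[:0]
         }
         b.pool.Put(s)
     }

     func main() {
         b := newRingBuffer(func(keys []uint64) {
             fmt.Println("draining a batch of", len(keys), "keys")
         })
         for i := uint64(0); i < 200; i++ {
             b.Push(i)
         }
     }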
  15. Handling Gets. When the buffer reaches capacity, pick all of its keys and push them to a channel. Drop them if the channel is full. The channel updates the counters.
  16. Handling Sets. Sets are put into a lossy channel. The channel gets passed to the admission policy. A Set can be lost or rejected.
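     Both of these paths come down to the same lossy-channel trick: a non-blocking send that simply drops the update when the channel is full. A minimal sketch (the channel size and item type are illustrative assumptions):

     package main

     import "fmt"

     // item is a pending Set (or a batch of Get keys) headed for the policy.
     type item struct {
         key  uint64
         cost int64
     }

     // lossyPush tries to hand an item to the policy goroutine but never
     // blocks: if the channel is full, the update is silently dropped, so
     // the cache degrades slightly instead of causing contention.
     func lossyPush(ch chan item, it item) bool {
         select {
         case ch <- it:
             return true
         default:
             return false // channel full: drop the update
         }
     }

     func main() {
         ch := make(chan item, 2) // illustrative buffer size, with no consumer attached
         for i := uint64(0); i < 4; i++ {
             ok := lossyPush(ch, item{key: i, cost: 1})
             fmt.Printf("item %d accepted: %v\n", i, ok)
         }
     }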
  17. Cost of Keys. Naive assumption: each key-value = cost 1. Instead, each key-value gets a distinct cost. Total capacity is based on cost. Adding one KV can remove many KVs.
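     For example, continuing the slide 8 snippet, the caller can charge each entry its size in bytes so that MaxCost behaves like a memory bound (this particular cost choice is the caller's, shown only as an illustration):

     value := []byte("some cached payload")
     // charge the entry by its byte size; with MaxCost = 1 << 30 above,
     // the total cost then approximates the memory held by values
     cache.Set("key", value, int64(len(value)))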
  18. LFU-based Eviction & Admission. LRU admits every key. By using an admission policy, Ristretto achieves high hit ratios.
  19. TinyLFU. LFU depends upon a (key -> frequency of access) map. TinyLFU = frequency counter, 4 bits per key. Counters are halved every N updates to maintain recency.

     Increment(key uint64)
     Estimate(key uint64) int
     Reset()
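     A minimal sketch of that interface using plain 4-bit counters packed two per byte; the real TinyLFU uses a count-min style sketch over hashed keys, and the table size and reset period below are illustrative assumptions:

     package main

     import "fmt"

     const resetAt = 1000 // illustrative N: halve all counters every N increments

     // tinyLFU keeps a 4-bit frequency counter per slot, two counters per byte.
     type tinyLFU struct {
         counters []byte
         incrs    int
     }

     func newTinyLFU(slots int) *tinyLFU {
         return &tinyLFU{counters: make([]byte, (slots+1)/2)}
     }

     func (t *tinyLFU) get(i uint64) byte {
         i %= uint64(len(t.counters) * 2)
         if i%2 == 0 {
             return t.counters[i/2] & 0x0f
         }
         return t.counters[i/2] >> 4
     }

     func (t *tinyLFU) set(i uint64, v byte) {
         i %= uint64(len(t.counters) * 2)
         if i%2 == 0 {
             t.counters[i/2] = t.counters[i/2]&0xf0 | v&0x0f
         } else {
             t.counters[i/2] = t.counters[i/2]&0x0f | v<<4
         }
     }

     // Increment bumps the key's 4-bit counter, saturating at 15, and
     // halves every counter after resetAt increments to keep recency.
     func (t *tinyLFU) Increment(key uint64) {
         if c := t.get(key); c < 15 {
             t.set(key, c+1)
         }
         t.incrs++
         if t.incrs >= resetAt {
             t.Reset()
         }
     }

     // Estimate returns the (approximate) access frequency of the key.
     func (t *tinyLFU) Estimate(key uint64) int {
         return int(t.get(key))
     }

     // Reset halves all counters, so old popularity decays over time.
     func (t *tinyLFU) Reset() {
         for i := range t.counters {
             t.counters[i] = (t.counters[i] >> 1) & 0x77 // halve both nibbles
         }
         t.incrs = 0
     }

     func main() {
         f := newTinyLFU(1024)
         for i := 0; i < 5; i++ {
             f.Increment(42)
         }
         fmt.Println(f.Estimate(42)) // prints 5
     }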
  20. Doorkeeper Bloom Filter. If the key is not present, add it and stop. If it is already present, push it to TinyLFU. This stops keys which typically won't occur more than once.
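     A minimal sketch of the doorkeeper check in front of the frequency counters, continuing the tinyLFU sketch above; a plain map stands in for the real bloom filter, purely for illustration:

     // doorkeeper filters out one-hit wonders before they reach TinyLFU.
     // A real implementation uses a bloom filter; a map stands in here.
     type doorkeeper struct {
         seen map[uint64]struct{}
         freq *tinyLFU
     }

     func newDoorkeeper(f *tinyLFU) *doorkeeper {
         return &doorkeeper{seen: make(map[uint64]struct{}), freq: f}
     }

     // record notes one access: the first sighting only marks the doorkeeper,
     // repeat sightings make it through to the TinyLFU counters.
     func (d *doorkeeper) record(key uint64) {
         if _, ok := d.seen[key]; !ok {
             d.seen[key] = struct{}{} // first time: remember it and stop
             return
         }
         d.freq.Increment(key) // seen before: count it in TinyLFU
     }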
  21. Eviction. Ideally: evict the key with the global minimum Estimate. Practically: pick a random sample of keys via map iteration, then pick the key with the minimum Estimate from that sample.
  22. Admission. Under capacity, admit everything. At capacity: admit only if a key with a lower Estimate can be evicted; otherwise, reject.
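     A minimal sketch of the sampled eviction and the admission decision, continuing the tinyLFU sketch above; the sample size and the costs map are illustrative assumptions, and the real policy may evict several victims until enough cost is freed:

     const sampleSize = 5 // illustrative sample size

     // sampleMinKey walks a handful of keys via Go's randomized map
     // iteration and returns the one with the lowest frequency estimate.
     func sampleMinKey(costs map[uint64]int64, freq *tinyLFU) (uint64, int) {
         minKey, minEst, n := uint64(0), 1<<30, 0
         for key := range costs {
             if est := freq.Estimate(key); est < minEst {
                 minKey, minEst = key, est
             }
             if n++; n >= sampleSize {
                 break
             }
         }
         return minKey, minEst
     }

     // admit decides whether an incoming key may enter a full cache: it is
     // admitted only if it is estimated to be hotter than the weakest
     // sampled victim, which is then evicted to make room.
     func admit(incoming uint64, costs map[uint64]int64, freq *tinyLFU) bool {
         victim, victimEst := sampleMinKey(costs, freq)
         if freq.Estimate(incoming) <= victimEst {
             return false // reject: not hotter than the sampled victim
         }
         delete(costs, victim) // evict the victim and admit the newcomer
         return true
     }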
  23. Happy 10th Anniversary, Go! Manish R Jain ([email protected]), Karl McGuire ([email protected]). https://github.com/dgraph-io/ristretto. Special thanks to: Ben Manes, Damian Gryski.