
Building Ristretto: A High Performance, Concurrent, Memory Bound Go cache

Manish R Jain, founder and CEO of Dgraph Labs, goes into all the details of how Ristretto was built.

Ristretto is a modern cache written natively in Go that outperforms all previously available Go caches. Manish explains not only the final architecture of the system but also the design decisions that led there.

You can learn more about Ristretto:
- https://github.com/dgraph-io/ristretto
- https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/

Dgraph Labs

November 14, 2019


Transcript

  1. Caffeine. Caffeine is a high performance, near optimal caching library in Java. Used by many Java databases. Papers by its author, Ben Manes. https://github.com/ben-manes/caffeine
  2. The Cache Conundrum. Needed a fast Go cache in Dgraph. Was using the Go groupcache LRU cache. Improved query latency 10x by removing the cache!
  3. Smart Caches. Maintain global metadata for all Get and Set ops. Even Gets write to the metadata. Locks must be acquired, which causes contention.
  4. Requirements. Build a cache which degrades, but never causes contention. Concurrent. Memory-bound. Scales to cores. Scales to non-random key access. High cache hit ratio.
  5. Hit Ratio. Hit = served out of cache. Miss = cache fails to serve the request. Hit Ratio = Hits / Total Requests.
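     For example, serving 80 of 100 requests out of the cache gives a hit ratio of 80 / 100 = 0.8.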
  6. Ristretto: My favorite coffee. Normal amount of ground coffee + half the water + finer grind = ristretto.
  7. Go Ristretto. Ristretto is a high performance, concurrent, memory-bound Go cache. High hit ratio. High read-write throughput. Scales well as cores increase. Contention-proof.
  8. Show the code.

     package main

     import (
         "fmt"
         "log"
         "time"

         "github.com/dgraph-io/ristretto"
     )

     func main() {
         // create a cache instance
         cache, err := ristretto.NewCache(&ristretto.Config{
             NumCounters: 10 << 20, // 10M
             MaxCost:     1 << 30,  // 1GB
             BufferItems: 64,
         })
         if err != nil {
             log.Fatal(err)
         }
         cache.Set("key", "value", 5) // set a value with cost 5
         time.Sleep(time.Millisecond) // wait a bit for the Set to pass through the buffers
         value, found := cache.Get("key")
         if !found {
             panic("missing value")
         }
         fmt.Println(value)
         cache.Del("key")
     }
  9. Storage: Map performance.

     Gets (Zipfian):
       syncMap       15.5 ns/op   64.62 M/s
       lockSharded   26.1 ns/op   38.38 M/s
       lock          42.9 ns/op   23.32 M/s

     Sets (Zipfian):
       lockSharded   44.7 ns/op   22.38 M/s
       lock          79.6 ns/op   12.56 M/s
       syncMap        218 ns/op    4.58 M/s
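     A minimal sketch of the lockSharded approach benchmarked above; the shard count, the use of pre-hashed uint64 keys, and all names are illustrative assumptions rather than Ristretto's exact implementation:

     package main

     import (
         "fmt"
         "sync"
     )

     const numShards = 256 // illustrative shard count

     // shard is one locked map; keys are assumed to be pre-hashed uint64s.
     type shard struct {
         sync.RWMutex
         data map[uint64]interface{}
     }

     // shardedMap spreads keys across many small locked maps so that
     // concurrent goroutines rarely contend on the same lock.
     type shardedMap struct {
         shards [numShards]*shard
     }

     func newShardedMap() *shardedMap {
         m := &shardedMap{}
         for i := range m.shards {
             m.shards[i] = &shard{data: make(map[uint64]interface{})}
         }
         return m
     }

     func (m *shardedMap) Get(key uint64) (interface{}, bool) {
         s := m.shards[key%numShards]
         s.RLock()
         v, ok := s.data[key]
         s.RUnlock()
         return v, ok
     }

     func (m *shardedMap) Set(key uint64, value interface{}) {
         s := m.shards[key%numShards]
         s.Lock()
         s.data[key] = value
         s.Unlock()
     }

     func main() {
         m := newShardedMap()
         m.Set(42, "value")
         if v, ok := m.Get(42); ok {
             fmt.Println(v)
         }
     }

     Sharding trades a little indexing work on every call for far less time spent waiting on a single global mutex, which is why it beats the plain locked map in the numbers above.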
  10. Storage: Hashing. Store hashes instead of keys. Use hashes for load distribution.

     BenchmarkFarm-32   100000000   17.5 ns/op
     BenchmarkSip-32     30000000   40.7 ns/op
     BenchmarkFnv-32     20000000   69.1 ns/op
  11. Go Runtime Memhash.

     //go:noescape
     //go:linkname memhash runtime.memhash
     func memhash(p unsafe.Pointer, h, s uintptr) uintptr

     func MemHash(data []byte) uint64 {
         ss := (*stringStruct)(unsafe.Pointer(&data))
         return uint64(memhash(ss.str, 0, uintptr(ss.len)))
     }
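     The snippet above assumes a stringStruct type mirroring the Go runtime's string header; a minimal sketch of that missing piece (the field layout follows what Ristretto's z package uses, but treat the exact details here as an assumption):

     import "unsafe" // needed both for unsafe.Pointer and for //go:linkname

     // stringStruct mirrors the runtime's internal string representation:
     // a pointer to the backing bytes plus a length.
     type stringStruct struct {
         str unsafe.Pointer
         len int
     }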
  12. Storage: Hashing with Memhash.

     BenchmarkMemHash-32   200000000    9.69 ns/op
     BenchmarkFarm-32      100000000   17.5 ns/op
     BenchmarkSip-32        30000000   40.7 ns/op
     BenchmarkFnv-32        20000000   69.1 ns/op
  13. Collisions. Prevent collisions by using two uint64 hashes internally. With a 128-bit hash, 820 billion keys in the cache give a collision probability of ~10^-15. Fun fact: 10^-15 is the uncorrectable bit error rate for HDDs.
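     As a rough check (my own arithmetic, not from the slides), the birthday bound gives P ≈ n^2 / (2 * 2^128); with n = 820 * 10^9, n^2 ≈ 6.7 * 10^23 and 2 * 2^128 ≈ 6.8 * 10^38, so P ≈ 10^-15.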
  14. Ring Buffer Benchmarks.

     BenchmarkSyncPool-32    30000000   70.4 ns/op   14.20 M/s
     BenchmarkMutexLock-32   10000000   210 ns/op     4.75 M/s
     BenchmarkChan-32         2000000   853 ns/op     1.17 M/s

     Almost 100% pick ratio, i.e. almost zero drops.
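     A minimal sketch of a sync.Pool-based striped buffer in the spirit of the winning benchmark above; the stripe size, types, and drain callback are illustrative assumptions, not Ristretto's exact implementation:

     package main

     import (
         "fmt"
         "sync"
     )

     const stripeCap = 64 // illustrative stripe size

     // stripe is one small batch of hashed keys.
     type stripe struct {
         keys []uint64
     }

     // ringBuffer collects Get keys with very little contention: each
     // goroutine grabs a stripe from the sync.Pool, appends to it, and
     // only a full stripe is handed off to the drain function.
     type ringBuffer struct {
         pool  *sync.Pool
         drain func(keys []uint64)
     }

     func newRingBuffer(drain func([]uint64)) *ringBuffer {
         return &ringBuffer{
             pool: &sync.Pool{
                 New: func() interface{} { return &stripe{keys: make([]uint64, 0, stripeCap)} },
             },
             drain: drain,
         }
     }

     // Push records one key; when the local stripe fills up, the whole
     // batch is drained at once and the stripe is reused.
     func (b *ringBuffer) Push(key uint64) {
         s := b.pool.Get().(*stripe)
         s.keys = append(s.keys, key)
         if len(s.keys) >= stripeCap {
             b.drain(s.keys)
             s.keys = s.keys[:0]
         }
         b.pool.Put(s)
     }

     func main() {
         b := newRingBuffer(func(keys []uint64) {
             fmt.Println("draining a batch of", len(keys), "keys")
         })
         for i := uint64(0); i < 200; i++ {
             b.Push(i)
         }
     }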
  15. Handling Gets. When the buffer reaches capacity, pick all of its keys and push them to a channel. Drop them if the channel is full. The channel updates the counters.
  16. Handling Sets. Sets are put into a lossy channel. The channel gets passed to the admission policy. A Set can be lost or rejected.
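     Both of these paths come down to the same lossy-channel trick: a non-blocking send that simply drops the update when the channel is full. A minimal sketch (the channel size and item type are illustrative assumptions):

     package main

     import "fmt"

     // item is a pending Set (or a batch of Get keys) headed for the policy.
     type item struct {
         key  uint64
         cost int64
     }

     // lossyPush tries to hand an item to the policy goroutine but never
     // blocks: if the channel is full, the update is silently dropped, so
     // the cache degrades slightly instead of causing contention.
     func lossyPush(ch chan item, it item) bool {
         select {
         case ch <- it:
             return true
         default:
             return false // channel full: drop the update
         }
     }

     func main() {
         ch := make(chan item, 2) // illustrative buffer size, with no consumer attached
         for i := uint64(0); i < 4; i++ {
             ok := lossyPush(ch, item{key: i, cost: 1})
             fmt.Printf("item %d accepted: %v\n", i, ok)
         }
     }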
  17. Cost of Keys. Naive assumption: each key-value = cost 1. Instead, each key-value gets a distinct cost. Total capacity is based on cost. Adding one KV can remove many KVs.
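     For example, continuing the slide 8 snippet, the caller can charge each entry its size in bytes so that MaxCost behaves like a memory bound (this particular cost choice is the caller's, shown only as an illustration):

     value := []byte("some cached payload")
     // charge the entry by its byte size; with MaxCost = 1 << 30 above,
     // the total cost then approximates the memory held by values
     cache.Set("key", value, int64(len(value)))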
  18. LFU-based Eviction & Admission. LRU admits every key. By using an admission policy, Ristretto achieves high hit ratios.
  19. TinyLFU. LFU depends upon a (key -> frequency of access) map. TinyLFU = frequency counter, 4 bits per key. Counters are halved every N updates to maintain recency.

     Increment(key uint64)
     Estimate(key uint64) int
     Reset()
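     A minimal sketch of that interface using plain 4-bit counters packed two per byte; the real TinyLFU uses a count-min style sketch over hashed keys, and the table size and reset period below are illustrative assumptions:

     package main

     import "fmt"

     const resetAt = 1000 // illustrative N: halve all counters every N increments

     // tinyLFU keeps a 4-bit frequency counter per slot, two counters per byte.
     type tinyLFU struct {
         counters []byte
         incrs    int
     }

     func newTinyLFU(slots int) *tinyLFU {
         return &tinyLFU{counters: make([]byte, (slots+1)/2)}
     }

     func (t *tinyLFU) get(i uint64) byte {
         i %= uint64(len(t.counters) * 2)
         if i%2 == 0 {
             return t.counters[i/2] & 0x0f
         }
         return t.counters[i/2] >> 4
     }

     func (t *tinyLFU) set(i uint64, v byte) {
         i %= uint64(len(t.counters) * 2)
         if i%2 == 0 {
             t.counters[i/2] = t.counters[i/2]&0xf0 | v&0x0f
         } else {
             t.counters[i/2] = t.counters[i/2]&0x0f | v<<4
         }
     }

     // Increment bumps the key's 4-bit counter, saturating at 15, and
     // halves every counter after resetAt increments to keep recency.
     func (t *tinyLFU) Increment(key uint64) {
         if c := t.get(key); c < 15 {
             t.set(key, c+1)
         }
         t.incrs++
         if t.incrs >= resetAt {
             t.Reset()
         }
     }

     // Estimate returns the (approximate) access frequency of the key.
     func (t *tinyLFU) Estimate(key uint64) int {
         return int(t.get(key))
     }

     // Reset halves all counters, so old popularity decays over time.
     func (t *tinyLFU) Reset() {
         for i := range t.counters {
             t.counters[i] = (t.counters[i] >> 1) & 0x77 // halve both nibbles
         }
         t.incrs = 0
     }

     func main() {
         f := newTinyLFU(1024)
         for i := 0; i < 5; i++ {
             f.Increment(42)
         }
         fmt.Println(f.Estimate(42)) // prints 5
     }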
  20. Doorkeeper Bloom Filter. If the key is not present, add it and stop. If it is already present, push it to TinyLFU. This stops keys which typically won't occur more than once.
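     A minimal sketch of the doorkeeper check in front of the frequency counters, continuing the tinyLFU sketch above; a plain map stands in for the real bloom filter, purely for illustration:

     // doorkeeper filters out one-hit wonders before they reach TinyLFU.
     // A real implementation uses a bloom filter; a map stands in here.
     type doorkeeper struct {
         seen map[uint64]struct{}
         freq *tinyLFU
     }

     func newDoorkeeper(f *tinyLFU) *doorkeeper {
         return &doorkeeper{seen: make(map[uint64]struct{}), freq: f}
     }

     // record notes one access: the first sighting only marks the doorkeeper,
     // repeat sightings make it through to the TinyLFU counters.
     func (d *doorkeeper) record(key uint64) {
         if _, ok := d.seen[key]; !ok {
             d.seen[key] = struct{}{} // first time: remember it and stop
             return
         }
         d.freq.Increment(key) // seen before: count it in TinyLFU
     }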
  21. Eviction. Ideally: evict the key with the global minimum Estimate. Practically: pick a random sample of keys via map iteration, then pick the key with the minimum Estimate from that sample.
  22. Admission. Under capacity, admit everything. At capacity: admit only if a key with a lower Estimate can be evicted; otherwise, reject.
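     A minimal sketch of the sampled eviction and the admission decision, continuing the tinyLFU sketch above; the sample size and the costs map are illustrative assumptions, and the real policy may evict several victims until enough cost is freed:

     const sampleSize = 5 // illustrative sample size

     // sampleMinKey walks a handful of keys via Go's randomized map
     // iteration and returns the one with the lowest frequency estimate.
     func sampleMinKey(costs map[uint64]int64, freq *tinyLFU) (uint64, int) {
         minKey, minEst, n := uint64(0), 1<<30, 0
         for key := range costs {
             if est := freq.Estimate(key); est < minEst {
                 minKey, minEst = key, est
             }
             if n++; n >= sampleSize {
                 break
             }
         }
         return minKey, minEst
     }

     // admit decides whether an incoming key may enter a full cache: it is
     // admitted only if it is estimated to be hotter than the weakest
     // sampled victim, which is then evicted to make room.
     func admit(incoming uint64, costs map[uint64]int64, freq *tinyLFU) bool {
         victim, victimEst := sampleMinKey(costs, freq)
         if freq.Estimate(incoming) <= victimEst {
             return false // reject: not hotter than the sampled victim
         }
         delete(costs, victim) // evict the victim and admit the newcomer
         return true
     }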
  23. Happy 10th Anniversary, Go! Manish R Jain ([email protected]), Karl McGuire ([email protected]). https://github.com/dgraph-io/ristretto. Special thanks to: Ben Manes, Damian Gryski.