

Dgraph Labs
September 07, 2019

Ristretto: a high performance, concurrent, memory-bound Go cache

With over six months of research and development, we're proud to announce the initial release of Ristretto: a high performance, concurrent, memory-bound Go cache. It is contention-proof, scales well, and provides consistently high hit ratios.

Preface

It all started with needing a memory-bound, concurrent Go cache in Dgraph. We looked around for a solution, but we couldn't find a great one. This talk covers how we implemented one and the lessons we learned along the way.


Transcript

  1. Ristretto: A high performance, concurrent, memory-bound Go cache

     Manish R Jain, Dgraph Labs
     Sep 7, 2019
     Go Meetup Bangalore
  2. State of Caching in Go

     (https://blog.dgraph.io/post/caching-in-go/)
  3. Problem

     Go groupcache's LRU cache was causing contention, slowing Dgraph down.
     Improved query latency by 10x by removing the cache!
     Commit for Removing Cache (https://github.com/dgraph-io/dgraph/commit/b9990f4619b64615c2c18bb7627d198b4397109c)
  4. Requirements

     Concurrent.
     Memory-bounded (limit to configurable max memory usage).
     Scale well as the number of cores and goroutines increase.
     Scale well under non-random key access distributions (e.g. Zipf).
     High cache hit ratio.
  5. Caffeine Hit

     Caffeine is a high performance, near optimal caching library based on Java 8.
     The library is used by many Java databases: Cassandra, HBase, Neo4j.
     Its author, Ben Manes, wrote multiple papers about the cache.
     github.com/ben-manes/caffeine (https://github.com/ben-manes/caffeine)
  6. Go Ristretto

     Ristretto is traditionally a short shot of espresso made with the normal amount of ground coffee but extracted with about half the amount of water in the same amount of time, by using a finer grind.
  7. Go Ristretto

     Ristretto is a high performance, concurrent, memory-bound Go cache.
     High hit ratio.
     High read-write throughput.
     Scales well as cores increase.
     Contention-proof.
  8. Code

         package main

         import (
             "fmt"
             "time"

             "github.com/dgraph-io/ristretto"
         )

         func main() {
             // create a cache instance
             cache, err := ristretto.NewCache(&ristretto.Config{
                 NumCounters: 1000000 * 10,
                 MaxCost:     1000000,
                 BufferItems: 64,
             })
             if err != nil {
                 panic(err)
             }

             cache.Set("key", "value", 1) // set a value
             // wait for value to pass through buffers
             time.Sleep(time.Second / 100)

             value, found := cache.Get("key")
             if !found {
                 panic("missing value")
             }
             fmt.Println(value)
             cache.Del("key")
         }
  9. Mechanism 1: Hashing

     Instead of storing long keys, Ristretto takes a uint64 hash.
     Uses runtime.memhash, which is implemented in assembly for performance.
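
     A minimal sketch of how runtime.memhash can be reached from user code via go:linkname (Ristretto ships a similar helper; the names below are illustrative, and the package needs an empty .s file for the bodyless declaration to compile):

         package hash

         import "unsafe"

         //go:noescape
         //go:linkname memhash runtime.memhash
         func memhash(p unsafe.Pointer, h, s uintptr) uintptr

         // MemHash returns the runtime's hash of data as a uint64.
         func MemHash(data []byte) uint64 {
             if len(data) == 0 {
                 return 0
             }
             return uint64(memhash(unsafe.Pointer(&data[0]), 0, uintptr(len(data))))
         }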
  10. Mechanism 2: Cache

     Tried various approaches, including a locked map and sync.Map.
     Fastest accesses came from a locked map with 256 shards.
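
     A minimal sketch of the sharded, mutex-guarded map approach (illustrative; Ristretto's internal store differs in detail):

         package store

         import "sync"

         const numShards = 256

         // shard is one lock-protected piece of the map.
         type shard struct {
             sync.RWMutex
             data map[uint64]interface{}
         }

         // shardedMap spreads keys across shards so that concurrent
         // accesses rarely contend on the same lock.
         type shardedMap struct {
             shards [numShards]*shard
         }

         func newShardedMap() *shardedMap {
             m := &shardedMap{}
             for i := range m.shards {
                 m.shards[i] = &shard{data: make(map[uint64]interface{})}
             }
             return m
         }

         func (m *shardedMap) get(key uint64) (interface{}, bool) {
             s := m.shards[key%numShards]
             s.RLock()
             defer s.RUnlock()
             v, ok := s.data[key]
             return v, ok
         }

         func (m *shardedMap) set(key uint64, val interface{}) {
             s := m.shards[key%numShards]
             s.Lock()
             s.data[key] = val
             s.Unlock()
         }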
  11. Mechanism 3: BP-Wrapper

     Batch up updates to the cache.
     Wait for a ring buffer to fill up, then "drain" it into the critical section.
     This decreases the number of times a lock is acquired or released.
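
     The batching idea in miniature, assuming a hypothetical consumer callback that owns the critical section; the lock behind the consumer is taken once per batch instead of once per access:

         // ringStripe buffers keys locally and hands them to the
         // consumer as one batch when it reaches capacity.
         type ringStripe struct {
             consumer func(keys []uint64)
             data     []uint64
             capacity int
         }

         func (s *ringStripe) Push(key uint64) {
             s.data = append(s.data, key)
             if len(s.data) >= s.capacity {
                 s.consumer(s.data) // drain into the critical section
                 s.data = s.data[:0]
             }
         }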
  12. Mechanism 4: Gets

     All Gets to the cache are put into a lossy buffer.
     Various mechanisms were explored, including channels. Ultimately used sync.Pool.
     sync.Pool performs better because of thread-local access (not exposed to mortal beings).
     sync.Pool automatically removes some ringStripes during Go GC: lossy behavior #1.

         AddToLossyBuffer(key):
             stripe := b.pool.Get().(*ringStripe)
             stripe.Push(key)
             b.pool.Put(stripe)
  13. Mechanism 4: Gets

     Gets are processed with a delay: only after a stripe reaches capacity.
     Once it does, all its keys are pushed to the admission policy.
     If too many batches are pending, the push is dropped: lossy behavior #2.

         select {
         case p.itemsCh <- keys:
             p.stats.Add(keepGets, keys[0], uint64(len(keys)))
             return true
         default:
             p.stats.Add(dropGets, keys[0], uint64(len(keys)))
             return false
         }

     itemsCh processing:

         func (p *tinyLFU) Push(keys []uint64) {
             for _, key := range keys {
                 p.Increment(key)
             }
         }
  14. Mechanism 5: Sets

     All Sets to the cache are put into another lossy buffer (a Go channel).
     Sets should be processed ASAP, so channels are used for batching.

         select {
         case c.setBuf <- &item{key: hash, val: val, cost: cost}:
             return true
         default:
             // drop the set and avoid blocking
             c.stats.Add(dropSets, hash, 1)
             return false
         }
  15. Contention Proof

     Both Gets and Sets drop items if too many are pending.
     This performs really well under contention.
     The hit ratio is affected, but it still performs better than other caches.
     However, it does mean that Sets can be lost (or rejected).
     A separate mechanism ensures a value update is always captured (see the sketch below).
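
     One plausible shape for that update mechanism (names hypothetical; the real implementation may differ): if the key already exists in the store, apply the new value in place before enqueuing, so a dropped Set can never lose an update:

         // set applies in-place updates synchronously, so the new value
         // survives even if the buffered item below is dropped
         // (hypothetical API sketch, not Ristretto's actual code).
         func (c *cache) set(hash uint64, val interface{}, cost int64) bool {
             if c.store.Contains(hash) {
                 c.store.Update(hash, val) // update existing entry in place
             }
             select {
             case c.setBuf <- &item{key: hash, val: val, cost: cost}:
                 return true
             default:
                 c.stats.Add(dropSets, hash, 1)
                 return false
             }
         }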
  16. Mechanism 6: Cost of keys

     Assuming each key-value costs 1 is naive.
     Each key-value adds a distinct cost to the cache.
     Ristretto captures it and maintains a key -> cost map.
     The total capacity of the cache is based on this cost.
     Cost can be set to the number of bytes for key + value, or to 1 for the traditional approach.

         func (c *Cache) Set(key interface{}, val interface{}, cost int64)
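
     For example, charging each entry its approximate byte size bounds the cache by memory, while a constant cost of 1 bounds it by entry count (keys and values below are illustrative):

         key, val := "user:42", "serialized profile bytes"
         // charge the entry its approximate size in bytes
         cache.Set(key, val, int64(len(key)+len(val)))

         // or charge every entry 1 for a traditional max-entries cache
         cache.Set("session", "token", 1)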
  17. Mechanism 7: DoorKeeper

     A Bloom filter.
     Stops keys which typically won't occur more than once.
     If an incoming key is not present in the filter, add it, but don't go any further.
     If it's already present, proceed.
  18. Mechanism 8: Tiny LFU

     LFU depends upon a (key -> frequency of access) map.
     Tiny LFU provides a frequency counter that takes only 4 bits per key.
     Counters are halved every N updates to maintain recency.

         Increment(key uint64)
         Estimate(key uint64) int
         reset()
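
     A sketch of how two 4-bit counters can be packed into each byte, with a reset that halves every counter in one pass (modelled on the count-min rows Tiny LFU typically uses; illustrative, not Ristretto's exact code):

         // row packs two 4-bit counters into each byte.
         type row []byte

         // get returns the n-th 4-bit counter.
         func (r row) get(n uint64) byte {
             return (r[n/2] >> ((n & 1) * 4)) & 0x0f
         }

         // increment bumps the n-th counter, saturating at 15.
         func (r row) increment(n uint64) {
             i, shift := n/2, (n&1)*4
             if (r[i]>>shift)&0x0f < 15 {
                 r[i] += 1 << shift
             }
         }

         // reset halves every counter to decay old frequencies:
         // shifting the byte right by one and masking with 0x77
         // halves both nibbles at once.
         func (r row) reset() {
             for i := range r {
                 r[i] = (r[i] >> 1) & 0x77
             }
         }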
  19. Mechanism 8: Tiny LFU

         func (p *tinyLFU) Increment(key uint64) {
             // flip doorkeeper bit if not already set
             if added := p.door.AddIfNotHas(key); !added {
                 // increment count-min counter only if the
                 // doorkeeper bit was already set
                 p.freq.Increment(key)
             }
             p.incrs++
             if p.incrs >= p.resetAt {
                 p.reset()
             }
         }
  20. Mechanism 9: Eviction

     All incoming keys (via Get or Set) update the frequency estimate.
     For any incoming key, we want to evict a key with the minimum estimate, which is expensive to find exactly.
     Instead, use Go map iteration randomness to generate a sample of keys and find the minimum within it.
     A sampled key is evicted only if Estimate(evicted key) < Estimate(incoming key); otherwise, the incoming key is rejected.
  21. Mechanism 9: Eviction

         incHits := p.admit.Estimate(key)
         for ; room < 0; room = p.evict.roomLeft(cost) {
             sample = p.evict.fillSample(sample)
             minKey, minHits, minId := uint64(0), int64(math.MaxInt64), 0
             for i, pair := range sample {
                 if hits := p.admit.Estimate(pair.key); hits < minHits {
                     minKey, minHits, minId = pair.key, hits, i
                 }
             }
             if incHits < minHits {
                 p.stats.Add(rejectSets, key, 1)
                 return victims, false
             }
             p.evict.del(minKey)
             sample[minId] = sample[len(sample)-1]
             sample = sample[:len(sample)-1]
             victims = append(victims, minKey)
         }
  22. Throughput

     All throughput benchmarks were run using Cache Bench.
     Cache Bench (https://github.com/dgraph-io/benchmarks/blob/c6ba24086610ee7c1f519f70dc4e862ee2c4b6a8/cachebench/cache_bench_test.go)
     Intel Core i7-8700K, 3.7GHz, 6 cores / 12 threads, 16GB RAM
  23. Hit Ratios

     Measured using Damian Gryski's cachetest along with our own benchmarking suite.
     CacheTest (https://github.com/dgryski/trifles/blob/master/cachetest/main.go)
     Ristretto Bench (https://github.com/dgraph-io/ristretto/tree/master/bench)
  24. Optimum Hit Ratio

     Built a theoretically optimal baseline, called Clairvoyant.
     It requires predicting the future, so it's impossible to build as a real cache.
     It helps us figure out how close Ristretto gets to the optimum.
     Theoretically Optimal Page Replacement Algorithm (https://en.wikipedia.org/wiki/Page_replacement_algorithm#The_theoretically_optimal_page_replacement_algorithm)
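
     Offline, with the whole access trace known in advance, the clairvoyant hit count can be computed with Belady's rule: evict the resident key whose next use lies farthest in the future. A compact sketch (illustrative; not the benchmark suite's code):

         // clairvoyantHits simulates Belady's optimal policy over a
         // full trace with a fixed-capacity cache (O(n*capacity) for
         // clarity, not speed).
         func clairvoyantHits(trace []uint64, capacity int) int {
             // next[i] holds the index of the next occurrence of
             // trace[i], or len(trace) if it never recurs.
             next := make([]int, len(trace))
             last := make(map[uint64]int)
             for i := len(trace) - 1; i >= 0; i-- {
                 if j, ok := last[trace[i]]; ok {
                     next[i] = j
                 } else {
                     next[i] = len(trace)
                 }
                 last[trace[i]] = i
             }
             resident := make(map[uint64]int) // key -> index of next use
             hits := 0
             for i, key := range trace {
                 if _, ok := resident[key]; ok {
                     hits++
                 } else if len(resident) >= capacity {
                     // evict the key whose next use is farthest away
                     var victim uint64
                     farthest := -1
                     for k, n := range resident {
                         if n > farthest {
                             victim, farthest = k, n
                         }
                     }
                     delete(resident, victim)
                 }
                 resident[key] = next[i]
             }
             return hits
         }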
  25. Thank you

     Manish R Jain, Dgraph Labs
     Sep 7, 2019
     Go Meetup Bangalore
     [email protected] (mailto:[email protected])
     https://github.com/dgraph-io/ristretto