Slide 1

Ristretto: A high performance, concurrent, memory-bound Go cache
Manish R Jain, Dgraph Labs
Sep 7, 2019
Go Meetup Bangalore

Slide 2

State of Caching in Go (https://blog.dgraph.io/post/caching-in-go/)

Slide 3

Problem

Go groupcache's LRU cache was causing contention, slowing Dgraph down.
Improved query latency by 10x by removing the cache!
Commit for Removing Cache (https://github.com/dgraph-io/dgraph/commit/b9990f4619b64615c2c18bb7627d198b4397109c)

Slide 4

Your cache is slowing you down.

Slide 5

Requirements

Concurrent
Memory-bounded (limit to configurable max memory usage)
Scale well as the number of cores and goroutines increase
Scale well under non-random key access distribution (e.g. Zipf)
High cache hit ratio

Slide 6

Caffeine Hit

Caffeine is a high performance, near optimal caching library based on Java 8.
Used by many Java databases: Cassandra, HBase, Neo4j.
The author, Ben Manes, wrote multiple papers about the cache.
github.com/ben-manes/caffeine (https://github.com/ben-manes/caffeine)

Slide 7

Go Ristretto

Ristretto is traditionally a short shot of espresso coffee made with the normal amount of ground coffee but extracted with about half the amount of water in the same amount of time by using a finer grind.

Slide 8

Go Ristretto

Ristretto is a high performance, concurrent, memory-bound Go cache.
High hit ratio.
High read-write throughput.
Scales well as cores increase.
Contention proof.

Slide 9

Code

package main

import (
	"fmt"
	"time"

	"github.com/dgraph-io/ristretto"
)

func main() {
	// create a cache instance
	cache, err := ristretto.NewCache(&ristretto.Config{
		NumCounters: 1000000 * 10, // number of keys to track frequency of
		MaxCost:     1000000,      // maximum total cost held by the cache
		BufferItems: 64,           // number of keys per Get buffer
	})
	if err != nil {
		panic(err)
	}

	cache.Set("key", "value", 1) // set a value with a cost of 1

	// wait for value to pass through buffers
	time.Sleep(time.Second / 100)

	value, found := cache.Get("key")
	if !found {
		panic("missing value")
	}
	fmt.Println(value)
	cache.Del("key")
}

Slide 10

Storage

Slide 11

Mechanism 1: Hashing

Instead of storing long keys, Ristretto takes a uint64 hash.
Uses runtime.memhash, which uses ASM code for performance.
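
runtime.memhash is unexported, so reaching it requires a go:linkname directive. A minimal sketch of the technique (names are illustrative; Ristretto does something similar in its z package):

package z

import (
	"unsafe" // go:linkname requires importing unsafe
)

//go:noescape
//go:linkname memhash runtime.memhash
func memhash(p unsafe.Pointer, seed, len uintptr) uintptr

// stringStruct mirrors the runtime's string header, letting us
// hash a string's bytes in place without copying.
type stringStruct struct {
	str unsafe.Pointer
	len int
}

// MemHashString hashes s with the runtime's assembly memhash.
// Note: a bodyless go:linkname declaration also needs an empty
// .s file in the package to satisfy the compiler.
func MemHashString(s string) uint64 {
	ss := (*stringStruct)(unsafe.Pointer(&s))
	return uint64(memhash(ss.str, 0, uintptr(ss.len)))
}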

Slide 12

Mechanism 2: Cache

Tried various approaches, including a locked map and sync.Map.
Fastest accesses were with a locked map with 256 shards.
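
A minimal sketch of such a sharded map (illustrative, not Ristretto's exact store):

package cache

import "sync"

const numShards = 256

type shard struct {
	sync.RWMutex
	data map[uint64]interface{}
}

// shardedMap spreads keys across 256 independently locked maps,
// so accesses to different shards never contend on one mutex.
type shardedMap struct {
	shards [numShards]*shard
}

func newShardedMap() *shardedMap {
	m := &shardedMap{}
	for i := range m.shards {
		m.shards[i] = &shard{data: make(map[uint64]interface{})}
	}
	return m
}

func (m *shardedMap) Get(key uint64) (interface{}, bool) {
	s := m.shards[key%numShards]
	s.RLock()
	defer s.RUnlock()
	val, ok := s.data[key]
	return val, ok
}

func (m *shardedMap) Set(key uint64, val interface{}) {
	s := m.shards[key%numShards]
	s.Lock()
	defer s.Unlock()
	s.data[key] = val
}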

Slide 13

Concurrency

Slide 14

Mechanism 3: BP-Wrapper

Batch up updates to the cache.
Wait for ring buffer to fill up, then "drain" into critical section.
Decrease the number of times a lock is acquired or released.
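
A sketch of the batching idea, assuming a drain callback that enters the critical section once per full batch (names are illustrative, though later slides use a similar ringStripe):

package cache

// ringStripe buffers keys and hands them off in one batch, so
// the consumer's lock is taken once per batch of keys instead
// of once per access.
type ringStripe struct {
	keys  []uint64
	cap   int
	drain func(keys []uint64) // called with each full batch
}

func newRingStripe(capacity int, drain func([]uint64)) *ringStripe {
	return &ringStripe{
		keys:  make([]uint64, 0, capacity),
		cap:   capacity,
		drain: drain,
	}
}

// Push records one key; when the stripe reaches capacity it
// drains the whole batch into the critical section and resets.
func (r *ringStripe) Push(key uint64) {
	r.keys = append(r.keys, key)
	if len(r.keys) >= r.cap {
		r.drain(r.keys)
		r.keys = r.keys[:0]
	}
}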

Slide 15

Mechanism 4: Gets

All Gets to the cache are put into a lossy buffer.
Various mechanisms were explored, including channels. Ultimately used sync.Pool.
sync.Pool performs better because of thread-local access (not exposed to mortal beings).
sync.Pool would automatically remove some ringStripes (during Go GC): lossy behavior #1.

AddToLossyBuffer(key):

stripe := b.pool.Get().(*ringStripe)
stripe.Push(key)
b.pool.Put(stripe)
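
The surrounding buffer type is easy to reconstruct; a sketch assuming the ringStripe shape from the BP-Wrapper sketch above (illustrative):

package cache

import "sync"

// lossyBuffer hands each goroutine a pooled ringStripe. sync.Pool
// gives mostly thread-local access, and the GC may discard pooled
// stripes along with their buffered keys: lossy behavior #1.
type lossyBuffer struct {
	pool *sync.Pool
}

func newLossyBuffer(capacity int, drain func([]uint64)) *lossyBuffer {
	return &lossyBuffer{pool: &sync.Pool{
		New: func() interface{} { return newRingStripe(capacity, drain) },
	}}
}

// Push records one Get. Losing buffered keys only costs the
// admission policy some accuracy, never cache correctness.
func (b *lossyBuffer) Push(key uint64) {
	stripe := b.pool.Get().(*ringStripe)
	stripe.Push(key)
	b.pool.Put(stripe)
}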

Slide 16

Mechanism 4: Gets

Gets are processed with a delay: only after a stripe reaches capacity.
Once it does, all keys are pushed to the admission policy.
If too many are pending, the push is dropped: lossy behavior #2.

select {
case p.itemsCh <- keys:
	p.stats.Add(keepGets, keys[0], uint64(len(keys)))
	return true
default:
	p.stats.Add(dropGets, keys[0], uint64(len(keys)))
	return false
}

itemsCh processing:

func (p *tinyLFU) Push(keys []uint64) {
	for _, key := range keys {
		p.Increment(key)
	}
}

Slide 17

Mechanism 5: Sets

All Sets to the cache are put into another lossy buffer (via Go channel).
Sets should be processed ASAP, so channels are used for batching.

select {
case c.setBuf <- &item{key: hash, val: val, cost: cost}:
	return true
default:
	// drop the set and avoid blocking
	c.stats.Add(dropSets, hash, 1)
	return false
}
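
On the receiving end, a single goroutine drains setBuf and applies each item through the admission policy. A sketch of that consumer loop (the interfaces and signatures here are assumptions, not Ristretto's exact API):

package cache

type item struct {
	key  uint64
	val  interface{}
	cost int64
}

// assumed shapes for the admission policy and the sharded store
type policy interface {
	Add(key uint64, cost int64) (victims []uint64, added bool)
}

type store interface {
	Set(key uint64, val interface{})
	Del(key uint64)
}

type Cache struct {
	setBuf chan *item
	policy policy
	store  store
}

// processItems runs in its own goroutine, so the eviction policy
// is mutated by exactly one thread and needs no fine-grained locks.
func (c *Cache) processItems() {
	for i := range c.setBuf {
		victims, added := c.policy.Add(i.key, i.cost)
		if added {
			c.store.Set(i.key, i.val)
		}
		for _, victim := range victims {
			c.store.Del(victim)
		}
	}
}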

Slide 18

Contention Proof

Both Gets and Sets drop items if too many are pending.
This performs really well under contention.
The hit ratio is affected, but still performs better than others.
However, it does mean that Sets can be lost (or rejected).
A separate mechanism ensures a value update is always captured.
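
The slides don't show that mechanism, but one plausible shape is to apply updates for already-resident keys synchronously in the store, so only brand-new keys ride the lossy buffer. A sketch building on the consumer sketch above, with an assumed store.Update that reports whether the key was present:

// Set never loses an update to an existing key: the new value is
// written to the store up front, and only the policy metadata
// (cost, frequency) can be dropped when setBuf is full.
func (c *Cache) Set(key uint64, val interface{}, cost int64) bool {
	updated := c.store.Update(key, val) // assumed method: true if key was present
	select {
	case c.setBuf <- &item{key: key, val: val, cost: cost}:
		return true
	default:
		return updated
	}
}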

Slide 19

Mechanism 6: Cost of keys

Assuming each key-value costs 1 is naive.
Each key-value adds a distinct cost to the cache.
Ristretto captures it and maintains a key -> cost map.
Total capacity of the cache is based on this cost.
Can be set to the number of bytes for key + value. Or set to 1 for the traditional approach.

func (c *Cache) Set(key interface{}, val interface{}, cost int64)
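
For example, to bound the cache by bytes rather than entry count, pass the serialized size as the cost (and size MaxCost accordingly, e.g. 1 << 30 for roughly 1GB):

val := []byte("some value")
cache.Set("key", val, int64(len("key")+len(val))) // cost = bytes of key + value

// or keep the traditional count-based semantics:
cache.Set("key2", "value2", 1)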

Slide 20

Policy

Slide 21

Mechanism 7: DoorKeeper

Bloom filter.
Stops keys which typically won't occur more than once.
If the key is not present, add it, but don't go any further.
If already present, proceed.
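
A minimal bloom-filter sketch with the AddIfNotHas shape the next slides use (two probes; the hash mixing and sizing are illustrative):

package cache

type bloom struct {
	bits []uint64
}

func newBloom(numBits uint64) *bloom {
	return &bloom{bits: make([]uint64, (numBits+63)/64)}
}

func (b *bloom) get(i uint64) bool { return b.bits[i/64]&(1<<(i%64)) != 0 }
func (b *bloom) set(i uint64)      { b.bits[i/64] |= 1 << (i % 64) }

// AddIfNotHas sets the key's bits and returns true if the key was
// not already (probably) present: a true result means first sighting.
func (b *bloom) AddIfNotHas(key uint64) bool {
	size := uint64(len(b.bits)) * 64
	i1 := key % size
	i2 := (key * 0x9E3779B97F4A7C15) % size // second probe via multiplicative mixing
	has := b.get(i1) && b.get(i2)
	b.set(i1)
	b.set(i2)
	return !has
}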

Slide 22

Mechanism 8: Tiny LFU

LFU depends upon a (key -> frequency of access) map.
TinyLFU provides a frequency counter which takes only 4 bits per key.
Halve the counters every N updates to maintain recency.

Increment(key uint64)
Estimate(key uint64) int
reset()
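
A sketch of one row of such a 4-bit counter array (a count-min sketch keeps several rows under different hash seeds; this shows the nibble packing and the halving reset):

package cache

// cmRow packs two 4-bit counters per byte.
type cmRow []byte

func newCmRow(numCounters uint64) cmRow {
	return make(cmRow, numCounters/2)
}

// get reads counter n from its low or high nibble.
func (r cmRow) get(n uint64) byte {
	return (r[n/2] >> ((n & 1) * 4)) & 0x0F
}

// increment bumps counter n, saturating at 15 (the 4-bit max).
func (r cmRow) increment(n uint64) {
	i := n / 2       // byte index
	s := (n & 1) * 4 // shift selects the low or high nibble
	if v := (r[i] >> s) & 0x0F; v < 15 {
		r[i] += 1 << s
	}
}

// reset halves every counter so stale popularity decays and the
// estimate tracks recent access frequency.
func (r cmRow) reset() {
	for i := range r {
		r[i] = (r[i] >> 1) & 0x77 // halves both nibbles at once
	}
}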

Slide 23

Mechanism 8: Tiny LFU

func (p *tinyLFU) Increment(key uint64) {
	// flip doorkeeper bit if not already
	if added := p.door.AddIfNotHas(key); !added {
		// increment count-min counter if doorkeeper bit is already set.
		p.freq.Increment(key)
	}
	p.incrs++
	if p.incrs >= p.resetAt {
		p.reset()
	}
}

Slide 24

Mechanism 9: Eviction

All incoming keys (via Get or Set) update the estimate.
For any incoming key, we want to evict a key with the minimum estimate (expensive).
Use Go's map iteration randomness to generate a sample of keys and find the min.
Evict only if Estimate(evicted key) < Estimate(incoming key). Otherwise, reject the incoming key.
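
The sampling leans on Go's randomized map iteration order; a sketch of what fillSample might look like (the shapes and names are assumptions):

package cache

const sampleSize = 5

type pair struct {
	key  uint64
	cost int64
}

type sampledLFU struct {
	keyCosts map[uint64]int64 // the key -> cost map from Mechanism 6
}

// fillSample tops the sample up to sampleSize pairs. Go randomizes
// map iteration order, so each call yields a cheap random sample.
func (e *sampledLFU) fillSample(in []pair) []pair {
	if len(in) >= sampleSize {
		return in
	}
	for key, cost := range e.keyCosts {
		in = append(in, pair{key: key, cost: cost})
		if len(in) >= sampleSize {
			return in
		}
	}
	return in
}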

Slide 25

Mechanism 9: Eviction

incHits := p.admit.Estimate(key)
for ; room < 0; room = p.evict.roomLeft(cost) {
	sample = p.evict.fillSample(sample)
	minKey, minHits, minId := uint64(0), int64(math.MaxInt64), 0
	for i, pair := range sample {
		if hits := p.admit.Estimate(pair.key); hits < minHits {
			minKey, minHits, minId = pair.key, hits, i
		}
	}
	if incHits < minHits {
		p.stats.Add(rejectSets, key, 1)
		return victims, false
	}
	p.evict.del(minKey)
	sample[minId] = sample[len(sample)-1]
	sample = sample[:len(sample)-1]
	victims = append(victims, minKey)
}

Slide 26

Throughput

All throughput benchmarks were run using Cache Bench.
Cache Bench (https://github.com/dgraph-io/benchmarks/blob/c6ba24086610ee7c1f519f70dc4e862ee2c4b6a8/cachebench/cache_bench_test.go)

Intel Core i7-8700K
3.7GHz
6 cores, 12 threads
16GB RAM

Slide 27

Hit Ratios

Measured using Damian Gryski's cachetest along with our own benchmarking suite.
CacheTest (https://github.com/dgryski/trifles/blob/master/cachetest/main.go)
Ristretto Bench (https://github.com/dgraph-io/ristretto/tree/master/bench)

Slide 28

Optimum Hit Ratio

Built a theoretically optimum baseline, called Clairvoyant.
It requires predicting the future, so it is impossible to build in practice.
Helps us figure out how close Ristretto gets to the optimum.
Theoretically Optimal Page Replacement Algorithm (https://en.wikipedia.org/wiki/Page_replacement_algorithm#The_theoretically_optimal_page_replacement_algorithm)
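
For intuition, the Clairvoyant policy is Belady's algorithm: with the full access trace known in advance, always evict the resident key whose next use lies farthest in the future. A small offline sketch (illustrative, not the benchmark's code):

package main

import "fmt"

// clairvoyantHitRatio replays an access trace with Belady's rule.
func clairvoyantHitRatio(trace []uint64, capacity int) float64 {
	// precompute, for each position, the index of that key's next use
	next := make([]int, len(trace))
	last := map[uint64]int{}
	for i := len(trace) - 1; i >= 0; i-- {
		if j, ok := last[trace[i]]; ok {
			next[i] = j
		} else {
			next[i] = len(trace) // key is never used again
		}
		last[trace[i]] = i
	}

	cache := map[uint64]int{} // resident key -> index of its next use
	hits := 0
	for i, key := range trace {
		if _, ok := cache[key]; ok {
			hits++
		} else if len(cache) >= capacity {
			// evict the resident key whose next use is farthest away
			var victim uint64
			farthest := -1
			for k, n := range cache {
				if n > farthest {
					victim, farthest = k, n
				}
			}
			delete(cache, victim)
		}
		cache[key] = next[i]
	}
	return float64(hits) / float64(len(trace))
}

func main() {
	trace := []uint64{1, 2, 3, 1, 2, 4, 1, 2, 5}
	fmt.Printf("hit ratio: %.2f\n", clairvoyantHitRatio(trace, 3))
}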

Slide 29

Results

github.com/dgraph-io/ristretto (https://github.com/dgraph-io/ristretto)

Slide 30

Dgraph Labs is HIRING in Bangalore

Slide 31

Thank you

Manish R Jain, Dgraph Labs
Sep 7, 2019
Go Meetup Bangalore
[email protected]
https://github.com/dgraph-io/ristretto
