Groupcache with Evan Owen

Hakka Labs

April 06, 2015

Transcript

  1. Content Processing and Delivery with Groupcache or, how we made media messaging fast.

    Evan Owen, Director of Engineering
    @kainosnoema, cotap.me/evan
  2. Cotap: mobile messaging for business

    • WhatsApp for Work
    • Company directory without swapping numbers
    • Conversations synced across devices
    • Simple, powerful native apps
    • Integrated with Salesforce
    • Send files, photos, video
  3. Goals for media messaging

    • Sending should feel as fast as transmitting the data
    • Latency between sending and receiving a message as low as sending text
    • Delivery time should remain constant as number of recipients grows
    • One-to-many delivery with dozens of resolutions (phone, tablet, web, desktop)
  4. Image processing patterns

    • Synchronous on upload - easy!
    • Asynchronous on upload - harder
    • On-demand - really hard at scale
  5. groupcache

    • github.com/golang/groupcache
    • by Brad Fitzpatrick, Google (memcached, OpenID, etc.)
    • Originally written for dl.google.com (GoSF, Sept. 2013)
    • See: talks.golang.org/2013/oscon-dl.slide
  6. groupcache, cont.

    • memcached alternative (read-through only)
    • Go library, acts as server and client to peers
    • Distributed, coordinated cache fills (no thundering herds)
    • Replication of hot keys across cluster
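The "coordinated cache fills" point can be illustrated with duplicate-call suppression: while one fill for a key is in flight, later callers for the same key wait for its result instead of starting their own. The following is a minimal standard-library sketch, not groupcache's own code; the names `flightGroup`, `Do`, `waiting` and `concurrentFills` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// call tracks one in-flight fill for a key.
type call struct {
	wg   sync.WaitGroup
	val  string
	err  error
	dups int // callers that joined this fill instead of starting their own
}

// flightGroup suppresses duplicate concurrent fills (hypothetical name).
type flightGroup struct {
	mu sync.Mutex
	m  map[string]*call
}

// Do runs fn at most once at a time per key; concurrent callers for the
// same key block until the first caller's fn returns and share its result.
func (g *flightGroup) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.dups++
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err
	}
	c := new(call)
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	return c.val, c.err
}

// waiting reports how many callers have joined the in-flight fill for key.
func (g *flightGroup) waiting(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.m[key]; ok {
		return c.dups
	}
	return 0
}

// concurrentFills starts n concurrent callers for one key and returns how
// many times the expensive fill actually ran.
func concurrentFills(n int) int {
	var g flightGroup
	var wg sync.WaitGroup
	fills := 0 // written only inside fn, read after all callers finish
	fn := func() (string, error) {
		fills++
		// Hold the fill open until the other n-1 callers have joined it.
		for g.waiting("k") < n-1 {
			runtime.Gosched()
		}
		return "v", nil
	}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("k", fn)
		}()
	}
	wg.Wait()
	return fills
}

func main() {
	fmt.Println("fills for 10 concurrent callers:", concurrentFills(10))
}
```

Because the result is not retained after the fill completes, this only removes concurrent duplicates; a cache in front of it is what serves later requests.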
  7. Distributed read-through caching

    [Diagram: Node 1, Node 2, Node 3, Node 4, Node 5; group.Get("key-1234")]
    Consistent hash ring determines “primary” node.
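How a consistent hash ring picks a "primary" node can be sketched in a few lines. This is an illustrative standard-library version, not groupcache's implementation; the type `ring`, the functions `newRing` and `primary`, the node names and the replica count of 50 are all assumptions.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// ring is a minimal consistent hash ring (illustrative only).
type ring struct {
	points []uint32          // sorted hash points on the ring
	owner  map[uint32]string // hash point -> node name
}

// newRing places `replicas` virtual points per node on the ring.
func newRing(replicas int, nodes ...string) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(strconv.Itoa(i) + n))
			r.points = append(r.points, h)
			r.owner[h] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// primary returns the node owning the first point at or after the key's
// hash, wrapping around to the start of the ring.
func (r *ring) primary(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	r := newRing(50, "node-1", "node-2", "node-3", "node-4", "node-5")
	fmt.Println("primary for key-1234:", r.primary("key-1234"))
}
```

Every node that builds the ring from the same peer list computes the same primary for a key, and removing a node only reassigns the keys that node owned.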
  8. In-memory, immutable cache

    • Main and hot caches are LRU of configurable size
    • Data is cached and transmitted using Protocol Buffers
    • No `set` or `expire`: once filled, data is immutable
    • If remote node can’t be reached, key is filled locally
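A size-bounded LRU with no update or expiry can be sketched as follows. This is a minimal illustration with the standard library, not groupcache's cache; the names `lru`, `newLRU`, `get` and `fill`, and the choice to count key plus value bytes against the budget, are assumptions.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key string
	val []byte
}

// lru is a byte-bounded LRU cache with no update or expiry operations.
type lru struct {
	maxBytes int
	used     int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

func newLRU(maxBytes int) *lru {
	return &lru{maxBytes: maxBytes, order: list.New(), items: make(map[string]*list.Element)}
}

// get returns the cached value and marks it as recently used.
func (c *lru) get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).val, true
}

// fill stores a value for a key that is not cached yet, then evicts least
// recently used entries until the cache fits its byte budget. An existing
// key is left untouched, mirroring the "immutable once filled" rule.
func (c *lru) fill(key string, val []byte) {
	if _, ok := c.items[key]; ok {
		return
	}
	c.items[key] = c.order.PushFront(&entry{key: key, val: val})
	c.used += len(key) + len(val)
	for c.used > c.maxBytes && c.order.Len() > 0 {
		e := c.order.Remove(c.order.Back()).(*entry)
		delete(c.items, e.key)
		c.used -= len(e.key) + len(e.val)
	}
}

func main() {
	c := newLRU(10)
	c.fill("a", []byte("1111"))
	c.fill("b", []byte("2222"))
	c.get("a")                  // "a" becomes most recently used
	c.fill("c", []byte("3333")) // over budget: evicts "b"
	_, okA := c.get("a")
	_, okB := c.get("b")
	fmt.Println("a cached:", okA, "b cached:", okB)
}
```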
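  9. Declare a group w/ getter function

    // gc is an import alias for github.com/golang/groupcache.
    // The getter runs on a cache miss and writes the value into dest.
    var getterFunc = func(ctx gc.Context, key string, dest gc.Sink) error {
        fileName, size := parseKey(key)
        // Return the sink's error instead of discarding it.
        return dest.SetBytes(generateResized(fileName, size))
    }

    // 1024<<20 bytes = 1 GiB of cache for the "resized" group.
    var group = gc.NewGroup("resized", 1024<<20, gc.GetterFunc(getterFunc))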
  10. Request keys

    var data []byte
    err := group.Get(nil, "original-sm.jpg", gc.AllocatingByteSliceSink(&data))
    // ...
    http.ServeContent(w, r, "original-sm.jpg", modTime, bytes.NewReader(data))

    Filled from one of:
    • local memory cache
    • peer's memory cache
    • local computed value
    • peer's computed value
  11. How we use groupcache for media

    • Content-addressable originals in “cold storage”
    • Small groupcache group in front of originals
    • Images resized in-process using OpenCV (w/ cgo)
    • Larger groupcache in front of resized images
    • Finally: CDN in front of outermost groupcache
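The layering described on this slide (cold storage, a cache of originals, a resizer, then a cache of resized images) can be sketched as nested read-through getters. This is a simplified single-process illustration rather than the talk's code: the plain map stands in for a groupcache group, the `name@size` key format is hypothetical, and the "resize" step just prefixes the bytes.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// getter loads the bytes for a key.
type getter func(key string) ([]byte, error)

// cached wraps a getter with an in-memory read-through cache
// (an unbounded map here, for brevity).
func cached(next getter) getter {
	var mu sync.Mutex
	store := make(map[string][]byte)
	return func(key string) ([]byte, error) {
		mu.Lock()
		v, ok := store[key]
		mu.Unlock()
		if ok {
			return v, nil
		}
		v, err := next(key)
		if err != nil {
			return nil, err
		}
		mu.Lock()
		store[key] = v
		mu.Unlock()
		return v, nil
	}
}

// pipeline wires cold storage -> originals cache -> resizer -> resized cache.
// The counters exist only to show which layer did work in this demo.
type pipeline struct {
	coldReads, resizes int
	get                getter
}

func newPipeline() *pipeline {
	p := &pipeline{}
	cold := func(name string) ([]byte, error) {
		p.coldReads++
		return []byte("original:" + name), nil
	}
	originals := cached(cold)
	resize := func(key string) ([]byte, error) {
		i := strings.LastIndex(key, "@") // hypothetical "name@size" key format
		if i < 0 {
			return nil, fmt.Errorf("bad key %q", key)
		}
		orig, err := originals(key[:i])
		if err != nil {
			return nil, err
		}
		p.resizes++
		return append([]byte(key[i+1:]+":"), orig...), nil
	}
	p.get = cached(resize)
	return p
}

func main() {
	p := newPipeline()
	p.get("photo.jpg@sm")
	p.get("photo.jpg@sm") // served from the resized cache
	p.get("photo.jpg@lg") // new size, but the original is already cached
	fmt.Println("cold reads:", p.coldReads, "resizes:", p.resizes)
}
```

Requesting a second size of the same image costs one resize but no additional cold-storage read, which is the reason for keeping a small cache in front of the originals.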
  12. Fast image messaging

    [Diagram: Originals … … CDN]
    • Sending: as fast as uploading, pre-warm asynchronously
    • Receiving: network latency (best-case) or a single processing operation
  13. Cache item structure

    type Artifact struct {
        PbValue          []byte  `protobuf:"bytes,1,opt,name=value"`
        PbContentType    *string `protobuf:"bytes,2,opt,name=content_type"`
        PbLastModified   *string `protobuf:"bytes,3,opt,name=last_modified"`
        XXX_unrecognized []byte
    }

    // HTTPHeader builds the response headers for a cached artifact.
    func (a *Artifact) HTTPHeader() (h http.Header) {
        h = make(http.Header) // http.Header is a map type, so make, not new
        h.Set("Content-Length", strconv.Itoa(len(a.PbValue)))
        h.Set("Content-Type", *a.PbContentType)
        h.Set("Etag", "\""+a.Md5String()+"\"")
        return
    }
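The slide calls `a.Md5String()` without showing it. A plausible reading, assumed here rather than confirmed by the talk, is a hex-encoded MD5 digest of the cached bytes, wrapped in double quotes to form the ETag value. The function names `md5String` and `etag` below are illustrative.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// md5String returns the hex MD5 digest of the payload. This is an assumed
// behaviour for the slide's Md5String helper, which is not shown.
func md5String(value []byte) string {
	sum := md5.Sum(value)
	return hex.EncodeToString(sum[:])
}

// etag wraps the digest in double quotes, as the HTTPHeader method does.
func etag(value []byte) string {
	return `"` + md5String(value) + `"`
}

func main() {
	fmt.Println(etag([]byte("hello")))
}
```

Because cached values never change once filled, a content digest like this stays valid for the lifetime of the key.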
  14. Serve using http.ServeContent()

    a := cachegroup.Get(req.URL.Path)
    for k, v := range a.HTTPHeader() {
        res.Header().Set(k, v[0])
    }
    contentSeeker := io.ReadSeeker(bytes.NewReader(a.Value()))
    http.ServeContent(res, req, req.URL.Path, a.ModTime(), contentSeeker)
  15. Stats and metrics

    {
      "cache": {
        "bytes": 118650246,
        "evictions": 2772,
        "gets": 12211,
        "hits": 4395,
        "items": 528,
        "misses": 7816
      },
      "loads": 7499,
      "loads_deduped": 7496,
      "local_load_errors": 108,
      "local_loads": 3300,
      "peer_errors": 46,
      "peer_gets": 4875,
      "peer_loads": 4088
    }

    var stats = group.Stats
    var cacheStats = group.CacheStats(gc.MainCache)

    • Exposed on localhost at /stats
    • Each node monitored with Sensu
    • Metrics sent to Graphite
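As a worked example of reading these numbers, the sketch below computes the main-cache hit ratio from the sample JSON and checks that the load outcomes add up. The struct and function names are assumptions for illustration; only the figures come from the slide.

```go
package main

import "fmt"

// cacheStats mirrors part of the "cache" object in the sample JSON.
type cacheStats struct {
	Gets, Hits, Misses int64
}

// groupStats mirrors some of the top-level counters in the sample JSON.
type groupStats struct {
	LoadsDeduped, LocalLoads, LocalLoadErrs, PeerLoads int64
}

// Sample values copied from the slide.
var sampleCache = cacheStats{Gets: 12211, Hits: 4395, Misses: 7816}
var sampleGroup = groupStats{LoadsDeduped: 7496, LocalLoads: 3300, LocalLoadErrs: 108, PeerLoads: 4088}

// hitRatio is the fraction of cache lookups that were served from memory.
func hitRatio(s cacheStats) float64 {
	if s.Gets == 0 {
		return 0
	}
	return float64(s.Hits) / float64(s.Gets)
}

// accountedLoads sums the load outcomes: peer fills, local fills, local errors.
func accountedLoads(g groupStats) int64 {
	return g.PeerLoads + g.LocalLoads + g.LocalLoadErrs
}

func main() {
	fmt.Printf("hit ratio: %.1f%%\n", 100*hitRatio(sampleCache)) // 4395 / 12211
	fmt.Println("accounted loads:", accountedLoads(sampleGroup),
		"of", sampleGroup.LoadsDeduped)
}
```

In this sample, hits plus misses equal gets (4395 + 7816 = 12211), and the three load outcomes sum to `loads_deduped` (4088 + 3300 + 108 = 7496).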
  16. groupcache “gotchas”

    • Inconsistent hash rings: always ensure local node is included in peer list, update peer lists as quickly as possible
    • Getter function inconsistency
    • Cache memory usage: leave enough headroom
    • Deadlocks: strict timeouts on cache getter functions
    • Cold cluster: use rolling restarts when upgrading
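One way to enforce "strict timeouts on cache getter functions" is to run the slow work under a deadline and return an error when it is exceeded. The sketch below uses only the standard library `context` package and is independent of groupcache's API; `withTimeout`, `fillTimeout`, `fastFill` and `stuckFill` are hypothetical names, and 100 ms is an arbitrary value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fillTimeout is an arbitrary deadline chosen for this sketch.
const fillTimeout = 100 * time.Millisecond

// withTimeout runs fill in its own goroutine and gives up after d.
// fill stands in for the slow work inside a getter function.
func withTimeout(d time.Duration, fill func(ctx context.Context) ([]byte, error)) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	type result struct {
		data []byte
		err  error
	}
	// Buffered so a fill that finishes late can still send and exit.
	ch := make(chan result, 1)
	go func() {
		data, err := fill(ctx)
		ch <- result{data, err}
	}()

	select {
	case r := <-ch:
		return r.data, r.err
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

// timedOut reports whether err was caused by the deadline.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

// fastFill returns immediately.
func fastFill(ctx context.Context) ([]byte, error) {
	return []byte("ok"), nil
}

// stuckFill blocks until its context is cancelled, simulating a hung fetch.
func stuckFill(ctx context.Context) ([]byte, error) {
	<-ctx.Done()
	return nil, ctx.Err()
}

func main() {
	data, err := withTimeout(fillTimeout, fastFill)
	fmt.Println(string(data), err)

	_, err = withTimeout(fillTimeout, stuckFill)
	fmt.Println("timed out:", timedOut(err))
}
```

Returning an error on timeout lets waiting callers fail fast instead of blocking behind a fill that never completes.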