Cache Rules Everything Around Me

djangocon 2011 9/8/2011 Cache Rules Everything Around Me Noah Silas
Jacob Burch

Hello
• Jacob Burch
  • Engineer at Revolution Systems
  • Former CTO at Mahalo
• Noah Silas
  • Engineer at Causes.com
  • Former Head Architect at Mahalo

What we’re going to tell you
• A brief (brief!) introduction to caching
• Big-picture considerations of a caching architecture
• Implementation best practices

What we won’t be talking about (all that much)
• Backend tuning
• Backend debates (redis vs. memcache)
• Upstream caches (squid, varnish, etc.)

The What/Why/How of Caching
• Caching is storing post-processed data for more immediate future retrieval
• Usually stored in an in-memory key/value store (memcache, redis)
• Used to
  • Speed up your app
  • Lessen load on other systems (your db, APIs)

Only Rule of Architecture
• There are no rules, only principles
• Start with assumptions/advice
• Benchmark/inspect/meditate on your application’s specific profile
• Break principles as needed

Plan Ahead

Ask Yourself a Bunch of Questions
And they don’t need to be answered immediately.

Do I Really Need Caching?
• Caching...
  • Adds complexity, sometimes in unexpected places
  • Is an additional point of failure
• Modern databases are stupidly optimized
  • May be all you need

“There are only two hard things in Computer Science: cache invalidation and naming things.”
- Phil Karlton

Magical Cache Rule That Lives in a Gumdrop House on Lollypop Lane
Your application should never rely on caching.

So, You’re Relying On Your Cache Being Up
• Still...

Super Awesome Fun Time Rule for Minimizing Sadness
• Your application should have one canonical data source.
• This data source IS NOT YOUR CACHE.

Common Cache Patterns

Pattern: A Few Expensive Operations
• Rollup values (top users, most-commented-on articles, ...)
• Anything that does nasty JOINs
• External service calls

Pattern: A Few Expensive Operations
• What’s easy:
  • Cache invalidation is relatively easy
• What’s hard:
  • Falling out of cache is expensive

Thundering Herd
• Many requests try to get the same empty cache key for a piece of data
• All those requests try to calculate the data at the same time
• =-(
  • Databases go down
  • Third parties rate limit you

The Old and Busted Caching Antipattern
• Determine how long a piece of data can be stale from a product perspective
• Cache the data with a timeout of that length
• Have the request recalculate the data and shove it back into cache
• Pro: Don’t even worry about invalidation

The New Hotness
• Cache forever*, invalidate** explicitly***
• * - as long as your backend will allow
• ** - actively set data, don’t delete
• *** - import this

The New Hotness
• Don’t issue DELETEs to your backend to purge stale data
• Instead, calculate the new value and SET it
• You should never experience a cache miss!
• Expensive calculations can also be done asynchronously (Celery FTW)
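
The "calculate the new value and SET it" idea can be sketched roughly as follows. This is a minimal illustration, not the speakers' code: the plain dict stands in for a real cache backend, and `comment_counts`, `TOP_ARTICLES_KEY`, `refresh_top_articles`, and `add_comment` are hypothetical names invented for the example.

```python
# Stand-in for a cache backend (assumption); a real app would use its cache client.
cache = {}

# Hypothetical canonical data source: comment counts per article.
comment_counts = {'intro': 3, 'signals': 7, 'caching': 5}

TOP_ARTICLES_KEY = 'articles:top'


def compute_top_articles(limit=2):
    """The 'expensive' rollup, always computed from the canonical data."""
    ranked = sorted(comment_counts, key=comment_counts.get, reverse=True)
    return ranked[:limit]


def refresh_top_articles():
    """Recalculate and SET the value instead of deleting the key."""
    value = compute_top_articles()
    cache[TOP_ARTICLES_KEY] = value
    return value


def add_comment(article):
    """Write path: update the source of truth, then refresh the cache."""
    comment_counts[article] += 1
    refresh_top_articles()


refresh_top_articles()      # warm the cache once, for example at deploy time
for _ in range(5):
    add_comment('intro')    # readers never see an empty key in between
```

Because the write path overwrites the key rather than deleting it, readers always find a value, and the refresh could equally be handed to a background task.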

Pattern: Publish Cache
• What’s easy:
  • Implementation: drop-in middleware
  • Cache invalidation (sometimes)
• What’s hard:
  • Dynamic page chunks
  • Cache invalidation (sometimes)

Publish Cache Resources
• Your Django app doesn’t necessarily need to be aware of publish caching: check out Varnish and Squid
• You can avoid hitting your application at all by letting a front-end server like nginx serve responses directly out of the publish cache

Pattern: Lots of Small Things
• What’s easy:
  • Cost of falling out of cache is low
• What’s hard:
  • Cache invalidation

What makes Cache Invalidation hard?
• How many keys hold a copy of this object?
• How many keys hold a value derived from this object?
• Which keys are they? Are you sure you got them all?

How to Avoid Cache Invalidation Hell
• How should I define my cache keys?
• Where should I put my caching-related code?
• How can I do effective invalidation?

What’s in the Box: django.core.cache
• Simple setup
  • Install the Python bindings for your cache
  • and add a few lines to settings.py
• Multi-cache support

django.core.cache.backends
• db.DatabaseCache
• filebased.FileBasedCache
• locmem.LocMemCache
• dummy.DummyCache
• memcached.MemcachedCache
• memcached.PyLibMCCache
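
As a rough idea of what "a few lines in settings.py" with multiple caches might look like, here is a sketch using two of the backends listed above. The `LOCATION` address, the `KEY_PREFIX` value, and the `local` alias are assumptions for illustration. The backend paths are the ones available at the time of the talk; newer Django releases have changed the memcached backends on offer, so check the documentation for your version.

```python
# settings.py (sketch; values below are placeholders, not recommendations)
CACHES = {
    # The 'default' alias is what django.core.cache.cache refers to.
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',  # assumed local memcached instance
        'KEY_PREFIX': 'mysite',         # hypothetical prefix
        'VERSION': 1,
    },
    # A second, per-process cache under its own (hypothetical) alias.
    'local': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
    },
}
```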

Publish Cache Provided
• View decorators are an excellent way to handle a publish cache
  • django.views.decorators.cache.cache_page
• To cache all or most of your site, you can use a middleware and explicitly exempt individual views from caching
  • django.middleware.cache.UpdateCacheMiddleware
  • django.middleware.cache.FetchFromCacheMiddleware
• Caveat: these are hard to invalidate

{% load cache %}
NO
Don’t introduce cache logic into your templating layer.

CACHE

django.core.cache keys
• settings.CACHES[alias]['VERSION']
• settings.CACHES[alias]['KEY_PREFIX']
• settings.CACHES[alias]['KEY_FUNCTION']

    from django.utils.encoding import smart_str

    def make_key(key, key_prefix, version):
        # Join prefix, version, and key with colons to build the final key
        return ':'.join([key_prefix, str(version), smart_str(key)])

Good Cache Keys
• Make cache keys unique
  • Use separators that don’t appear in your values
• Make cache keys well-defined
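
To see why the separator matters, here is a small sketch of two different inputs colliding on the same key, plus one possible fix. The function names are hypothetical, and percent-encoding the values with `urllib.parse.quote` is just one way to keep the separator out of them, not something the slides prescribe.

```python
from urllib.parse import quote


def naive_key(user, tag):
    """Joins values with ':' even if the values themselves contain ':'."""
    return '%s:%s' % (user, tag)


def safe_key(user, tag):
    """Percent-encodes each value so ':' only ever appears as the separator."""
    return '%s:%s' % (quote(user, safe=''), quote(tag, safe=''))


# Two different (user, tag) pairs end up sharing one cache key...
collides = naive_key('a:b', 'c') == naive_key('a', 'b:c')

# ...but stay distinct once the separator cannot appear inside a value.
distinct = safe_key('a:b', 'c') != safe_key('a', 'b:c')
```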

What we mean by “Well-Defined”
• Always include all elements in a cache key, even if some are empty or None. Consistency!
• Use a defined format string instead of an ad-hoc format string.
• DRY! Don’t ever write the same cache key format string twice!

What do we mean by “Well-Defined”

    >>> user, x, y = 'zainy', 0, 2
    >>> key = str(user)
    >>> if x:
    ...     key += ':' + str(x)
    ...
    >>> if y:
    ...     key += ':' + str(y)
    ...
    >>> key
    'zainy:2'

vs:

    >>> keyfmt = '%(user)s:%(x)s:%(y)s'
    >>> key = keyfmt % {
    ...     'user': 'zainy',
    ...     'x': 0,
    ...     'y': 2}
    >>> key
    'zainy:0:2'

KeyFiles make happy devs
• Don’t define the same key format string in more than one place
• Leaving cache key format strings littered around your code is a great way to discover the pains of circular imports
• Put all of an app’s cache keys in a dictionary mapping descriptive names to format strings

KeyFiles make happy devs

app/keyfile.py:

    # One place for every cache key format string in this app
    cachekeys = {
        'Thing': 'things:thing:%(thing_pk)s',
        'UserThingSet': 'things:for_user:%(user_pk)s',
    }

app/something.py:

    from django.core.cache import cache

    from app.keyfile import cachekeys

    key = cachekeys['UserThingSet'] % {'user_pk': user.pk}
    data = cache.get(key)

Where should I put Cache related code?
• Avoid putting it in your views
• Avoid putting it in model save methods
• DRY! Aim to write generic functionality to cover common cases.

• Manager-style objects are a great way to keep cache for model instances generic
  • MyModel.cache.get(pk=27)
  • looks good and is explicit!
• Con: Hard to generalize cache key generation for non-pk arguments

• To cache collections of objects, add methods to your cache controller
  • User.cache.get_top10_users()
• If you want to be really djangsta, let your cache controller delegate fetching collections of objects to custom model manager methods
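
A manager-style cache controller along these lines could be sketched as below. This is not the speakers' implementation: `CacheController`, `DictCache`, and the `fetch_by_pk` callable are hypothetical, and a plain dictionary stands in for both the cache backend and the database so the example runs on its own.

```python
class DictCache(object):
    """Tiny stand-in for a cache backend with get/set (assumption)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class CacheController(object):
    """Hypothetical manager-style front end caching one kind of object by pk."""

    def __init__(self, cache, key_format, fetch_by_pk):
        self.cache = cache              # anything with get(key) / set(key, value)
        self.key_format = key_format    # e.g. 'things:thing:%(pk)s'
        self.fetch_by_pk = fetch_by_pk  # falls back to the canonical source

    def key_for(self, pk):
        return self.key_format % {'pk': pk}

    def get(self, pk):
        key = self.key_for(pk)
        obj = self.cache.get(key)
        if obj is None:
            obj = self.fetch_by_pk(pk)
            self.cache.set(key, obj)
        return obj

    def update(self, pk, obj):
        """Actively SET the fresh value, e.g. from a post_save handler."""
        self.cache.set(self.key_for(pk), obj)


database = {27: 'thing 27'}  # stand-in for the canonical data source
lookups = []                 # records every trip to the "database"


def fetch_thing(pk):
    lookups.append(pk)
    return database[pk]


things = CacheController(DictCache(), 'things:thing:%(pk)s', fetch_thing)
first = things.get(pk=27)    # goes to the "database" and fills the cache
second = things.get(pk=27)   # served from the cache
```

In a Django project the controller would typically hang off the model class (so that `MyModel.cache.get(pk=27)` works) and take its key format from the app's keyfile.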

Original Djangsta @jeffschenck

Model Invalidation is Hard
• Update the cache for objects through a post_save signal handler
• Easy for things keyed by PK (which doesn’t change), but hard for cache keys depending on mutable fields (like a foreign key).
• If you change a foreign key, you may need to add an object to one collection, but remove it from another

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, Thing 2]
• Things are owned by users via foreign key
• Cache maps user IDs to ThingSets

[Diagram repeated across the next four slides: Database and Cache panels, each showing User 1, User 2, Thing 1, Thing 2]
• Let’s change Thing2’s owner to User2
• .save updates the database
• post_save adds thing2 to user2’s cached ThingSet
• How can we remove the link in User1’s ThingSet?

Model Invalidation is Hard
• We could refetch the object from the DB during a pre_save and compare values
  • con: extra db work
  • con: we don’t want to invalidate in pre_save in case the save call fails
  • thread-local storage to communicate between pre- and post-save handlers

Model Invalidation is Hard
• We can save the original model state after it is loaded, and then reference that state to invalidate old keys

Model Invalidation is Hard

    from django.db.models import Model

    class OriginalStateModel(Model):
        class Meta:
            abstract = True

        def preserve_state(self):
            # copy the fields into state storage
            fields = self._meta.fields
            state = dict(
                (f.name, getattr(self, f.name)) for f in fields
            )
            self._original_state = state

Model Invalidation is Hard

    class OriginalStateModel(Model):
        ...

        def __init__(self, *args, **kwargs):
            super(OriginalStateModel, self).__init__(*args, **kwargs)
            # preserve state after loading
            self.preserve_state()

        def save(self, *args, **kwargs):
            super(OriginalStateModel, self).save(*args, **kwargs)
            # db save and signal handlers have already happened
            # preserve state after saving
            self.preserve_state()

[Diagram repeated across the next four slides: Database and Cache panels, each showing User 1, User 2, Thing 1, Thing 2]
• Let’s change Thing2’s owner to User2
• post_save adds thing2 to user2’s cached ThingSet
• post_save removes thing2 from user1’s cached ThingSet using the _original_state
• The _original_state is updated to the current state
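
The walk-through above can be simulated without Django. In this sketch a plain class plays the role of the model, a dict plays the cache, and `on_post_save` is a hypothetical handler called directly from `save` (in Django it would be connected to the post_save signal instead). Only the owner field is tracked, which is a simplification of the `preserve_state` approach from the slides.

```python
# Stand-in cache (assumption): maps user id -> cached ThingSet of thing names.
cache = {1: {'thing1', 'thing2'}, 2: set()}


def on_post_save(thing):
    """Hypothetical post_save handler: fix both the old and the new ThingSet."""
    old_owner = thing._original_state['owner_id']
    new_owner = thing.owner_id
    if old_owner != new_owner:
        # The original state tells us which stale ThingSet to clean up.
        cache.setdefault(old_owner, set()).discard(thing.name)
    cache.setdefault(new_owner, set()).add(thing.name)


class Thing(object):
    """Plain-Python stand-in for a model with a foreign key to its owner."""

    def __init__(self, name, owner_id):
        self.name = name
        self.owner_id = owner_id
        self.preserve_state()  # preserve state after loading

    def preserve_state(self):
        self._original_state = {'owner_id': self.owner_id}

    def save(self):
        # (the database write would happen here)
        on_post_save(self)     # signal handlers still see the old state
        self.preserve_state()  # then the original state is refreshed


thing2 = Thing('thing2', owner_id=1)
thing2.owner_id = 2            # change Thing2's owner to User2
thing2.save()
```

The ordering is the important part: the handler runs while `_original_state` still describes the previous owner, and only afterwards is the state brought up to date.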

Django Signals Caveat
• A few Django methods do not emit signals that you might expect
  • QuerySet.update
  • QuerySet.delete
  • RelatedManager.clear
• Generally, methods that generate queries that can affect multiple database rows

Third Party Enhancements
• django-newcache: https://github.com/ericflo/django-newcache
• johnny-cache: http://packages.python.org/johnny-cache/index.html
• django-cache-machine: http://jbalogh.me/projects/cache-machine/
• django-autocache: https://github.com/noah256/django-autocache

Last Minute Advice
• Cache servers should not be publicly accessible
• Consistent hashing is neat: use it!
• Collections: QuerySets vs Lists/Dicts
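
For readers who have not met consistent hashing, here is a rough sketch of the idea: servers are placed at many points on a hash ring, and each key belongs to the next server point after its own hash. The `HashRing` class, the replica count, and the server names are all assumptions for illustration. In practice a cache client library usually provides this rather than application code.

```python
import bisect
import hashlib


class HashRing(object):
    """Minimal consistent-hash ring (illustrative, not production code)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode('utf-8')).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash('%s#%d' % (server, i))
            self._owner[point] = server
            bisect.insort(self._points, point)

    def server_for(self, key):
        # First server point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = HashRing(['cache1:11211', 'cache2:11211'])  # hypothetical servers
keys = ['user:%d' % i for i in range(1000)]
before = dict((k, ring.server_for(k)) for k in keys)

ring.add('cache3:11211')
after = dict((k, ring.server_for(k)) for k in keys)
moved = [k for k in keys if before[k] != after[k]]
```

Adding a server only reassigns the keys that land on the new server; every other key stays where it was, so most of the cache remains warm.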

DoesNotExist Deserves Cache Love

Common pattern:

    result = cache.get(key)
    if not result:
        result = MyModel.objects.get(pk=pk)
        cache.set(key, result)
    return result

DoesNotExist Deserves Cache Love

Better pattern:

    DOES_NOT_EXIST = '!!DNE!!'

    result = cache.get(key)
    if result == DOES_NOT_EXIST:
        raise MyModel.DoesNotExist("Object not found in Cache")
    elif result is None:
        try:
            result = MyModel.objects.get(pk=pk)
        except MyModel.DoesNotExist:
            cache.set(key, DOES_NOT_EXIST)
            raise
        cache.set(key, result)
    return result
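
A self-contained version of the better pattern shows the payoff: repeated lookups for a missing object reach the database only once. The dict-based `cache` and `database`, the local `DoesNotExist` exception, the key format, and the `db_queries` counter are stand-ins invented for this sketch.

```python
class DoesNotExist(Exception):
    """Stand-in for MyModel.DoesNotExist."""


DOES_NOT_EXIST = '!!DNE!!'

cache = {}                       # stand-in for the cache backend
database = {1: 'first object'}   # stand-in for the canonical data source
db_queries = []                  # records every lookup that reaches the "database"


def fetch_from_db(pk):
    db_queries.append(pk)
    try:
        return database[pk]
    except KeyError:
        raise DoesNotExist(pk)


def get_object(pk):
    key = 'objects:object:%s' % pk   # hypothetical key format
    result = cache.get(key)
    if result == DOES_NOT_EXIST:
        # A previous lookup already proved the object is missing.
        raise DoesNotExist('Object not found in cache')
    if result is None:
        try:
            result = fetch_from_db(pk)
        except DoesNotExist:
            cache[key] = DOES_NOT_EXIST  # remember the miss as well
            raise
        cache[key] = result
    return result


misses = 0
for _ in range(3):
    try:
        get_object(99)
    except DoesNotExist:
        misses += 1
```

All three calls raise, but only the first one touches the database. The marker still has to be overwritten when the object is later created, which fits the set-on-write approach from earlier in the talk.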

Questions?
• Jacob Burch, RevSys Engineer, @jacobburch
• Noah Silas, Causes.com Engineer, @noah256
Shout out to @mattdennewitz for the name inspiration.
