storing post-processed data for more immediate future retrieval • Usually stored in an in-memory key/value store (memcache, Redis) • Used to • Speed up your app • Lessen load on other systems (your db, APIs)
No Rules, Only Principles • Start with assumptions/advice • Benchmark/inspect/meditate on your application’s specific profile • Break principles as needed
get the same empty cache key for a piece of data • All those requests try to calculate the data at the same time • =-( • Databases go down • Third parties rate limit you
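One mitigation for this dog-pile is to let only one request recompute a missing key while the others wait. A minimal sketch, using a per-key lock and a plain dict as a stand-in for memcache/redis (all names here are illustrative, not from the talk):

```python
import threading

_cache = {}                      # stand-in for memcache/redis
_locks = {}                      # per-key recompute locks
_locks_guard = threading.Lock()  # protects the _locks dict itself

def get_or_compute(key, compute):
    """Only one caller recomputes a missing key; others wait for it."""
    value = _cache.get(key)
    if value is not None:
        return value
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        # re-check: another thread may have filled the key while we waited
        value = _cache.get(key)
        if value is None:
            value = compute()    # the expensive db/api call happens once
            _cache[key] = value
    return value
```

In a multi-process deployment the lock would have to live in the shared store (e.g. a memcache `add`), not in process memory; this sketch only shows the shape of the idea.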
Determine how long a piece of data can be stale for from a product perspective • Cache the data with a timeout for that length • Have the request recalculate the data and shove it back into the cache • Pro: Don't even worry about invalidation
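That timeout pattern can be sketched in a few lines; this assumes a dict standing in for the cache backend, with the timeout expressed in seconds (names are illustrative):

```python
import time

_cache = {}  # maps key -> (value, expires_at); stand-in for memcache/redis

def cache_get_or_set(key, compute, timeout):
    """Serve data that is allowed to be up to `timeout` seconds stale."""
    entry = _cache.get(key)
    if entry is not None:
        value, expires_at = entry
        if time.time() < expires_at:
            return value                   # still fresh enough
    value = compute()                      # recalculate on miss or expiry
    _cache[key] = (value, time.time() + timeout)
    return value
```

Real backends handle the expiry for you (`cache.set(key, value, timeout)` in Django), so the bookkeeping here collapses to a single call.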
to your backend to purge stale data • Instead, calculate the new value and SET it • You should never experience a cache miss! • Expensive calculations can also be done asynchronously (Celery FTW)
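A write-through sketch of that idea, assuming a hypothetical profile model and a dict standing in for the db and cache:

```python
_db = {}     # stand-in for the database
_cache = {}  # stand-in for memcache/redis

def render_profile(profile):
    # hypothetical expensive post-processing step
    return {'html': '<p>%s</p>' % profile['name']}

def save_profile(user_id, profile):
    _db[user_id] = profile                          # normal db write
    # instead of sending a purge, calculate the new value and SET it --
    # readers never experience a cache miss
    _cache['profile:%d' % user_id] = render_profile(profile)

def get_profile(user_id):
    return _cache['profile:%d' % user_id]           # always a hit
```

If `render_profile` is slow, the SET can happen in a Celery task instead, as the slide suggests; readers just see the previous value until the task lands.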
doesn’t necessarily need to be aware of publish caching: check out Varnish and Squid • You can avoid hitting your application at all by letting a front-end server like nginx serve responses directly out of the publish cache
an excellent way to handle a Publish Cache • django.views.decorators.cache.cache_page • To cache all or most of your site, you can use a middleware and explicitly exempt individual views from caching • django.middleware.cache.UpdateCacheMiddleware • django.middleware.cache.FetchFromCacheMiddleware • Caveat: hard to invalidate these
Always include all elements in a cache key, even if some are empty or None. Consistency! • Use a defined format string instead of an ad-hoc format string. • DRY! Don’t ever write the same cache key format string twice!
the same key format string in more than one place • Leaving cache key format strings littered around your code is a great way to discover the pains of circular imports • Put all of an app’s cache keys in a dictionary mapping descriptive names to format strings
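A sketch of that single dictionary of key formats, in one importable module so each format string exists exactly once (the key names and prefixes here are made up for illustration):

```python
# myapp/cache_keys.py -- the ONLY place these format strings are written
CACHE_KEYS = {
    'user_by_pk':   'myapp:user:pk:%(pk)s',
    'top10_users':  'myapp:user:top10',
    'user_friends': 'myapp:user:%(pk)s:friends',
}

def make_key(name, **parts):
    """Build a cache key from its registered format string."""
    return CACHE_KEYS[name] % parts
```

Callers say `make_key('user_by_pk', pk=27)` instead of re-deriving the string, which keeps the format DRY and gives imports a single, dependency-free home.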
• Manager-style objects are a great way to keep cache for model instances generic • MyModel.cache.get(pk=27) • looks good and is explicit! • Con: Hard to generalize cache key generation for non-pk arguments
• To cache collections of objects, add methods to your cache controller • User.cache.get_top10_users() • If you want to be really djangsta, let your cache controller delegate fetching collections of objects to custom model manager methods
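A minimal sketch of such a manager-style controller, with plain classes and a dict standing in for the model layer and the cache backend (none of these names come from the talk):

```python
_cache = {}  # stand-in for memcache/redis

class CacheController(object):
    """Hypothetical cache controller attached to a model class."""

    def __init__(self, model):
        self.model = model

    def _key(self, pk):
        return '%s:pk:%s' % (self.model.__name__.lower(), pk)

    def get(self, pk):
        obj = _cache.get(self._key(pk))
        if obj is None:
            obj = self.model.db_get(pk)   # stand-in for objects.get(pk=pk)
            _cache[self._key(pk)] = obj
        return obj

class User(object):
    _rows = {}                            # stand-in for the users table

    @classmethod
    def db_get(cls, pk):
        return cls._rows[pk]

User.cache = CacheController(User)        # User.cache.get(pk=27)
```

Collection methods like `get_top10_users()` slot onto the controller the same way, delegating the actual fetch to a custom manager method.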
cache for objects through a post_save signal handler • Easy for things keyed by PK (which doesn’t change), but hard for cache keys depending on mutable fields (like a foreign key). • If you change a foreign key, you may need to add an object to one collection, but remove it from another
refetch the object from the DB during pre_save and compare values • Con: extra db work • Con: we don’t want to invalidate in pre_save in case the save call fails • Use thread-local storage to communicate between pre_save and post_save handlers
class StateModel(models.Model):

    class Meta:
        abstract = True

    def preserve_state(self):
        # copy the fields into state storage
        fields = self._meta.fields
        state = dict(
            (f.name, getattr(self, f.name)) for f in fields
        )
        self._original_state = state

    def __init__(self, *args, **kwargs):
        super(StateModel, self).__init__(*args, **kwargs)
        # preserve state after loading
        self.preserve_state()

    def save(self, *args, **kwargs):
        super(StateModel, self).save(*args, **kwargs)
        # db save and signal handlers have already happened
        # preserve state after saving
        self.preserve_state()
methods do not emit the signals you might expect • QuerySet.update • QuerySet.delete • RelatedManager.clear • Generally, methods that generate queries that can affect multiple database rows
DOES_NOT_EXIST = '!!DNE!!'

    result = cache.get(key)
    if result == DOES_NOT_EXIST:
        raise MyModel.DoesNotExist("Object not found in Cache")
    elif result is None:
        try:
            result = MyModel.objects.get(pk=pk)
        except MyModel.DoesNotExist:
            cache.set(key, DOES_NOT_EXIST)
            raise
        cache.set(key, result)
    return result