Cache Rules Everything Around Me

Jacob Burch
September 26, 2011

Given at Djangocon 2011, Portland Oregon

Transcript

  1. djangocon 2011 9/8/2011 Hello

    • Jacob Burch
      • Engineer at Revolution Systems
      • Former CTO at Mahalo
    • Noah Silas
      • Engineer at Causes.com
      • Former Head Architect at Mahalo
  2. What we’re going to tell you

    • Brief (brief!) introduction to caching
    • Big Picture Considerations of a Caching Architecture
    • Implementation Best Practices
  3. What we won’t be talking about (all that much)

    • Backend Tuning
    • Backend Debates (redis vs. memcache)
    • Upstream Caches (squid, varnish, etc.)
  4. The What/Why/How of Caching

    • Caching is storing post-processed data for more-immediate future retrieval
    • Usually stored in an in-memory key/value store (memcache, redis)
    • Used to
      • Speed up your app
      • Lessen load on other systems (your db, APIs)
  5. Only Rule Of Architecture

    • There Are No Rules, Only Principles
    • Start with assumptions/advice
    • Benchmark/inspect/meditate on your application’s specific profile
    • Break principles as needed
  6. Ask Yourself A Bunch Of Questions

    And they don’t need to be answered immediately
  7. Do I Really Need Caching?

    • Caching...
      • Adds complexity, sometimes in unexpected places
      • Is an additional point of failure
    • Modern databases are stupidly optimized
      • May be all you need
  8. “There are only two hard things in Computer Science: cache invalidation and naming things”

    - Phil Karlton
  9. Magical Cache Rule That Lives in a Gumdrop House On Lollypop Lane

    Your Application Should Never Rely On Caching
  10. Super Awesome Fun Time Rule for Minimizing Sadness

    • Your application should have one canonical data source.
    • This data source IS NOT YOUR CACHE.
  11. Pattern: A Few Expensive Operations

    • Rollup Values (Top Users, Most Commented on Articles, ...)
    • Anything that does nasty JOINs
    • External Service Calls
  12. Pattern: A Few Expensive Operations

    • What’s Easy:
      • Cache invalidation is relatively easy
    • What’s Hard:
      • Falling out of cache is expensive
  13. Thundering Herd

    • Many requests try to get the same empty cache key for a piece of data
    • All those requests try to calculate the data at the same time
    • =-(
      • Databases go down
      • Third parties rate limit you
  14. The Old and Busted Caching Antipattern

    • Determine how long a piece of data can be stale from a product perspective
    • Cache the data with a timeout of that length
    • Have the request recalculate the data and shove it back into cache
    • Pro: Don’t even worry about invalidation
  15. The New Hotness

    • Cache Forever*, Invalidate** explicitly***
      • * - as long as your backend will allow
      • ** - Actively set data, not delete
      • *** - import this
  16. The New Hotness

    • Don’t issue DELETEs to your backend to purge stale data
    • Instead, calculate the new value and SET it
    • You should never experience a cache miss!
    • Expensive calculations can also be done asynchronously (Celery FTW)
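
The "calculate the new value and SET it" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain dict stands in for the cache backend, another dict stands in for the database, and every name here (`compute_top_users`, `refresh_top_users`, `add_comment`, `get_top_users`) is hypothetical and not taken from the talk.

```python
# Hypothetical stand-ins: one dict plays the cache backend, another plays
# the canonical data source (the database).
cache = {}
comment_counts = {"alice": 3, "bob": 5, "carol": 1}

TOP_USERS_KEY = "rollup:top_users"


def compute_top_users(limit=2):
    """The 'expensive' rollup: users ordered by comment count."""
    ranked = sorted(comment_counts, key=comment_counts.get, reverse=True)
    return ranked[:limit]


def refresh_top_users():
    """Recalculate the value and SET it; the key is never deleted."""
    cache[TOP_USERS_KEY] = compute_top_users()


def add_comment(user):
    """Write path: update the canonical data, then overwrite the cache."""
    comment_counts[user] = comment_counts.get(user, 0) + 1
    refresh_top_users()


def get_top_users():
    """Read path: normally a pure cache read."""
    value = cache.get(TOP_USERS_KEY)
    if value is None:  # only after an eviction or a cold start
        value = compute_top_users()
        cache[TOP_USERS_KEY] = value
    return value


refresh_top_users()        # warm the cache once, e.g. at deploy time
for _ in range(5):
    add_comment("carol")   # carol goes from 1 to 6 comments
```

Readers never see an empty key during normal operation, so there is no window in which a herd of requests recomputes the rollup. The fallback in `get_top_users` is kept so the application still works if the backend evicts the key. In a real deployment the call to `refresh_top_users` could be handed to a background worker instead of running inline.
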
  17. Pattern: Publish Cache

    • What’s Easy:
      • Implementation - Drop in Middleware
      • Cache Invalidation (sometimes)
    • What’s Hard:
      • Dynamic Page Chunks
      • Cache Invalidation (sometimes)
  18. Publish Cache Resources

    • Your Django app doesn’t necessarily need to be aware of publish caching: check out Varnish and Squid
    • You can avoid hitting your application at all by letting a front-end server like nginx serve responses directly out of the publish cache
  19. Pattern: Lots of Small Things

    • What’s Easy:
      • Falling out of cache is cheap
    • What’s Hard:
      • Cache Invalidation
  20. What makes Cache Invalidation hard?

    • How many keys hold a copy of this object?
    • How many keys hold a value derived from this object?
    • Which keys are they? Are you sure you got them all?
  21. How to Avoid Cache Invalidation Hell

    • How should I define my cache keys?
    • Where should I put my caching-related code?
    • How can I do effective invalidation?
  22. What’s in the Box: django.core.cache

    • Simple Setup
      • Install the Python bindings for your cache
      • and add a few lines to settings.py
    • Multi Cache Support
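
As a rough illustration of "a few lines in settings.py" with more than one cache, a configuration might look like the fragment below. The two backend paths are the ones shipped with Django at the time of the talk (the `MemcachedCache` path was removed in much later releases). The alias `local`, the `LOCATION` values and the `KEY_PREFIX` are placeholders chosen for this sketch.

```python
# settings.py (fragment). Alias names, LOCATION values and KEY_PREFIX are
# placeholders; adjust the backend paths to your Django version.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
        "KEY_PREFIX": "mysite",
        "VERSION": 1,
    },
    "local": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "local-scratch",
    },
}
```

Code then picks a cache by alias: `django.core.cache.get_cache('local')` in the Django versions current at the time, or `django.core.cache.caches['local']` in later ones.
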
  23. Publish Cache Provided

    • View decorators are an excellent way to handle a Publish Cache
      • django.views.decorators.cache.cache_page
    • To cache all or most of your site, you can use a middleware and explicitly exempt individual views from caching
      • django.middleware.cache.UpdateCacheMiddleware
      • django.middleware.cache.FetchFromCacheMiddleware
    • Caveat: these are hard to invalidate
  24. {% load cache %}

    NO. Don’t introduce cache logic into your templating layer
  25. django.core.cache keys

    • settings.CACHES[alias]['VERSION']
    • settings.CACHES[alias]['KEY_PREFIX']
    • settings.CACHES[alias]['KEY_FUNCTION']

        from django.utils.encoding import smart_str

        # default key function: prefix, version and key joined by colons
        def make_key(key, key_prefix, version):
            return ':'.join([key_prefix, str(version), smart_str(key)])
  26. Good Cache Keys

    • Make Cache Keys Unique
      • Use separators that don’t appear in your values
    • Make Cache Keys Well-Defined
  27. What we mean by “Well Defined”

    • Always include all elements in a cache key, even if some are empty or None. Consistency!
    • Use a defined format string instead of an ad-hoc format string.
    • DRY! Don’t ever write the same cache key format string twice!
  28. What do we mean by “Well Defined”?

        >>> user, x, y = 'zainy', 0, 2
        >>> key = str(user)
        >>> if x:
        ...     key += ':' + str(x)
        ...
        >>> if y:
        ...     key += ':' + str(y)
        ...
        >>> key
        'zainy:2'

    vs:

        >>> keyfmt = '%(user)s:%(x)s:%(y)s'
        >>> key = keyfmt % {
        ...     'user': 'zainy',
        ...     'x': 0,
        ...     'y': 2}
        >>> key
        'zainy:0:2'
  29. KeyFiles make happy devs

    • Don’t define the same key format string in more than one place
    • Leaving cache key format strings littered around your code is a great way to discover the pains of circular imports
    • Put all of an app’s cache keys in a dictionary mapping descriptive names to format strings
  30. KeyFiles make happy devs

    app/keyfile.py

        cachekeys = {
            'Thing': 'things:thing:%(thing_pk)s',
            'UserThingSet': 'things:for_user:%(user_pk)s',
        }

    app/something.py

        from django.core.cache import cache
        from app.keyfile import cachekeys

        key = cachekeys['UserThingSet'] % {'user_pk': user.pk}
        data = cache.get(key)
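
One possible way to keep key construction in a single place is a small helper next to the key dictionary. The `cache_key` function below is a hypothetical addition for illustration and is not part of the talk. It relies on the fact that `%` formatting with a dict raises `KeyError` when a named element is missing, so an incomplete key fails loudly instead of silently producing a different key.

```python
# app/keyfile.py (sketch). cache_key() is a hypothetical helper.
cachekeys = {
    "Thing": "things:thing:%(thing_pk)s",
    "UserThingSet": "things:for_user:%(user_pk)s",
}


def cache_key(name, **params):
    """Build a key from the named format string.

    Raises KeyError if the name is unknown or an element is missing.
    """
    return cachekeys[name] % params


key = cache_key("UserThingSet", user_pk=42)
```

Callers then never touch a format string directly; they only name the key and pass its elements.
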
  31. Where should I put Cache related code?

    • Avoid putting it in your views
    • Avoid putting it in model save methods
    • DRY! Aim to write generic functionality to cover common cases.
  32. Where should I put Cache related code?

    • Manager-style objects are a great way to keep caching for model instances generic
      • MyModel.cache.get(pk=27)
      • looks good and is explicit!
    • Con: Hard to generalize cache key generation for non-pk arguments
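
A manager-style cache object in the spirit of `MyModel.cache.get(pk=27)` could be sketched as below. To keep the example runnable without Django, a dict named `backend` stands in for the cache and a class with a `load` classmethod stands in for a model and its database lookup. `CacheController`, `Thing` and `load` are all hypothetical names for this sketch.

```python
# Hypothetical stand-ins so the sketch runs without Django.
backend = {}  # plays the role of the cache backend


class CacheController(object):
    """Manager-style object exposing Model.cache.get(pk=...)."""

    def __init__(self, key_format):
        self.key_format = key_format
        self.model = None

    def __set_name__(self, owner, name):
        # Remember which class this controller is attached to.
        self.model = owner

    def make_key(self, pk):
        return self.key_format % {"pk": pk}

    def get(self, pk):
        key = self.make_key(pk)
        obj = backend.get(key)
        if obj is None:
            obj = self.model.load(pk)  # fall back to the canonical source
            backend[key] = obj
        return obj

    def set(self, obj):
        backend[self.make_key(obj.pk)] = obj


class Thing(object):
    _table = {27: "widget"}  # pretend database rows
    cache = CacheController("things:thing:%(pk)s")

    def __init__(self, pk, name):
        self.pk, self.name = pk, name

    @classmethod
    def load(cls, pk):
        return cls(pk, cls._table[pk])


thing = Thing.cache.get(pk=27)
```

All key construction and fallback logic for the model lives in one object, so views and save methods stay free of cache code. The con from the slide is visible here too: the controller only knows how to build keys from a primary key.
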
  33. Where should I put Cache related code?

    • To cache collections of objects, add methods to your cache controller
      • User.cache.get_top10_users()
    • If you want to be really djangsta, let your cache controller delegate fetching collections of objects to custom model manager methods
  34. Model Invalidation is Hard

    • Update the cache for objects through a post_save signal handler
    • Easy for things keyed by PK (which doesn’t change), but hard for cache keys depending on mutable fields (like a foreign key)
    • If you change a foreign key, you may need to add an object to one collection, but remove it from another
  35. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    • Things are owned by users via Foreign Key
    • Cache maps user IDs to ThingSets
  36. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    Let’s change Thing2’s owner to User2
  37. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    .save updates the database
  38. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    post_save adds thing2 to user2’s cached ThingSet
  39. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    How can we remove the link in User1’s ThingSet?
  40. Model Invalidation is Hard

    • We could refetch the object from the DB during a pre_save and compare values
      • con: extra db work
      • con: we don’t want to invalidate in pre_save in case the save call fails
      • thread local storage to communicate between pre and post save handlers
  41. Model Invalidation is Hard

    • We can save the original model state after it is loaded, and then reference that state to invalidate old keys
  42. Model Invalidation is Hard

        from django.db.models import Model

        class OriginalStateModel(Model):
            class Meta:
                abstract = True

            def preserve_state(self):
                # copy the fields into state storage
                fields = self._meta.fields
                state = dict(
                    (f.name, getattr(self, f.name)) for f in fields
                )
                self._original_state = state
  43. Model Invalidation is Hard

        class OriginalStateModel(Model):
            ...

            def __init__(self, *args, **kwargs):
                super(OriginalStateModel, self).__init__(*args, **kwargs)
                # preserve state after loading
                self.preserve_state()

            def save(self, *args, **kwargs):
                super(OriginalStateModel, self).save(*args, **kwargs)
                # db save and signal handlers have already happened
                # preserve state after saving
                self.preserve_state()
  44. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    Let’s change Thing2’s owner to User2
  45. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    Let’s change Thing2’s owner to User2
  46. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    post_save adds thing2 to user2’s cached ThingSet
  47. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    post_save removes thing2 from user1’s cached ThingSet using the _original_state
  48. Cache Capers

    [Diagram: User 1, User 2, Thing 1 and Thing 2, shown in the Database and in the Cache]

    The _original_state is updated to the current state
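
The walk-through above can be condensed into a small runnable sketch. It avoids Django entirely: a dict maps user ids to cached ThingSets, a plain class plays the model, and `on_saved` plays the role of a post_save signal handler. All of these names are assumptions made for the illustration.

```python
# Hypothetical stand-ins: a dict for the cache and a plain class in place of
# a Django model. on_saved() plays the role of a post_save signal handler.
cache = {1: {"thing1", "thing2"}, 2: set()}  # user id -> cached ThingSet


class Thing(object):
    def __init__(self, name, owner_id):
        self.name = name
        self.owner_id = owner_id
        self.preserve_state()  # state as loaded

    def preserve_state(self):
        self._original_state = {"owner_id": self.owner_id}

    def save(self):
        # (the database write would happen here)
        on_saved(self)         # handlers run before the state is refreshed
        self.preserve_state()  # now the saved state becomes the original


def on_saved(thing):
    old_owner = thing._original_state["owner_id"]
    if old_owner != thing.owner_id:
        cache[old_owner].discard(thing.name)  # remove the stale link
    cache[thing.owner_id].add(thing.name)     # add the new link


thing2 = Thing("thing2", owner_id=1)
thing2.owner_id = 2  # change Thing2's owner to User2
thing2.save()
```

Because the handler runs before `preserve_state` is called again, it can still see the old owner and clean up that user's set without an extra database query.
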
  49. Django Signals Caveat

    • A few Django methods do not emit the signals you might expect
      • QuerySet.update
      • QuerySet.delete
      • RelatedManager.clear
    • Generally, methods that generate queries that can affect multiple database rows
  50. Third Party Enhancements

    • django-newcache: https://github.com/ericflo/django-newcache
    • johnny-cache: http://packages.python.org/johnny-cache/index.html
    • django-cache-machine: http://jbalogh.me/projects/cache-machine/
    • django-autocache: https://github.com/noah256/django-autocache
  51. Last Minute Advice

    • Cache servers should not be publicly accessible
    • Consistent hashing is neat--use it!
    • Collections: QuerySets vs Lists/Dicts
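
For readers who have not met consistent hashing, here is a minimal sketch of a hash ring, assuming three made-up server names. It is not taken from the talk or from any particular client library; memcached clients that support consistent hashing implement their own variants. The point it illustrates is that removing a server only remaps the keys that lived on that server.

```python
import bisect
import hashlib


class HashRing(object):
    """Minimal consistent hash ring with virtual points per server."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # hash position -> server
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash("%s#%d" % (server, i))
            self._owners[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owners[p] != server]
        self._owners = {p: s for p, s in self._owners.items() if s != server}

    def get(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owners[self._points[index]]


# Server names are placeholders.
ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
keys = ["things:thing:%d" % n for n in range(1000)]
before = {k: ring.get(k) for k in keys}
ring.remove("cache3:11211")
after = {k: ring.get(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

With naive `hash(key) % number_of_servers` placement, dropping one server would remap most keys. Here every key in `moved` was previously on the removed server, and keys on the other two servers stay put.
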
  52. DoesNotExist Deserves Cache Love

    Common Pattern

        result = cache.get(key)
        if not result:
            result = MyModel.objects.get(pk=pk)
            cache.set(key, result)
        return result
  53. DoesNotExist Deserves Cache Love

    Better Pattern

        DOES_NOT_EXIST = '!!DNE!!'

        result = cache.get(key)
        if result == DOES_NOT_EXIST:
            raise MyModel.DoesNotExist("Object not found in Cache")
        elif result is None:
            try:
                result = MyModel.objects.get(pk=pk)
            except MyModel.DoesNotExist:
                cache.set(key, DOES_NOT_EXIST)
                raise
            cache.set(key, result)
        return result
  54. Questions?

    • Jacob Burch, RevSys Engineer, @jacobburch
    • Noah Silas, Causes.com Engineer, @noah256
    • Shout out to @mattdennewitz for the name inspiration