Slide 1

Slide 1 text

djangocon 2011 9/8/2011 Cache Rules Everything Around Me Noah Silas Jacob Burch

Slide 2

Slide 2 text

Hello
• Jacob Burch
  • Engineer at Revolution Systems
  • Former CTO at Mahalo
• Noah Silas
  • Engineer at Causes.com
  • Former Head Architect at Mahalo

Slide 3

Slide 3 text

What we’re going to tell you
• Brief (brief!) introduction to caching
• Big Picture Considerations of a Caching Architecture
• Implementation Best Practices

Slide 4

Slide 4 text

What we won’t be talking about (all that much)
• Backend Tuning
• Backend Debates (redis vs. memcache)
• Upstream Caches (squid, varnish, etc.)

Slide 5

Slide 5 text

The What/Why/How of Caching
• Caching is storing post-processed data for more-immediate future retrieval
• Usually stored in an in-memory key/value store (memcache, redis)
• Used to
  • Speed up your app
  • Lessen load on other systems (your db, apis)

Slide 6

Slide 6 text

Only Rule Of Architecture
• There Are No Rules, Only Principles
• Start with assumptions/advice
• Benchmark/inspect/meditate on your application’s specific profile
• Break principles as needed

Slide 7

Slide 7 text

Plan Ahead

Slide 8

Slide 8 text

Ask Yourself A Bunch Of Questions.
And they don’t need to be answered immediately

Slide 9

Slide 9 text

Do I Really Need Caching?
• Caching...
  • Adds complexity, sometimes in unexpected places
  • Additional point of failure
• Modern Databases are stupidly optimized
  • May be all you need

Slide 10

Slide 10 text

“There are only two hard things in Computer Science: cache invalidation and naming things” - Phil Karlton

Slide 11

Slide 11 text

Magical Cache Rule That Lives in a Gumdrop House On Lollypop Lane
Your Application Should Never Rely On Caching

Slide 12

Slide 12 text

So, You’re Relying On Your Cache Being Up
• Still...

Slide 13

Slide 13 text

Super Awesome Fun Time Rule for Minimizing Sadness
• Your Application should have one canonical data source.
• This data source IS NOT YOUR CACHE.

Slide 14

Slide 14 text

Common Cache Patterns

Slide 15

Slide 15 text

Pattern: A Few Expensive Operations
• Rollup Values (Top Users, Most Commented on Articles, ...)
• Anything that does nasty JOINs
• External Service Calls

Slide 16

Slide 16 text

Pattern: A Few Expensive Operations
• What’s Easy:
  • Cache invalidation is relatively easy
• What’s Hard:
  • Falling out of cache is expensive

Slide 17

Slide 17 text

Thundering Herd
• Many requests try to get the same empty cache key for a piece of data
• All those requests try to calculate the data at the same time
• =-(
  • Databases go down
  • Third parties rate limit you

Slide 18

Slide 18 text

The Old and Busted Caching Antipattern
• Determine how long a piece of data can be stale from a product perspective
• Cache the data with a timeout of that length
• Have the request recalculate the data and shove it back into the cache
• Pro: Don’t even worry about Invalidation

Slide 19

Slide 19 text

The New Hotness
• Cache Forever*, Invalidate** explicitly***
  • * - as long as your backend will allow
  • ** - Actively set data, not delete
  • *** - import this

Slide 20

Slide 20 text

The New Hotness
• Don’t issue DELETEs to your backend to purge stale data
• Instead, calculate the new value and SET it
• You should never experience a cache miss!
• Expensive calculations can also be done asynchronously (Celery FTW)
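
A minimal sketch of the "SET, don't DELETE" idea from this slide, under stated assumptions: `FakeCache` is a hypothetical in-memory stand-in for a real backend (a Django app would use `django.core.cache.cache`), and `comment_counts`, `TOP_ARTICLE_KEY`, `compute_most_commented`, and `add_comment` are illustrative names that do not come from the talk.

```python
class FakeCache(dict):
    """Hypothetical in-memory stand-in for a real cache backend."""

    def set(self, key, value):
        self[key] = value


cache = FakeCache()
comment_counts = {"article-1": 3, "article-2": 7}  # stand-in for the database
TOP_ARTICLE_KEY = "articles:most_commented"        # illustrative key name


def compute_most_commented():
    """The 'expensive' rollup; here it is only a max over a dict."""
    return max(comment_counts, key=comment_counts.get)


def add_comment(article_id):
    """Write path: update the canonical source, then SET the fresh rollup."""
    comment_counts[article_id] += 1
    # No delete here: readers never find the key missing, so a burst of
    # requests cannot all recompute the rollup at the same time.
    cache.set(TOP_ARTICLE_KEY, compute_most_commented())


cache.set(TOP_ARTICLE_KEY, compute_most_commented())  # warm the key once
for _ in range(5):
    add_comment("article-1")
```

In a real deployment the recomputation inside `add_comment` could be handed to a background task instead of running in the request, which is the asynchronous option the slide mentions.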

Slide 21

Slide 21 text

Pattern: Publish Cache
• What’s Easy:
  • Implementation - Drop in Middleware
  • Cache Invalidation (sometimes)
• What’s Hard:
  • Dynamic Page Chunks
  • Cache Invalidation (sometimes)

Slide 22

Slide 22 text

Publish Cache Resources
• Your Django app doesn’t necessarily need to be aware of publish caching: Check out Varnish and Squid
• You can avoid hitting your application at all by letting a front end server like nginx serve responses directly out of the publish cache

Slide 23

Slide 23 text

Pattern: Lots of Small Things
• What’s Easy:
  • Cost of falling out of cache is low
• What’s Hard:
  • Cache Invalidation

Slide 24

Slide 24 text

What makes Cache Invalidation hard?
• How many keys hold a copy of this object?
• How many keys hold a value derived from this object?
• Which keys are they? Are you sure you got them all?

Slide 25

Slide 25 text

How to Avoid Cache Invalidation Hell
• How should I define my cache keys?
• Where should I put my caching related code?
• How can I do effective invalidation?

Slide 26

Slide 26 text

What’s in the Box: django.core.cache
• Simple Setup
  • Install the python bindings for your cache
  • and add a few lines to settings.py
• Multi Cache Support
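
As an illustration of the "few lines in settings.py", here is a hedged sketch of a `CACHES` setting with two aliases. The backend paths are the ones named elsewhere in this talk and may differ in other Django versions; the `LOCATION` values, the `"local"` alias, and the `"mysite"` prefix are assumptions made for the example.

```python
# settings.py (fragment): a sketch, not a recommended production config.
CACHES = {
    "default": {
        # Backend path as listed in the talk; check your Django version.
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",  # assumed local memcached address
        "KEY_PREFIX": "mysite",         # assumed prefix
        "VERSION": 1,
    },
    # A second alias shows multi-cache support.
    "local": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
        "LOCATION": "scratch",          # assumed name for this local cache
    },
}
```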

Slide 27

Slide 27 text

django.core.cache.backends
• db.DatabaseCache
• filebased.FileBasedCache
• locmem.LocMemCache
• dummy.DummyCache
• memcached.MemcachedCache
• memcached.PyLibMCCache

Slide 28

Slide 28 text

Publish Cache Provided
• View Decorators are an excellent way to handle a Publish Cache
  • django.views.decorators.cache.cache_page
• To cache all or most of your site, you can use a middleware and explicitly exempt individual views from caching
  • django.middleware.cache.UpdateCacheMiddleware
  • django.middleware.cache.FetchFromCacheMiddleware
Caveat: these are hard to invalidate

Slide 29

Slide 29 text

{% load cache %}
NO
Don’t introduce cache logic into your templating layer

Slide 30

Slide 30 text

CACHE

Slide 31

Slide 31 text

django.core.cache keys
• settings.CACHES[alias]['VERSION']
• settings.CACHES[alias]['KEY_PREFIX']
• settings.CACHES[alias]['KEY_FUNCTION']

    from django.utils.encoding import smart_str

    def make_key(key, key_prefix, version):
        # Default key function: prefix, version, and key joined by colons
        return ':'.join([key_prefix, str(version), smart_str(key)])

Slide 32

Slide 32 text

Good Cache Keys
• Make Cache Keys Unique
  • Use separators that don’t appear in your values
• Make Cache Keys Well-Defined

Slide 33

Slide 33 text

What we mean by “Well Defined”
• Always include all elements in a cache key, even if some are empty or None. Consistency!
• Use a defined format string instead of an ad-hoc format string.
• DRY! Don’t ever write the same cache key format string twice!

Slide 34

Slide 34 text

What do we mean by “Well Defined”

    >>> user, x, y = 'zainy', 0, 2
    >>> key = str(user)
    >>> if x:
    ...     key += ':' + str(x)
    ...
    >>> if y:
    ...     key += ':' + str(y)
    ...
    >>> key
    'zainy:2'

vs:

    >>> keyfmt = '%(user)s:%(x)s:%(y)s'
    >>> key = keyfmt % {
    ...     'user': 'zainy',
    ...     'x': 0,
    ...     'y': 2}
    >>> key
    'zainy:0:2'

Slide 35

Slide 35 text

KeyFiles make happy devs
• Don’t define the same key format string in more than one place
• Leaving cache key format strings littered around your code is a great way to discover the pains of circular imports
• Put all of an app’s cache keys in a dictionary mapping descriptive names to format strings

Slide 36

Slide 36 text

KeyFiles make happy devs

app/keyfile.py

    # One place for every cache key format string in the app
    cachekeys = {
        'Thing': 'things:thing:%(thing_pk)s',
        'UserThingSet': 'things:for_user:%(user_pk)s',
    }

app/something.py

    from django.core.cache import cache

    from app.keyfile import cachekeys

    key = cachekeys['UserThingSet'] % {'user_pk': user.pk}
    data = cache.get(key)

Slide 37

Slide 37 text

Where should I put Cache related code?
• Avoid putting it in your views
• Avoid putting it in model save methods
• DRY! Aim to write generic functionality to cover common cases.

Slide 38

Slide 38 text

Where should I put Cache related code?
• Manager-style objects are a great way to keep cache for model instances generic
  • MyModel.cache.get(pk=27)
  • looks good and is explicit!
• Con: Hard to generalize cache key generation for non-pk arguments
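
One possible shape for such a manager-style object, as a rough sketch only. `CacheController`, `FakeCache`, and the toy `Thing` class are hypothetical names invented for this example; a real version would wrap `django.core.cache.cache` and a Django model rather than the in-memory stand-ins used here.

```python
class FakeCache(dict):
    """Hypothetical in-memory stand-in for django.core.cache.cache."""

    def set(self, key, value):
        self[key] = value


cache = FakeCache()


class CacheController(object):
    """Manager-style helper so callers can write Model.cache.get(pk=...)."""

    def __init__(self, key_format, loader):
        self.key_format = key_format  # e.g. 'things:thing:%(pk)s'
        self.loader = loader          # fetches from the canonical source

    def key_for(self, pk):
        return self.key_format % {"pk": pk}

    def get(self, pk):
        key = self.key_for(pk)
        obj = cache.get(key)
        if obj is None:
            obj = self.loader(pk)     # fall back to the canonical source
            cache.set(key, obj)
        return obj


class Thing(object):
    """Toy model; _rows stands in for a database table."""
    _rows = {27: "a thing"}
    db_hits = 0

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name

    @classmethod
    def load(cls, pk):
        cls.db_hits += 1
        return cls(pk, cls._rows[pk])


Thing.cache = CacheController("things:thing:%(pk)s", Thing.load)

first = Thing.cache.get(pk=27)   # misses the cache, loads once
second = Thing.cache.get(pk=27)  # served from the cache
```

Keeping the key format and the fallback loader inside the controller means views never build cache keys themselves, which is the point of the slide.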

Slide 39

Slide 39 text

Where should I put Cache related code?
• To cache collections of objects, add methods to your cache controller
  • User.cache.get_top10_users()
• If you want to be really djangsta, let your cache controller delegate fetching collections of objects to custom model manager methods

Slide 40

Slide 40 text

Original Djangsta @jeffschenck

Slide 41

Slide 41 text

Model Invalidation is Hard
• Update the cache for objects through a post_save signal handler
• Easy for things keyed by PK (which doesn’t change), but hard for cache keys depending on mutable fields (like a foreign key).
• If you change a foreign key, you may need to add an object to one collection, but remove it from another

Slide 42

Slide 42 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
Things are owned by users via Foreign Key
Cache maps user IDs to ThingSets

Slide 43

Slide 43 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
Let’s change Thing2’s owner to User2

Slide 44

Slide 44 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
.save updates the database

Slide 45

Slide 45 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
post_save adds thing2 to user2’s cached ThingSet

Slide 46

Slide 46 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
How can we remove the link in User1’s ThingSet?

Slide 47

Slide 47 text

Model Invalidation is Hard
• We could refetch the object from DB during a pre_save and compare values
  • con: extra db work
  • con: we don’t want to invalidate in pre_save in case the save call fails
    • thread local storage to communicate between pre and post save handlers

Slide 48

Slide 48 text

Model Invalidation is Hard
• We can save the original model state after it is loaded, and then reference that state to invalidate old keys

Slide 49

Slide 49 text

Model Invalidation is Hard

    from django.db.models import Model

    class OriginalStateModel(Model):
        class Meta:
            abstract = True

        def preserve_state(self):
            # copy the fields into state storage
            fields = self._meta.fields
            state = dict(
                (f.name, getattr(self, f.name))
                for f in fields
            )
            self._original_state = state

Slide 50

Slide 50 text

Model Invalidation is Hard

    class OriginalStateModel(Model):
        ...

        def __init__(self, *args, **kwargs):
            super(OriginalStateModel, self).__init__(*args, **kwargs)
            # preserve state after loading
            self.preserve_state()

        def save(self, *args, **kwargs):
            super(OriginalStateModel, self).save(*args, **kwargs)
            # db save and signal handlers have already happened
            # preserve state after saving
            self.preserve_state()

Slide 51

Slide 51 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
Let’s change Thing2’s owner to User2

Slide 52

Slide 52 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
Let’s change Thing2’s owner to User2

Slide 53

Slide 53 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
post_save adds thing2 to user2’s cached ThingSet

Slide 54

Slide 54 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
post_save removes thing2 from user1’s cached ThingSet using the _original_state

Slide 55

Slide 55 text

Cache Capers
[Diagram: a Database panel and a Cache panel, each showing User 1, User 2, Thing 1, and Thing 2]
The _original_state is updated to the current state
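
The sequence walked through in these diagrams can be approximated without Django. In this sketch, which rests on several assumptions, a plain dict stands in for the cache, a direct call to `on_post_save` stands in for the post_save signal, and `thingset_key` and the toy `Thing` class are illustrative names rather than the speakers' code.

```python
cache = {}  # stand-in for the cache: maps a ThingSet key to a set of thing pks


def thingset_key(user_id):
    return "things:for_user:%(user_pk)s" % {"user_pk": user_id}


def on_post_save(thing):
    """Stand-in for a post_save handler that keeps ThingSets current."""
    old_owner = thing._original_state["owner_id"]
    new_owner = thing.owner_id
    if old_owner != new_owner:
        # The preserved state tells us which old collection to clean up.
        cache.setdefault(thingset_key(old_owner), set()).discard(thing.pk)
    cache.setdefault(thingset_key(new_owner), set()).add(thing.pk)
    # Note: a real cache would need get / modify / set here,
    # not in-place mutation of the stored value.


class Thing(object):
    def __init__(self, pk, owner_id):
        self.pk = pk
        self.owner_id = owner_id
        self.preserve_state()  # preserve state after loading

    def preserve_state(self):
        self._original_state = {"owner_id": self.owner_id}

    def save(self):
        # (the database write would happen here)
        on_post_save(self)     # handlers still see the old state
        self.preserve_state()  # then the state is refreshed


thing1 = Thing(1, owner_id=1)
thing1.save()
thing2 = Thing(2, owner_id=1)
thing2.save()

thing2.owner_id = 2  # change Thing2's owner to User2
thing2.save()
```

The ordering inside `save` is the important part: the handler runs while `_original_state` still holds the old owner, and only afterwards is the state refreshed.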

Slide 56

Slide 56 text

Django Signals Caveat
• A few Django methods do not emit signals that you might expect
  • QuerySet.update
  • QuerySet.delete
  • RelatedManager.clear
• Generally, methods that generate queries that can affect multiple database rows

Slide 57

Slide 57 text

Third Party Enhancements
• django-newcache https://github.com/ericflo/django-newcache
• johnny-cache http://packages.python.org/johnny-cache/index.html
• django-cache-machine http://jbalogh.me/projects/cache-machine/
• django-autocache https://github.com/noah256/django-autocache

Slide 58

Slide 58 text

Last Minute Advice
• Cache servers should not be publicly accessible
• Consistent hashing is neat--use it!
• Collections: QuerySets vs Lists/Dicts
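
For readers new to consistent hashing, a small self-contained sketch follows. It is not taken from the talk or from any particular client library: `HashRing`, the server names, and the replica count of 100 are assumptions chosen for illustration. It shows the property that makes the technique attractive for cache pools: when a server is added, the only keys that move are the ones that land on the new server.

```python
import bisect
import hashlib


class HashRing(object):
    """Toy consistent-hash ring with virtual nodes (replicas)."""

    def __init__(self, servers, replicas=100):
        ring = []
        for server in servers:
            for i in range(replicas):
                ring.append((self._hash("%s#%d" % (server, i)), server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._servers = [server for _, server in ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._servers[idx]


keys = ["things:thing:%d" % i for i in range(1000)]
three = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
four = HashRing(["cache1:11211", "cache2:11211", "cache3:11211",
                 "cache4:11211"])

# Keys whose server changed after cache4 joined the pool.
moved = [k for k in keys if three.server_for(k) != four.server_for(k)]
```

With naive `hash(key) % number_of_servers` placement, adding a fourth server would remap most keys and empty most of the cache at once; here only the keys in `moved` change servers.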

Slide 59

Slide 59 text

DoesNotExist Deserves Cache Love

Common Pattern

    result = cache.get(key)
    if not result:
        result = MyModel.objects.get(pk=pk)
        cache.set(key, result)
    return result

Slide 60

Slide 60 text

DoesNotExist Deserves Cache Love

Better Pattern

    DOES_NOT_EXIST = '!!DNE!!'

    result = cache.get(key)
    if result == DOES_NOT_EXIST:
        raise MyModel.DoesNotExist("Object not found in Cache")
    elif result is None:
        try:
            result = MyModel.objects.get(pk=pk)
        except MyModel.DoesNotExist:
            cache.set(key, DOES_NOT_EXIST)
            raise
        cache.set(key, result)
    return result

Slide 61

Slide 61 text

Questions?
• Jacob Burch, RevSys Engineer, @jacobburch
• Noah Silas, Causes.com Engineer, @noah256
Shout out to @mattdennewitz for the name inspiration