Scaling Django from server to browser through an efficient caching strategy
by Hedley Roos
Pycon ZA, October 06, 2016

At Praekelt we're responsible for high-traffic sites built in Django. An efficient caching strategy is required to minimize server load, and in this talk we illustrate the techniques we use to serve Telkom's web site.

The talk assumes basic Django knowledge.

We will touch on:

Volatile caching with memcached as backend.
Template fragment caching.
View caching.
HTTP caching headers and how they affect Nginx and browsers.
Automated cache invalidation.
Automated Nginx reverse cache purging.

Transcript

  1. Our systems handle large amounts of traffic and we want to run these with the minimum amount of resources. (Pycon2016 talk, slide 3 of 46)

  2. Your site takes 10 ms to complete a request. It hits the database on each request. Performs some logic. Renders a template. This sounds great!

  3. You do something amazing / stupid / illegal and suddenly 1000 people hit the site concurrently. They do a request every 5 seconds. That means every 5 seconds you need 1000 times 10 ms, or 10 seconds of system time. You're 100% over capacity. Site goes down. This sounds terrible!

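The slide's arithmetic can be checked in a few lines:

```python
# Back-of-envelope capacity check: 1000 users each issuing one 10 ms request
# every 5 seconds needs 10 s of system time per 5 s wall-clock window.
requests = 1000          # concurrent users, one request each per interval
ms_per_request = 10      # server time per request
interval_s = 5           # seconds between requests from each user

work_s = requests * ms_per_request / 1000  # seconds of work per interval
utilization = work_s / interval_s
print(utilization)  # 2.0 -> 200% of capacity, i.e. 100% over
```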
  4. Reality is actually worse: things get worse faster than linearly. E.g. the database may only allow 20 concurrent connections, and without connection pooling you're stuck. You may exceed the bandwidth the physical network interface card offers. You may exceed the maximum number of open files on the operating system.

  5. Identify the points in your system that can be cached:
     Django templates
     View code
     Database queries
     Reverse caching proxy
     The browser itself

  6. Django template fragment caching:

```django
{% load cache %}
{% cache 3600 "some-identifier" object.id object.modified %}
    {{ object.title }}
{% endcache %}
```

  7. View code:

```python
from django.core.cache import cache

def get(self, *args, **kwargs):
    key = "myview-%s" % self.request.get_full_path()
    cached = cache.get(key, None)
    if cached is not None:
        return cached
    result = super(MyView, self).get(*args, **kwargs)
    cache.set(key, result, 3600)
    return result
```

  8. Database queries: not a fan of caching these, because DB consistency is tricky enough and this adds another point of possible inconsistency.

  9. Reverse caching proxy with Nginx:

```nginx
proxy_ignore_headers Set-Cookie;
proxy_cache thecache;
proxy_cache_valid 200 404 1m;
proxy_cache_use_stale updating;
proxy_cache_lock on;
add_header X-Cached $upstream_cache_status;
```

  10. The browser itself: the browser will look at visited URLs in its history and honour caching headers.

  11. Pick a caching backend:
      In-memory: simple to implement, but does not allow sharing of the cache between Django processes.
      Database: let's make things worse. Just no.
      Memcached: simple to implement, shares the cache between Django processes.
      There are many more.

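The memcached option from the list above, as a minimal sketch of a Django settings entry; the host and port in LOCATION are placeholders, not taken from the talk:

```python
# settings.py -- share one memcached instance between all Django processes.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",  # placeholder memcached daemon address
    }
}
```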
  12. Template fragment caching with django-ultracache. It takes the sites framework into consideration, allowing different caching per site. It allows undefined variables to be passed as arguments, thus simplifying the template. Crucially, it is aware of the model objects that are subjected to its caching: when an object is modified, all affected cache keys are automatically expired. This allows the user to set longer expiry times without having to worry about stale content.

  13. Handles undefined variables:

```django
{% load ultracache_tags %}
{% ultracache 3600 "my_identifier" object object.some_property undefined %}
    {{ object.title }}
    {% if object.some_property %}
        This object has some property.
    {% endif %}
{% endultracache %}
```

  14. The tag can be nested. ultracache is aware of all model objects that are subjected to its caching. In this example, cache keys outer and inner_one are expired when object one is changed, but cache key inner_two remains unaffected:

```django
{% load ultracache_tags %}
{% ultracache 1200 "outer" %}
    {% ultracache 1200 "inner_one" %}
        title = {{ one.title }}
    {% endultracache %}
    {% ultracache 1200 "inner_two" %}
        title = {{ two.title }}
    {% endultracache %}
{% endultracache %}
```

  15. What makes a good cache key? A good key contains the minimum amount of information that determines the content inside the ultracache tag.

  16. Good:

```django
{% ultracache 3600 "some-id" object.id object.category.id %}
    {{ object.title }} has category {{ object.category.title }}
{% endultracache %}
```

      Probably bad (the category can change without the key changing):

```django
{% ultracache 3600 "some-id" object.id %}
    {{ object.title }} has category {{ object.category.title }}
{% endultracache %}
```

      Probably pointless (the content does not depend on the user):

```django
{% ultracache 3600 "some-id" object.id request.user.id %}
    {{ object.title }}
{% endultracache %}
```

      Insane (a copy per URL for content that only depends on the object):

```django
{% ultracache 3600 "some-id" object.id request.get_full_path %}
    {{ object.title }}
{% endultracache %}
```

  17. How does it work? django-ultracache monkey-patches django.template.base.Variable._resolve_lookup to make a record of model objects as they are resolved. The ultracache template tag inspects the list of objects contained within it and keeps a registry in Django's caching backend. A post_save signal handler monitors objects for changes and expires the appropriate cache keys.

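The registry-plus-signal idea above can be sketched with plain dicts standing in for Django's cache backend and signal machinery; all names here are illustrative, not django-ultracache's actual internals:

```python
# Toy sketch of automated cache invalidation via a dependency registry.
cache = {}     # cache_key -> rendered fragment
registry = {}  # (content_type_id, pk) -> set of cache keys that used the object

def record(ctid, pk, cache_key):
    # Called while rendering: remember which cache keys depend on the object.
    registry.setdefault((ctid, pk), set()).add(cache_key)

def on_post_save(ctid, pk):
    # Signal handler: expire every cache key that depends on the saved object.
    for key in registry.pop((ctid, pk), set()):
        cache.pop(key, None)

cache["outer"] = "rendered html"
record(ctid=1, pk=1, cache_key="outer")
on_post_save(ctid=1, pk=1)
print("outer" in cache)  # False -- the fragment was expired
```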
  18. What's in this registry?

```
(ctid1, obj1_pk) = [cache_key_1, cache_key_2]
(ctid1, obj2_pk) = [cache_key_2, cache_key_3]
(ctid1, obj1_pk) = [/page/one/, /page/two/]
(ctid1, obj2_pk) = [/page/two/, /page/three/]
ctid1 = [cache_key_1, cache_key_2]
ctid2 = [cache_key_3, cache_key_3]
ctid1 = [/page/one/, /page/two/]
ctid2 = [/page/two/, /page/three/]
```

  19. View caching: django-ultracache provides a decorator, cached_get, to cache your views. The parameters follow the same rules as the ultracache template tag, except they must all resolve. request.get_full_path() is always implicitly added to the cache key.

  20.

```python
from ultracache.decorators import cached_get

class CachedView(TemplateView):
    template_name = "ultracache/cached_view.html"

    @cached_get(300, "request.is_secure()", 456)
    def get(self, *args, **kwargs):
        return super(CachedView, self).get(*args, **kwargs)
```

  21. HTTP caching headers allow you to define how URLs are to be cached. These headers are interpreted by caching proxies and browsers in subtly different ways.

```
Last-Modified: Thu, 6 Oct 2016 11:00:00 GMT
Cache-Control: max-age=1200
X-Accel-Expires: 120
```

  22. At 11:01:00:
      User clicks link to page
      Browser hits Nginx
      Nginx hits Django
      Nginx caches response
      Browser receives response
      Browser caches response

  23. At 11:01:30:
      User clicks link to page
      Browser checks local cache
      Browser finds cached version, sees time is not greater than Last-Modified + X-Accel-Expires (120), and renders it

  24. At 11:02:30:
      User clicks link to page
      Browser checks local cache
      Browser finds cached version, sees time is greater than Last-Modified + X-Accel-Expires (120)
      Browser hits Nginx
      Nginx checks local cache
      Nginx finds cached version, sees time is not greater than Last-Modified + max-age and returns not modified
      Browser receives response

  25. At 11:25:00:
      User clicks link to page
      Browser checks local cache
      Browser finds cached version, sees time is greater than Last-Modified + X-Accel-Expires (120)
      Browser hits Nginx
      Nginx finds cached version, sees time is greater than Last-Modified + max-age
      Nginx hits Django
      Nginx caches response
      Browser receives response
      Browser caches response

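The timeline in the last four slides reduces to a single freshness check, sketched here with the slide's numbers: 120 seconds for the browser's copy and 1200 seconds for Nginx's. (Note that in Nginx's own documentation X-Accel-Expires governs the proxy and max-age the browser; the sketch simply follows the slide's narrative.)

```python
# A copy is fresh while (now - Last-Modified) <= ttl.
LAST_MODIFIED = 11 * 3600  # 11:00:00, as seconds since midnight

def is_fresh(now, ttl):
    return now - LAST_MODIFIED <= ttl

t = lambda h, m, s: h * 3600 + m * 60 + s
print(is_fresh(t(11, 1, 30), 120))    # True  -> browser renders its copy
print(is_fresh(t(11, 2, 30), 120))    # False -> browser asks Nginx
print(is_fresh(t(11, 2, 30), 1200))   # True  -> Nginx answers from its cache
print(is_fresh(t(11, 25, 0), 1200))   # False -> Nginx hits Django again
```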
  26. How do you tell Nginx which caching headers to use?

```python
import re

TIMEOUT = {
    30: re.compile(r"|".join((
        "^/articles/",
    ))),
    60: re.compile(r"|".join((
        "^/pages/",
        "^/blogs/",
    ))),
}

# Pre-compute sorted keys
TIMEOUT_KEYS = sorted(TIMEOUT)
```

  27.

```python
import datetime

class ProxyCacheMiddleware(object):

    def process_response(self, request, response):
        # Default
        response["Cache-Control"] = "no-cache"
        # Never cache non-GET
        if request.method.lower() not in ("get", "head"):
            return response
        # Determine age
        age = 0
        for key in TIMEOUT_KEYS:
            if TIMEOUT[key].match(request.path_info):
                age = key
                break
        if age:
            response["Last-Modified"] = httpdate(datetime.datetime.utcnow())
            response["X-Accel-Expires"] = age
            response["Cache-Control"] = "max-age=%d" % max(age / 6, 30)
            response["Vary"] = "Accept-Encoding"
        else:
            response["Cache-Control"] = "no-cache"
        return response
```

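The middleware above calls an httpdate() helper the slides never show. One plausible stdlib implementation (an assumption, not from the talk) formats a naive UTC datetime as an RFC 1123 HTTP date:

```python
# Hypothetical httpdate() helper for the Last-Modified header above.
import datetime
from email.utils import formatdate

def httpdate(dt):
    # Convert a naive UTC datetime to seconds since the epoch, then to
    # an HTTP date string such as "Thu, 06 Oct 2016 11:00:00 GMT".
    epoch = (dt - datetime.datetime(1970, 1, 1)).total_seconds()
    return formatdate(epoch, usegmt=True)

print(httpdate(datetime.datetime(2016, 10, 6, 11, 0, 0)))
# Thu, 06 Oct 2016 11:00:00 GMT
```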
  28. Purging paths from the Nginx reverse cache: each node runs Nginx as web server and reverse cache. The nodes have no knowledge of each other. Django has no knowledge of the nodes. So how do we broadcast the purge instruction?

  29. RabbitMQ fanout! It's a type of exchange used for pub/sub implementations. Both Django and the Twisted services need only agree on the exchange name. Each node is controlled by Puppet / Salt, and that sets the RabbitMQ address.

  30. Configure the purger in settings and publish each path to a fanout exchange:

```python
# settings.py
ULTRACACHE = {"purge": {"method": "myapp.purgers.immediate"}}

# purgers.py
import pika
from django.conf import settings

def immediate(path):
    # Use same host as celery
    host, _ = settings.BROKER_URL.split("/")[2].split(":")
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=host)
    )
    channel = connection.channel()
    channel.exchange_declare(exchange="purgatory", type="fanout")
    channel.basic_publish(
        exchange="purgatory", routing_key="", body=path
    )
    connection.close()
    return True
```

  31. The Twisted service on each node connects and binds a queue to the exchange:

```python
class MonitorService(service.Service):

    def connect(self):
        parameters = pika.ConnectionParameters()
        cc = protocol.ClientCreator(
            reactor, twisted_connection.TwistedProtocolConnection, parameters
        )
        d = cc.connectTCP("localhost", 5672)
        d.addCallback(lambda protocol: protocol.ready)
        d.addCallback(self.setup_connection)
        return d

    @defer.inlineCallbacks
    def setup_connection(self, connection):
        self.channel = yield connection.channel()
        yield self.channel.exchange_declare(
            exchange="purgatory", type="fanout"
        )
        result = yield self.channel.queue_declare(exclusive=True)
        queue_name = result.method.queue
        yield self.channel.queue_bind(
            exchange="purgatory", queue=queue_name
        )
        yield self.channel.basic_qos(prefetch_count=1)
        self.queue_object, self.consumer_tag = \
            yield self.channel.basic_consume(
                queue=queue_name, no_ack=False, exclusive=True
            )
```

  32. Starting and stopping the service:

```python
    @defer.inlineCallbacks
    def startService(self):
        self.running = 1
        yield self.connect()
        self.process_d = self.process_queue()

    @defer.inlineCallbacks
    def stopService(self):
        if not self.running:
            return
        yield self.channel.basic_cancel(
            callback=None, consumer_tag=self.consumer_tag
        )
        self.queue_object.put(None)
        yield self.process_d
        self.running = 0
```

  33. The consumer loop issues a PURGE request for each path received (the slide cuts off the last exception name):

```python
    @defer.inlineCallbacks
    def process_queue(self):
        while True:
            thing = yield self.queue_object.get()
            if thing is None:
                break
            ch, method, properties, body = thing
            if body:
                path = body
                try:
                    response = yield treq.request(
                        "PURGE", "http://127.0.0.1" + path,
                        headers={"Host": "http://mysite.com"},
                        timeout=10
                    )
                except (ConnectError, DNSLookupError, CancelledError, Respo
                    pass
                else:
                    content = yield response.content()
            yield ch.basic_ack(delivery_tag=method.delivery_tag)
```

  34. What this gets us on Telkom.co.za:
      About 80% reverse cache HIT ratio (it's actually better, because we serve expired pages while the cache updates)
      5-10% MISS ratio
      Memcache does a lot more traffic: 3MB per second, read and write combined
      Memcache read:write ratio is 10:1
      We can technically serve the site on very weak hardware

  35. If you get an interview with us, we'll give you a Raspberry Pi 3.