Scaling Django from server to browser through an efficient caching strategy
by Hedley Roos
Pycon ZA, October 06, 2016

At Praekelt we're responsible for high-traffic sites built in Django. An efficient caching strategy is required to minimize server load, and in this talk we illustrate the techniques we use to serve Telkom's web site.

The talk assumes basic Django knowledge.

We will touch on:

Volatile caching with memcached as backend.
Template fragment caching.
View caching.
HTTP caching headers and how they affect Nginx and browsers.
Automated cache invalidation.
Automated Nginx reverse cache purging.

Transcript

  1. Our systems handle large amounts of traffic and we want to run these with the minimum amount of resources. (Pycon2016 talk, slide 3 of 46)

  2. Your site takes 10 ms to complete a request. It hits the database on each request. Performs some logic. Renders a template. This sounds great!

  3. You do something amazing / stupid / illegal and suddenly 1000 people hit the site concurrently. They do a request every 5 seconds. That means every 5 seconds you need 1000 times 10 ms, or 10 seconds of system time. You're 100% over capacity. Site goes down. This sounds terrible!

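The slide's arithmetic can be checked in a few lines:

```python
# Back-of-envelope capacity check: 1000 users each issuing one 10 ms request
# every 5 seconds needs 10 s of system time per 5 s wall-clock window.
requests = 1000          # concurrent users, one request each per interval
ms_per_request = 10      # server time per request
interval_s = 5           # seconds between requests from each user

work_s = requests * ms_per_request / 1000  # seconds of work per interval
utilization = work_s / interval_s
print(utilization)  # 2.0 -> 200% of capacity, i.e. 100% over
```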
  4. Reality is actually worse: things get worse faster than linearly. E.g. the database may only allow 20 concurrent connections, and without connection pooling you're stuck. You may exceed the bandwidth the physical network interface card offers. You may exceed the maximum number of open files on the operating system.

  5. Identify the points in your system that can be cached:
     Django templates
     View code
     Database queries
     Reverse caching proxy
     The browser itself

  6. Django template fragment caching:

```django
{% load cache %}
{% cache 3600 "some-identifier" object.id object.modified %}
    {{ object.title }}
{% endcache %}
```

  7. View code:

```python
from django.core.cache import cache

def get(self, *args, **kwargs):
    key = "myview-%s" % self.request.get_full_path()
    cached = cache.get(key, None)
    if cached is not None:
        return cached
    result = super(MyView, self).get(*args, **kwargs)
    cache.set(key, result, 3600)
    return result
```

  8. Database queries: not a fan of caching these, because DB consistency is tricky enough and this adds another point of possible inconsistency.

  9. Reverse caching proxy with Nginx:

```nginx
proxy_ignore_headers Set-Cookie;
proxy_cache thecache;
proxy_cache_valid 200 404 1m;
proxy_cache_use_stale updating;
proxy_cache_lock on;
add_header X-Cached $upstream_cache_status;
```

  10. The browser itself: the browser will look at visited URLs in its history and honour caching headers.

  11. Pick a caching backend:
      In-memory: simple to implement, but does not allow sharing of the cache between Django processes.
      Database: let's make things worse. Just no.
      Memcached: simple to implement, shares the cache between Django processes.
      There are many more.

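The memcached option from the list above, as a minimal sketch of a Django settings entry; the host and port in LOCATION are placeholders, not taken from the talk:

```python
# settings.py -- share one memcached instance between all Django processes.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",  # placeholder memcached daemon address
    }
}
```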
  12. Template fragment caching with django-ultracache. It takes the sites framework into consideration, allowing different caching per site. It allows undefined variables to be passed as arguments, thus simplifying the template. Crucially, it is aware of the model objects that are subjected to its caching: when an object is modified, all affected cache keys are automatically expired. This allows the user to set longer expiry times without having to worry about stale content.

  13. Handles undefined variables:

```django
{% load ultracache_tags %}
{% ultracache 3600 "my_identifier" object object.some_property undefined %}
    {{ object.title }}
    {% if object.some_property %}
        This object has some property.
    {% endif %}
{% endultracache %}
```

  14. The tag can be nested. ultracache is aware of all model objects that are subjected to its caching. In this example, cache keys outer and inner_one are expired when object one is changed, but cache key inner_two remains unaffected:

```django
{% load ultracache_tags %}
{% ultracache 1200 "outer" %}
    {% ultracache 1200 "inner_one" %}
        title = {{ one.title }}
    {% endultracache %}
    {% ultracache 1200 "inner_two" %}
        title = {{ two.title }}
    {% endultracache %}
{% endultracache %}
```

  15. What makes a good cache key? A good key contains the minimum amount of information that determines the content inside the ultracache tag.

  16. Good:

```django
{% ultracache 3600 "some-id" object.id object.category.id %}
    {{ object.title }} has category {{ object.category.title }}
{% endultracache %}
```

      Probably bad (the category can change without the key changing):

```django
{% ultracache 3600 "some-id" object.id %}
    {{ object.title }} has category {{ object.category.title }}
{% endultracache %}
```

      Probably pointless (the content does not depend on the user):

```django
{% ultracache 3600 "some-id" object.id request.user.id %}
    {{ object.title }}
{% endultracache %}
```

      Insane (a copy per URL for content that only depends on the object):

```django
{% ultracache 3600 "some-id" object.id request.get_full_path %}
    {{ object.title }}
{% endultracache %}
```

  17. How does it work? django-ultracache monkey-patches django.template.base.Variable._resolve_lookup to make a record of model objects as they are resolved. The ultracache template tag inspects the list of objects contained within it and keeps a registry in Django's caching backend. A post_save signal handler monitors objects for changes and expires the appropriate cache keys.

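The registry-plus-signal idea above can be sketched with plain dicts standing in for Django's cache backend and signal machinery; all names here are illustrative, not django-ultracache's actual internals:

```python
# Toy sketch of automated cache invalidation via a dependency registry.
cache = {}     # cache_key -> rendered fragment
registry = {}  # (content_type_id, pk) -> set of cache keys that used the object

def record(ctid, pk, cache_key):
    # Called while rendering: remember which cache keys depend on the object.
    registry.setdefault((ctid, pk), set()).add(cache_key)

def on_post_save(ctid, pk):
    # Signal handler: expire every cache key that depends on the saved object.
    for key in registry.pop((ctid, pk), set()):
        cache.pop(key, None)

cache["outer"] = "rendered html"
record(ctid=1, pk=1, cache_key="outer")
on_post_save(ctid=1, pk=1)
print("outer" in cache)  # False -- the fragment was expired
```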
  18. What's in this registry?

```
(ctid1, obj1_pk) = [cache_key_1, cache_key_2]
(ctid1, obj2_pk) = [cache_key_2, cache_key_3]
(ctid1, obj1_pk) = [/page/one/, /page/two/]
(ctid1, obj2_pk) = [/page/two/, /page/three/]
ctid1 = [cache_key_1, cache_key_2]
ctid2 = [cache_key_3, cache_key_3]
ctid1 = [/page/one/, /page/two/]
ctid2 = [/page/two/, /page/three/]
```

  19. View caching: django-ultracache provides a decorator, cached_get, to cache your views. The parameters follow the same rules as the ultracache template tag, except they must all resolve. request.get_full_path() is always implicitly added to the cache key.

  20.

```python
from ultracache.decorators import cached_get

class CachedView(TemplateView):
    template_name = "ultracache/cached_view.html"

    @cached_get(300, "request.is_secure()", 456)
    def get(self, *args, **kwargs):
        return super(CachedView, self).get(*args, **kwargs)
```

  21. HTTP caching headers allow you to define how URLs are to be cached. These headers are interpreted by caching proxies and browsers in subtly different ways.

```
Last-Modified: Thu, 6 Oct 2016 11:00:00 GMT
Cache-Control: max-age=1200
X-Accel-Expires: 120
```

  22. At 11:01:00:
      User clicks link to page
      Browser hits Nginx
      Nginx hits Django
      Nginx caches response
      Browser receives response
      Browser caches response

  23. At 11:01:30:
      User clicks link to page
      Browser checks local cache
      Browser finds cached version, sees time is not greater than Last-Modified + X-Accel-Expires (120), and renders it

  24. At 11:02:30:
      User clicks link to page
      Browser checks local cache
      Browser finds cached version, sees time is greater than Last-Modified + X-Accel-Expires (120)
      Browser hits Nginx
      Nginx checks local cache
      Nginx finds cached version, sees time is not greater than Last-Modified + max-age and returns not modified
      Browser receives response

  25. At 11:25:00:
      User clicks link to page
      Browser checks local cache
      Browser finds cached version, sees time is greater than Last-Modified + X-Accel-Expires (120)
      Browser hits Nginx
      Nginx finds cached version, sees time is greater than Last-Modified + max-age
      Nginx hits Django
      Nginx caches response
      Browser receives response
      Browser caches response

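The timeline in the last four slides reduces to a single freshness check, sketched here with the slide's numbers: 120 seconds for the browser's copy and 1200 seconds for Nginx's. (Note that in Nginx's own documentation X-Accel-Expires governs the proxy and max-age the browser; the sketch simply follows the slide's narrative.)

```python
# A copy is fresh while (now - Last-Modified) <= ttl.
LAST_MODIFIED = 11 * 3600  # 11:00:00, as seconds since midnight

def is_fresh(now, ttl):
    return now - LAST_MODIFIED <= ttl

t = lambda h, m, s: h * 3600 + m * 60 + s
print(is_fresh(t(11, 1, 30), 120))    # True  -> browser renders its copy
print(is_fresh(t(11, 2, 30), 120))    # False -> browser asks Nginx
print(is_fresh(t(11, 2, 30), 1200))   # True  -> Nginx answers from its cache
print(is_fresh(t(11, 25, 0), 1200))   # False -> Nginx hits Django again
```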
  26. How do you tell Nginx which caching headers to use?

```python
import re

TIMEOUT = {
    30: re.compile(r"|".join((
        "^/articles/",
    ))),
    60: re.compile(r"|".join((
        "^/pages/",
        "^/blogs/",
    ))),
}

# Pre-compute sorted keys
TIMEOUT_KEYS = sorted(TIMEOUT)
```

  27.

```python
import datetime

class ProxyCacheMiddleware(object):

    def process_response(self, request, response):
        # Default
        response["Cache-Control"] = "no-cache"
        # Never cache non-GET
        if request.method.lower() not in ("get", "head"):
            return response
        # Determine age
        age = 0
        for key in TIMEOUT_KEYS:
            if TIMEOUT[key].match(request.path_info):
                age = key
                break
        if age:
            response["Last-Modified"] = httpdate(datetime.datetime.utcnow())
            response["X-Accel-Expires"] = age
            response["Cache-Control"] = "max-age=%d" % max(age / 6, 30)
            response["Vary"] = "Accept-Encoding"
        else:
            response["Cache-Control"] = "no-cache"
        return response
```

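The middleware above calls an httpdate() helper the slides never show. One plausible stdlib implementation (an assumption, not from the talk) formats a naive UTC datetime as an RFC 1123 HTTP date:

```python
# Hypothetical httpdate() helper for the Last-Modified header above.
import datetime
from email.utils import formatdate

def httpdate(dt):
    # Convert a naive UTC datetime to seconds since the epoch, then to
    # an HTTP date string such as "Thu, 06 Oct 2016 11:00:00 GMT".
    epoch = (dt - datetime.datetime(1970, 1, 1)).total_seconds()
    return formatdate(epoch, usegmt=True)

print(httpdate(datetime.datetime(2016, 10, 6, 11, 0, 0)))
# Thu, 06 Oct 2016 11:00:00 GMT
```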
  28. Purging paths from the Nginx reverse cache: each node runs Nginx as web server and reverse cache. The nodes have no knowledge of each other. Django has no knowledge of the nodes. So how do we broadcast the purge instruction?

  29. RabbitMQ fanout! It's a type of exchange used for pub/sub implementations. Both Django and the Twisted services need only agree on the exchange name. Each node is controlled by Puppet / Salt, and that sets the RabbitMQ address.

  30. Configure the purger in settings and publish each path to a fanout exchange:

```python
# settings.py
ULTRACACHE = {"purge": {"method": "myapp.purgers.immediate"}}

# purgers.py
import pika
from django.conf import settings

def immediate(path):
    # Use same host as celery
    host, _ = settings.BROKER_URL.split("/")[2].split(":")
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host=host)
    )
    channel = connection.channel()
    channel.exchange_declare(exchange="purgatory", type="fanout")
    channel.basic_publish(
        exchange="purgatory", routing_key="", body=path
    )
    connection.close()
    return True
```

  31. The Twisted service on each node connects and binds a queue to the exchange:

```python
class MonitorService(service.Service):

    def connect(self):
        parameters = pika.ConnectionParameters()
        cc = protocol.ClientCreator(
            reactor, twisted_connection.TwistedProtocolConnection, parameters
        )
        d = cc.connectTCP("localhost", 5672)
        d.addCallback(lambda protocol: protocol.ready)
        d.addCallback(self.setup_connection)
        return d

    @defer.inlineCallbacks
    def setup_connection(self, connection):
        self.channel = yield connection.channel()
        yield self.channel.exchange_declare(
            exchange="purgatory", type="fanout"
        )
        result = yield self.channel.queue_declare(exclusive=True)
        queue_name = result.method.queue
        yield self.channel.queue_bind(
            exchange="purgatory", queue=queue_name
        )
        yield self.channel.basic_qos(prefetch_count=1)
        self.queue_object, self.consumer_tag = \
            yield self.channel.basic_consume(
                queue=queue_name, no_ack=False, exclusive=True
            )
```

  32. Starting and stopping the service:

```python
    @defer.inlineCallbacks
    def startService(self):
        self.running = 1
        yield self.connect()
        self.process_d = self.process_queue()

    @defer.inlineCallbacks
    def stopService(self):
        if not self.running:
            return
        yield self.channel.basic_cancel(
            callback=None, consumer_tag=self.consumer_tag
        )
        self.queue_object.put(None)
        yield self.process_d
        self.running = 0
```

  33. The consumer loop issues a PURGE request for each path received (the slide cuts off the last exception name):

```python
    @defer.inlineCallbacks
    def process_queue(self):
        while True:
            thing = yield self.queue_object.get()
            if thing is None:
                break
            ch, method, properties, body = thing
            if body:
                path = body
                try:
                    response = yield treq.request(
                        "PURGE", "http://127.0.0.1" + path,
                        headers={"Host": "http://mysite.com"},
                        timeout=10
                    )
                except (ConnectError, DNSLookupError, CancelledError, Respo
                    pass
                else:
                    content = yield response.content()
            yield ch.basic_ack(delivery_tag=method.delivery_tag)
```

  34. What this gets us on Telkom.co.za:
      About 80% reverse cache HIT ratio (it's actually better, because we serve expired pages while the cache updates)
      5-10% MISS ratio
      Memcache does a lot more traffic: 3MB per second, read and write combined
      Memcache read:write ratio is 10:1
      We can technically serve the site on very weak hardware

  35. If you get an interview with us, we'll give you a Raspberry Pi 3.