Django Doesn't Scale

(and what you can do about it.)

Given at OSCON 2012.

Jacob Kaplan-Moss

July 19, 2012

Transcript

  2. DjangoCon 2008: Cal Henderson, "Why I hate Django"

    http://www.youtube.com/watch?v=i6Fr65PFqfk
  3. “Our site’s too slow!”

    • front-end rendering time
    • database load
    • resource contention
    • DDoS attack
    • network latency
    • bugs in your software
    • bugs in third party software
    • slow template rendering
    • misconfigured servers
    • not enough threads
  4. Data collection

    • logging: http://docs.python.org/library/logging
    • Sentry: http://sentry.readthedocs.org/
    • python-statsd: http://packages.python.org/python-statsd/
    • mmstats: http://mmstats.readthedocs.org/ and https://github.com/schmichael/django-mmstats/
    • Metrology: http://metrology.readthedocs.org
  5.
        import time
        from metrology import Metrology

        http_ok = Metrology.counter('http.ok')
        http_err = Metrology.counter('http.err')
        response_time = Metrology.histogram('request.time')

        class RequestMetricsMiddleware(object):

            def process_request(self, request):
                request._start_time = time.time()

            def process_response(self, request, response):
                response_time.update(time.time() - request._start_time)
                if 200 <= response.status_code < 400:
                    http_ok.increment()
                else:
                    http_err.increment()
                return response

            def process_exception(self, request, exception):
                http_err.increment()
  7. Two-phased rendering

        <div id="header">
            {% load phased_tags %}
            {% phased with user %}
                Hello, {{ user.name }}
            {% endphased %}
        </div>
  8. “There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.”
  9.
        from books.models import Book
        from django.shortcuts import render

        def book_list(request):
            qs = Book.objects.all()
            return render(request, 'books.html', {'books': qs})
  10.
        from books.models import Book
        from django.shortcuts import render
        from django.views.decorators.cache import cache_page

        @cache_page(600)
        def book_list(request):
            qs = Book.objects.all()
            return render(request, 'books.html', {'books': qs})
  11. “There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.”
  12. [Diagram with the labels: View, Cache, "cache outdated?", "Regenerate cache data", "Data changed"]

    Cache entry format:

        key:arg:arg    -> (ttl, data)
        username:jacob -> (1342710010, "JKM")
  13. Rule of thumb: Writes are 10x as expensive as reads.*

    * This isn’t actually true, but you should pretend it is anyway.
  14.
        @task
        def mark_received(message_ids):
            Message.objects.filter(id__in=message_ids) \
                   .update(received=True)

        def message_list(request):
            messages = Message.objects.filter(recipient=request.user)
            mark_received.delay(messages.values_list('id', flat=True))
            return render(...)
  15. ORM inefficiencies

    • Queryset cloning: copying a queryset requires cloning a heavy structure.

        Model.objects.filter(...).filter(...).order_by(...)

    • Model instantiation: ~40k __init__s per second (see http://bit.ly/Muepgo).

    • Saving models

      Readable, but slow:

        m.foo = bar; m.save()

      Much much faster:

        Model.objects.filter(id=instance.id).update(foo=bar)
  16. Bulk inserts

    • Horrifically slow:

        for title in big_title_list:
            Book.objects.create(title=title)

    • Faster, but still terrible:

        with transaction.commit_on_success():
            for title in big_title_list:
                Book.objects.create(title=title)

    • Fast:

        bl = [Book(title=t) for t in big_title_list]
        Book.objects.bulk_create(bl)

    • But COPY FROM wins, hands down.
      http://www.postgresql.org/docs/9.1/static/sql-copy.html
      http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from
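The slide gives no code for the COPY FROM option, so here is a minimal sketch of one way it might look with a psycopg2-style cursor. The function names and the table name `books_book` are assumptions made for illustration (the table name is a guess at what Django would generate for a `Book` model in a `books` app); the only library call relied on is psycopg2's `cursor.copy_from(file, table, columns=...)`.

```python
import io


def escape_copy_text(value):
    """Escape a string for PostgreSQL's COPY text format."""
    return (
        value.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def copy_titles(cursor, titles, table="books_book"):
    """Load titles through COPY FROM using a psycopg2-style cursor.

    Assumption: `table` has a text column named `title`; both names are
    hypothetical and should be adjusted to the real schema.
    """
    buf = io.StringIO()
    for title in titles:
        buf.write(escape_copy_text(title) + "\n")  # one row per line
    buf.seek(0)
    # Streams the whole buffer to the server in a single COPY operation.
    cursor.copy_from(buf, table, columns=("title",))
```

With a real connection this would be called as `copy_titles(connection.cursor(), big_title_list)` followed by a commit. Note that this path bypasses the ORM entirely, so model `save()` logic and signals do not run.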
  17. Modern databases are incredible.

    See, e.g., Schemaless SQL, Craig Kerstiens: http://klewel.com/conferences/djangocon-2012/index.php?talkID=29