Slide 1

Slide 1 text

Django Doesn’t Scale! (and what you can do about it)
Jacob Kaplan-Moss

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Rails doesn’t scale.

Slide 4

Slide 4 text

Django doesn’t scale.

Slide 5

Slide 5 text

Frameworks don’t scale!

Slide 6

Slide 6 text

Cal Henderson, “Why I Hate Django”, DjangoCon 2008: http://www.youtube.com/watch?v=i6Fr65PFqfk

Slide 7

Slide 7 text

Cal Henderson, “Why I Hate Django”, DjangoCon 2008: http://www.youtube.com/watch?v=i6Fr65PFqfk

Slide 8

Slide 8 text

Cal Henderson, “Why I Hate Django”, DjangoCon 2008: http://www.youtube.com/watch?v=i6Fr65PFqfk

Slide 9

Slide 9 text

“Measure twice, cut once.”

Slide 10

Slide 10 text

“Our site’s too slow!”
Possible culprits: front-end rendering time, database load, resource contention, a DDoS attack, network latency, bugs in your software, bugs in third-party software, slow template rendering, misconfigured servers, not enough threads.

Slide 11

Slide 11 text

Data collection
• logging http://docs.python.org/library/logging
• Sentry http://sentry.readthedocs.org/
• python-statsd http://packages.python.org/python-statsd/
• mmstats http://mmstats.readthedocs.org/ https://github.com/schmichael/django-mmstats/
• Metrology http://metrology.readthedocs.org
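Counters and timers like the ones python-statsd emits travel over a very simple wire format. As a minimal stdlib-only sketch (not python-statsd’s own API), a StatsD client just sends `name:value|type` datagrams over UDP; the host and the port `8125` (the conventional StatsD port) are assumptions, and the metric names are hypothetical.

```python
import socket


def format_metric(name, value, kind):
    """Build one StatsD datagram, e.g. 'http.ok:1|c'.

    kind is 'c' for a counter, 'ms' for a timer, 'g' for a gauge.
    """
    if kind not in ("c", "ms", "g"):
        raise ValueError("unsupported metric type: %r" % kind)
    return "%s:%s|%s" % (name, value, kind)


def send_metric(name, value, kind, host="127.0.0.1", port=8125):
    """Send one metric over UDP.

    UDP sendto() does not wait for a reply, so a down StatsD server
    does not stall the request that is reporting the metric.
    """
    payload = format_metric(name, value, kind).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

Usage would be `send_metric("http.ok", 1, "c")` after a successful response, or `send_metric("request.time", 42, "ms")` to record a 42 ms request. Fire-and-forget UDP is the design choice that makes it cheap enough to measure everything.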

Slide 12

Slide 12 text

import time

from metrology import Metrology

http_ok = Metrology.counter('http.ok')
http_err = Metrology.counter('http.err')
response_time = Metrology.histogram('request.time')

class RequestMetricsMiddleware(object):
    def process_request(self, request):
        request._start_time = time.time()

    def process_response(self, request, response):
        # _start_time is missing if an earlier middleware returned a
        # response before this process_request ran.
        start = getattr(request, '_start_time', None)
        if start is not None:
            response_time.update(time.time() - start)
        if 200 <= response.status_code < 400:
            http_ok.increment()
        else:
            http_err.increment()
        return response

    def process_exception(self, request, exception):
        # Returning None lets Django's normal exception handling continue.
        http_err.increment()

Slide 13

Slide 13 text

Graphite http://graphite.readthedocs.org/
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
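Graphite’s Carbon daemon accepts a plaintext protocol: one `metric.path value unix_timestamp` line per datapoint, conventionally on TCP port 2003. A minimal stdlib sketch, assuming a Carbon listener on localhost:2003; the metric paths are hypothetical.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Carbon's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)


def send_to_graphite(lines, host="127.0.0.1", port=2003):
    """Send already-formatted lines to a Carbon listener over TCP."""
    sock = socket.create_connection((host, port), timeout=2)
    try:
        sock.sendall("".join(lines).encode("utf-8"))
    finally:
        sock.close()
```

Batching several `graphite_line(...)` results into one `send_to_graphite` call keeps the number of TCP connections down when reporting many metrics at once.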

Slide 14

Slide 14 text


Slide 15

Slide 15 text

Be lazy.

Slide 16

Slide 16 text

Required viewing: “Cache rules everything around me”, Jacob Burch and Noah Silas: http://pyvideo.org/video/679

Slide 17

Slide 17 text

“But my social pinstafacegramiqus isn’t cacheable!”

Slide 18

Slide 18 text

[Figure: two regions of the page labeled “cacheable”]

Slide 19

Slide 19 text

[Figure: two regions of the page labeled “not cacheable”]

Slide 20

Slide 20 text

Hello, {{ user.name }}

Slide 21

Slide 21 text

Technique: resource decomposition.

Slide 22

Slide 22 text

Decomposition options
• Edge-side includes
  http://en.wikipedia.org/wiki/Edge_Side_Includes
  https://github.com/mrfunyon/django-esi
• Two-phased template rendering
  http://www.holovaty.com/writing/django-two-phased-rendering/
  http://django-phased.readthedocs.org/
• Client-side composition

Slide 23

Slide 23 text

Edge-side includes
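With edge-side includes, Django renders one fully cacheable page containing `<esi:include src="..."/>` tags, and the caching proxy in front of it (Varnish, for example, supports a subset of ESI) fetches and splices in each fragment per request. The toy expander below only illustrates that splicing step; the `fetch` callable and the `/user/username` fragment URL are hypothetical stand-ins for the proxy’s sub-request.

```python
import re

# Matches self-closing tags such as <esi:include src="/user/username"/>
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def expand_esi(page, fetch):
    """Replace every ESI include tag with whatever fetch(src) returns.

    `page` is the shared, cacheable HTML; `fetch` stands in for the
    proxy's per-request lookup of the uncacheable fragment.
    """
    return ESI_INCLUDE.sub(lambda m: fetch(m.group(1)), page)


cached_page = '<div id="header">Hello, <esi:include src="/user/username"/></div>'
fragments = {"/user/username": "Jacob"}  # hypothetical per-user fragment
print(expand_esi(cached_page, fragments.get))
# <div id="header">Hello, Jacob</div>
```

The expensive page body is cached once for everyone; only the tiny per-user fragment is produced per request.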

Slide 24

Slide 24 text

Two-phased rendering
{% load phased_tags %}
{% phased with user %}
Hello, {{ user.name }}
{% endphased %}

Slide 25

Slide 25 text

Client-side composition
Hello
$(function() {
    $.get("/user/username", function(data) {
        $("#header").html(data);
    });
});

Slide 26

Slide 26 text

“There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.”

Slide 27

Slide 27 text

from books.models import Book
from django.shortcuts import render

def book_list(request):
    qs = Book.objects.all()
    return render(request, 'books.html', {'books': qs})

Slide 28

Slide 28 text

from books.models import Book
from django.shortcuts import render
from django.views.decorators.cache import cache_page

@cache_page(600)  # cache the rendered response for 600 seconds (10 minutes)
def book_list(request):
    qs = Book.objects.all()
    return render(request, 'books.html', {'books': qs})
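Conceptually, `cache_page(600)` stores the rendered response and serves the stored copy until 600 seconds have passed. The pure-Python sketch below mimics only that time-to-live behaviour with a dict and an injectable clock; it is not Django’s implementation (which builds keys from the request URL and stores whole responses in the configured cache backend), and `ttl_cache` and the demo names are hypothetical.

```python
import time
from functools import wraps


def ttl_cache(seconds, clock=time.time):
    """Cache a function's result per argument tuple for `seconds` seconds."""
    def decorator(func):
        store = {}  # args -> (expires_at, value)

        @wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]            # still fresh: skip the expensive work
            value = func(*args)          # missing or expired: recompute
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator


# Demo with a fake clock so expiry can be observed without waiting.
fake_now = [0.0]
calls = []


@ttl_cache(600, clock=lambda: fake_now[0])
def book_list(page):
    calls.append(page)
    return "rendered page %d" % page


book_list(1)
book_list(1)       # served from the cache; the function does not run
fake_now[0] = 601
book_list(1)       # expired, so it is recomputed
```

Note what the sketch cannot do: when a book is added, nothing tells the cache, so visitors keep seeing the old list until the timeout passes.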

Slide 29

Slide 29 text

from books.models import Book
from django.core.cache import cache

def create_book():
    Book.objects.create(...)
    # ...but which key did cache_page store the book list under?
    cache.delete("U MAD?")

Slide 30

Slide 30 text

“There are only two hard things in computer science: cache invalidation, naming things, and off-by-one errors.”

Slide 31

Slide 31 text

Technique: serve everything from cache.

Slide 32

Slide 32 text

[Diagram: the View reads from the Cache; when the cache is outdated, or the data changes, the cache data is regenerated.]
key:arg:arg     ->  (ttl, data)
username:jacob  ->  (1342710010, "JKM")
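The diagram describes views that never compute data themselves: they read from the cache, and anything outdated is regenerated on the side. A minimal in-memory sketch under stated assumptions: a plain dict stands in for the cache backend, `regenerate` is a hypothetical callable that rebuilds one key’s data, and the number stored next to each value (labelled ttl on the slide; 1342710010 looks like a Unix timestamp) is treated as an absolute expiry time.

```python
import time


class ServeFromCache(object):
    """Views only ever read from the cache; regeneration happens on the side.

    Entries follow the slide's shape, key:arg:arg -> (expires_at, data),
    e.g. 'username:jacob' -> (1342710010, 'JKM').
    """

    def __init__(self, regenerate, ttl=300, clock=time.time):
        self.regenerate = regenerate   # hypothetical: callable(key) -> fresh data
        self.ttl = ttl
        self.clock = clock
        self.store = {}
        self.pending = set()           # outdated keys waiting for regeneration

    def get(self, key):
        """What a view calls: return cached data, flag it if outdated."""
        expires_at, data = self.store.get(key, (0, None))
        if expires_at <= self.clock():
            self.pending.add(key)      # outdated or missing: queue a refresh
        return data                    # serve what we have, even if stale

    def data_changed(self, key):
        """Call when the underlying data changes, e.g. from a save hook."""
        self.refresh(key)

    def refresh(self, key):
        self.store[key] = (self.clock() + self.ttl, self.regenerate(key))
        self.pending.discard(key)

    def process_pending(self):
        """Meant to run in a background worker, not during a request."""
        for key in list(self.pending):
            self.refresh(key)
```

Serving stale data while a worker refreshes it is the design choice here: requests never wait on regeneration, at the cost of briefly outdated pages. A cold key returns `None` in this sketch; a real site might warm the cache up front or fall back to computing synchronously.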

Slide 33

Slide 33 text

Procrastinate.

Slide 34

Slide 34 text

[Diagram: View → Cache → cache outdated? → Regenerate cache data]
Hang on, this might take a while...

Slide 35

Slide 35 text

“Eventual consistency”

Slide 36

Slide 36 text

“Perceived performance”

Slide 37

Slide 37 text

Celery: http://celeryproject.org/

Slide 38

Slide 38 text

def message_list(request):
    messages = Message.objects.filter(recipient=request.user)
    messages.update(received=True)
    return render(...)

Slide 39

Slide 39 text

Rule of thumb: writes are 10x as expensive as reads.*
* This isn’t actually true, but you should pretend it is anyway.

Slide 40

Slide 40 text

@task
def mark_received(message_ids):
    Message.objects.filter(id__in=message_ids) \
        .update(received=True)

def message_list(request):
    messages = Message.objects.filter(recipient=request.user)
    # Pass a plain list of IDs: a lazy queryset may not serialize cleanly
    # as a task argument (JSON serializers reject it outright).
    mark_received.delay(list(messages.values_list('id', flat=True)))
    return render(...)
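Running the slide’s task needs a Celery broker and worker, so as a self-contained stand-in this sketch swaps in the standard library’s `queue.Queue` and a worker thread to show the same shape: the view enqueues message IDs and returns immediately, and the write happens later. `received` is a plain set standing in for the `received` column; none of this is Celery’s API, and the function names are hypothetical.

```python
import queue
import threading

received = set()       # stands in for rows with received=True
jobs = queue.Queue()   # stands in for the task broker


def mark_received(message_ids):
    """The write, now run off the request path."""
    received.update(message_ids)


def worker():
    """Stands in for a Celery worker: pull jobs forever and run them."""
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def message_list(user_message_ids):
    """The 'view': enqueue the write, then render without waiting for it."""
    ids = list(user_message_ids)        # materialize before handing off
    jobs.put((mark_received, (ids,)))
    return "rendered %d messages" % len(ids)
```

Calling `message_list([1, 2, 3])` returns at once; `received` only catches up once the worker has run, which is the eventual consistency the earlier slides warn about.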

Slide 41

Slide 41 text

Watch that query count!

Slide 42

Slide 42 text

If you can only monitor one metric, make it query count.

Slide 43

Slide 43 text

[Diagram: a single request/response cycle containing three separate queries]

Slide 44

Slide 44 text

[Diagram: each query costs network latency + query time + network latency]
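The diagram’s point can be put as arithmetic: every query pays a network round trip on top of its own execution time, so for n queries the database portion of a request costs roughly n × (2 × one-way latency + query time). The figures below are hypothetical, chosen only to show the shape.

```python
def db_time_ms(n_queries, one_way_latency_ms, query_ms):
    """Approximate time a request spends talking to the database.

    Each query pays latency out, its own execution time, and latency back.
    """
    return n_queries * (2 * one_way_latency_ms + query_ms)


# Hypothetical figures: 0.5 ms each way, 1 ms of execution per query.
print(db_time_ms(1, 0.5, 1.0))    # 2.0
print(db_time_ms(100, 0.5, 1.0))  # 200.0
```

Under these assumptions, half of each query’s cost is pure round trip, so cutting the query count helps even when every individual query is already fast.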

Slide 45

Slide 45 text

Your new BFFs:
• http://django.me/select_related
• http://django.me/prefetch_related
• https://django.me/raw
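`select_related` works by folding a foreign-key lookup into the original query with a SQL JOIN. This stdlib `sqlite3` sketch uses hypothetical `author`/`book` tables (not Django’s generated SQL) and counts statements with `set_trace_callback` to compare the accidental one-query-per-row pattern with a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author (id));
""")
conn.executemany("INSERT INTO author (id, name) VALUES (?, ?)",
                 [(i, "Author %d" % i) for i in range(1, 51)])
conn.executemany("INSERT INTO book (title, author_id) VALUES (?, ?)",
                 [("Book %d" % i, i) for i in range(1, 51)])
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement run


def count_queries(func):
    """Call func() and return (its result, how many statements it executed)."""
    del statements[:]
    result = func()
    return result, len(statements)


def titles_n_plus_one():
    """The accidental pattern: one query for the books, then one per book."""
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    return [(title, conn.execute("SELECT name FROM author WHERE id = ?",
                                 (author_id,)).fetchone()[0])
            for title, author_id in books]


def titles_joined():
    """The select_related idea: fetch both tables in one JOIN."""
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


slow, slow_count = count_queries(titles_n_plus_one)
fast, fast_count = count_queries(titles_joined)
print(slow == fast, slow_count, fast_count)
```

Both functions return the same rows, but the first issues one statement for the books plus one per book, while the JOIN issues one. `prefetch_related` takes the other route: one extra query per relation, using `IN (...)`, instead of a JOIN.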

Slide 46

Slide 46 text

Don’t fear the DB.

Slide 47

Slide 47 text

“ORM is the Vietnam of Computer Science” (Ted Neward)
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

Slide 48

Slide 48 text

ORM inefficiencies
• Queryset cloning
  Every chained call copies the queryset, and each copy clones a heavy structure:
  Model.objects.filter(...).filter(...).order_by(...)
• Model instantiation
  ~40k __init__s per second (see http://bit.ly/Muepgo).
• Saving models
  Readable, but slow: m.foo = bar; m.save()
  Much, much faster: Model.objects.filter(id=m.id).update(foo=bar)

Slide 49

Slide 49 text

Bulk inserts
• Horrifically slow (one INSERT and one commit per row):
  for title in big_title_list:
      Book.objects.create(title=title)
• Faster, but still terrible (one INSERT per row, a single commit):
  with transaction.commit_on_success():
      for title in big_title_list:
          Book.objects.create(title=title)
  (commit_on_success() was deprecated in favor of transaction.atomic() in Django 1.6.)
• Fast:
  bl = [Book(title=t) for t in big_title_list]
  Book.objects.bulk_create(bl)
• But COPY FROM wins, hands down.
  http://www.postgresql.org/docs/9.1/static/sql-copy.html
  http://initd.org/psycopg/docs/usage.html#using-copy-to-and-copy-from
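psycopg2 exposes `COPY FROM` as `cursor.copy_from(file, table, columns=...)`, which reads tab-separated rows in PostgreSQL’s text format from a file-like object. A minimal sketch, assuming psycopg2 (psycopg 3 uses a different copy API) and Django’s default table name `books_book` for the `Book` model; the table name and helper names are assumptions. Most of the work is escaping the characters COPY’s text format treats specially.

```python
import io


def copy_escape(value):
    """Escape one field for PostgreSQL COPY's text format (tab-separated)."""
    if value is None:
        return "\\N"                        # COPY's default NULL marker
    return (str(value)
            .replace("\\", "\\\\")          # backslash first, so later escapes survive
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r"))


def titles_to_copy_buffer(titles):
    """Build an in-memory file with one escaped row per title."""
    buf = io.StringIO()
    for title in titles:
        buf.write(copy_escape(title) + "\n")
    buf.seek(0)
    return buf


def bulk_load_titles(cursor, titles, table="books_book"):
    """Load all titles with a single COPY; `cursor` is assumed to be a psycopg2 cursor."""
    cursor.copy_from(titles_to_copy_buffer(titles), table, columns=("title",))
```

One COPY streams every row in a single command, with no per-row statement parsing, which is why it beats even `bulk_create`.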

Slide 50

Slide 50 text

Django needs to be database-agnostic. You don’t.

Slide 51

Slide 51 text

Technique: managers and raw queries

Slide 52

Slide 52 text

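from django.db import models

class BookManager(models.Manager):
    def inventory(self):
        # Manager.raw() runs hand-written SQL and maps the rows onto model instances.
        return self.raw("... BIG-ASS QUERY ...")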

Slide 53

Slide 53 text

Modern databases are incredible.
See, e.g., “Schemaless SQL”, Craig Kerstiens: http://klewel.com/conferences/djangocon-2012/index.php?talkID=29

Slide 54

Slide 54 text

1. Measure everything.
2. Cache carefully.
3. Procrastinate.
4. Count queries.
5. Hug your DBA.

Slide 55

Slide 55 text

Thanks!
http://lanyrd.com/swqxb
Office hour: 1:40pm, Expo Hall