Scaling Django with HTTP
DjangoCon US 2013
http://www.youtube.com/watch?v=HAjOQ09I1UY
HTTP for Great Good
Scaling Django with HTTP
DjangoCon US, September 5th 2013
Matt Robenolt
Hello. < me irl
Site Reliability Engineer
“DJANGO ALL THE THINGS!”
“...but
The slowest part of a web application is typically not your code.
Between databases and memcaches and Redises and Cassandras and MongoDBs and networks, Django is not the problem.
“...everything
A few vanity metrics.
Monthly Unique Visitors: 1,115,080,411
Monthly Page Views: 7,516,761,301
Inbound Traffic: 42k total req/s, 15k app req/s (not my fault)
36% of all requests actually hit a Django server (15k / 42k ≈ 36%)
...what happened to the other 64%?
Let’s talk about HTTP.
Hypertext Transfer Protocol
$ curl -v disqus.com
> GET / HTTP/1.1
> User-Agent: curl/7.24.0
> Host: disqus.com
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Fri, 30 Aug 2013 06:38:37 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 10453
< Last-Modified: Fri, 30 Aug 2013 00:32:14 GMT
< Vary: Accept-Encoding
< Expires: Fri, 30 Aug 2013 06:38:36 GMT
< Cache-Control: no-cache, must-revalidate
Request (the "> " lines): method (GET), path (/), version (HTTP/1.1), headers
Response (the "< " lines): version, status (200 OK), headers
Request in Django
request.method  # method
request.get_full_path()  # path
request.META['HTTP_USER_AGENT']  # User-Agent header
request.META['HTTP_ACCEPT']  # Accept header
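The `request.META` keys above follow the CGI/WSGI naming convention: the header name is upper-cased, dashes become underscores, and an `HTTP_` prefix is added, except for `Content-Type` and `Content-Length`, which CGI passes without the prefix. A minimal sketch of that mapping; `meta_key` is a hypothetical helper, not a Django API:

```python
def meta_key(header_name: str) -> str:
    """Return the request.META / WSGI environ key for an HTTP request header.

    Hypothetical helper modeling the CGI convention; not part of Django.
    """
    key = header_name.upper().replace("-", "_")
    # CGI passes these two headers without the HTTP_ prefix.
    if key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        return key
    return "HTTP_" + key


print(meta_key("User-Agent"))    # HTTP_USER_AGENT
print(meta_key("Accept"))        # HTTP_ACCEPT
print(meta_key("Content-Type"))  # CONTENT_TYPE
```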
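Response in Django
from django.http import HttpResponse

body = 'Hello world'  # any str or bytes
response = HttpResponse(body)
response.status_code = 200
response['X-Foo'] = 'bar'  # set an arbitrary response header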
“Cool,
Hmm. What can we do with this information?
> GET / HTTP/1.1
> User-Agent: curl/7.24.0
> Host: disqus.com
> Accept: */*
> If-Modified-Since: Fri, 30 Aug 2013 00:32:14 GMT
>
< HTTP/1.1 304 Not Modified
< Server: nginx
< Date: Fri, 30 Aug 2013 18:05:38 GMT
< Last-Modified: Fri, 30 Aug 2013 00:32:14 GMT
< Vary: Accept-Encoding
< Expires: Fri, 30 Aug 2013 18:05:37 GMT
< Cache-Control: no-cache, must-revalidate
304 Not Modified
- No body is sent with the response
- The client reuses its cached version
- Usually cheaper for the server to compute than the full response
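On the server, deciding whether a 304 is possible comes down to comparing the resource's modification time with the client's `If-Modified-Since` date. A minimal, framework-agnostic sketch, assuming the modification time is available as a timezone-aware datetime; `conditional_status` is a hypothetical name, and real implementations (such as Django's `last_modified` decorator) handle more edge cases:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        # An unparseable date is treated as if the header were absent.
        return 200
    # HTTP dates have one-second resolution, so compare whole seconds.
    if modified.replace(microsecond=0) <= since:
        return 304
    return 200


page_modified = datetime(2013, 8, 30, 0, 32, 14, tzinfo=timezone.utc)
print(conditional_status(page_modified, "Fri, 30 Aug 2013 00:32:14 GMT"))  # 304
print(conditional_status(page_modified, None))                             # 200
```

Only when the status is 200 does the view need to render a body at all.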
But we can do better!
Static files have been promoting good practices for a long time.
“Far future headers”
$ curl -v a.disquscdn.com/dotcom/d-6203c8f/css/disqus-web/pages/home.css
< HTTP/1.1 200 OK
< Server: nginx
< Content-Type: text/css; charset=utf-8
< Last-Modified: Fri, 16 Aug 2013 20:31:05 GMT
< Expires: Sun, 15 Sep 2013 20:34:21 GMT
< Cache-Control: max-age=2592000
< Content-Length: 30749
< Date: Sun, 18 Aug 2013 03:23:37 GMT
< Via: 1.1 varnish
< Age: 110956
< Connection: keep-alive
< Vary: Accept-Encoding
Cache-Control: max-age=2592000 (seconds) is 30 days in the future.
Chrome Web Inspector on second visit
No HTTP request
- 0ms
- The client knew to just use its local cache
- Computer actually did something right for once
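The browser's choice to skip the request can be modeled as a freshness check: a stored response is reused without contacting the server while its age is below `max-age`. A simplified sketch that only looks at `max-age` (real caches also consider `Expires`, `no-cache`, `no-store`, and heuristics); `max_age` and `is_fresh` are hypothetical names:

```python
import re
from typing import Optional


def max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age directive (in seconds) from a Cache-Control value."""
    match = re.search(r"(?:^|,)\s*max-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True if a cached response may be reused without any HTTP request."""
    lifetime = max_age(cache_control)
    return lifetime is not None and age_seconds < lifetime


print(max_age("max-age=2592000") // 86400)       # 30 (days)
print(is_fresh("max-age=2592000", 110956))       # True
print(is_fresh("no-cache, must-revalidate", 0))  # False
```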
“I SEE WHAT YOU DID THERE”
- Hopefully you
Takeaways
- Clients behave differently depending on the response headers
- With static files, these headers usually come with minimal effort
- We can and should use them to our advantage to improve UX
The same logic can be applied to dynamic content.
What does this look like in Django?
Last-Modified
from django.shortcuts import render

def lol(request):
    response = render(request, 'lol.html')
    response['Last-Modified'] = \
        'Fri, 16 Aug 2013 20:31:05 GMT'
    return response

* don’t do this.
Last-Modified
from django.views.decorators.http import \
    last_modified

def post_last_modified(request, slug):
    # Post is your model; modified is its last-changed datetime
    return Post.objects.get(slug=slug).modified

@last_modified(post_last_modified)
def blog_post_detail(request, slug):
    # Your view
    ...
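One visible problem with the hand-written version is the hardcoded date string: a `Last-Modified` value must be an HTTP date in GMT and should come from data, not a literal. The `last_modified` decorator formats the header for you; if you ever need the string yourself, a stdlib sketch looks like this (the `modified` value is a stand-in for something like `Post.modified`, assumed to be a timezone-aware datetime; Django also provides `django.utils.http.http_date` for the same job):

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_last_modified(modified: datetime) -> str:
    """Format an aware datetime as an HTTP date for a Last-Modified header."""
    # usegmt=True emits the "GMT" suffix HTTP expects; it requires a UTC datetime.
    return format_datetime(modified.astimezone(timezone.utc), usegmt=True)


modified = datetime(2013, 8, 16, 20, 31, 5, tzinfo=timezone.utc)
print(http_last_modified(modified))  # Fri, 16 Aug 2013 20:31:05 GMT
```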
Cache-Control
from django.shortcuts import render

def lol(request):
    response = render(request, 'lol.html')
    response['Cache-Control'] = 'max-age=600'
    return response
Cache-Control
from django.http import HttpResponse
from django.views.decorators.cache import \
    cache_control

@cache_control(max_age=600)  # Cache for 10m
def home(request):
    return HttpResponse('lol')
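The decorator's keyword arguments map onto `Cache-Control` directives: underscores become dashes and `True` becomes a bare directive. A hypothetical stand-alone helper, `cache_control_header`, that reproduces only that mapping as an illustration (Django's real `patch_cache_control` also merges with any header already on the response):

```python
def cache_control_header(**directives) -> str:
    """Build a Cache-Control value from keyword arguments.

    Hypothetical helper illustrating the idea behind Django's cache_control
    decorator; it is not Django's implementation.
    """
    parts = []
    for name, value in directives.items():
        name = name.replace("_", "-")  # max_age -> max-age
        if value is True:
            parts.append(name)  # flag directive, e.g. "public"
        elif value is not False and value is not None:
            parts.append(f"{name}={value}")
    return ", ".join(parts)


print(cache_control_header(max_age=600))               # max-age=600
print(cache_control_header(public=True, max_age=600))  # public, max-age=600
```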
“OMG!
Well... not really.
How many requests will {{user}} make to the same page within 10 minutes?
1? 2? 3?
...out of 42k requests per second.
Even with caching, your app is doing a lot of work.
Parsing HTTP.
WSGI.
Django middleware stack.
Do some stuff.
Render a template?
Back out through theDjango middleware.
Transform an HttpResponse into a real HTTP response.
...at 42k requests per second.
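The point of the list above is that every request walks the whole stack, in and back out, even when the answer would be identical for everyone. Django's own middleware are not WSGI wrappers, but the shape is the same; a toy WSGI sketch with hypothetical layer names ("session", "auth", "csrf") that records each layer a single request passes through:

```python
from wsgiref.util import setup_testing_defaults


def make_view(trace):
    """Innermost WSGI app: stands in for 'do some stuff' and rendering."""
    def view(environ, start_response):
        trace.append("view")
        body = b"Hello world"
        start_response("200 OK", [("Content-Type", "text/plain"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return view


def tracing_middleware(app, name, trace):
    """Wrap a WSGI app, recording when a request enters and leaves this layer."""
    def wrapper(environ, start_response):
        trace.append(f"{name} in")
        result = app(environ, start_response)
        trace.append(f"{name} out")
        return result
    return wrapper


trace = []
app = make_view(trace)
for layer in ("csrf", "auth", "session"):  # hypothetical layers, wrapped inside-out
    app = tracing_middleware(app, layer, trace)

environ = {}
setup_testing_defaults(environ)  # fills in a minimal fake WSGI environ
statuses = []
body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
print(trace)
# ['session in', 'auth in', 'csrf in', 'view', 'csrf out', 'auth out', 'session out']
```

Multiply that trace by tens of thousands of requests per second and the cost adds up, which is what a cache in front of the app avoids.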
You’re gonna have a bad time.
- me
Until now, “client” has been a user’s browser.
“If only we could use this Cache-Control stuff better...”
Introducing Varnish
$ apt-get install varnish
$ brew install varnish
tl;dr
- Varnish sits between Django and your users on the Internet
- Caches HTTP responses and respects proper HTTP headers
- Its sole purpose in life is to be a cache, so it’s really fast
Stand back, science is happening.
(Well, benchmarking.)
Simple, non-scientific “Hello World”
from django.views.decorators.cache import \
    cache_control
from django.http import HttpResponse

@cache_control(max_age=5)
def hello(request):
    return HttpResponse('Hello world')
$ httperf --server 127.0.0.1 --port 8000 --uri /hello/ --rate 150 --num-conn 10 --num-call 500 --hog
Request rate: 369.6 req/s (2.7 ms/req)
(Django + gunicorn, on my MacBook Air)
$ httperf --server 127.0.0.1 --port 8888 --uri /hello/ --rate 150 --num-conn 10 --num-call 10000 --hog
Request rate: 15633.4 req/s (0.1 ms/req)
(Varnish, on my MacBook Air)
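Working out the gap between the two runs is plain arithmetic on the reported numbers. As the slide says, this benchmark is non-scientific (a laptop, and the runs used different `--num-call` values), so treat the ratio as an order of magnitude only:

```python
# Figures copied from the two httperf runs above.
django_rps = 369.6     # Django + gunicorn
varnish_rps = 15633.4  # Varnish

speedup = varnish_rps / django_rps
print(f"Varnish: ~{speedup:.0f}x the request rate")  # ~42x
print(f"{1000 / django_rps:.1f} ms/req vs {1000 / varnish_rps:.2f} ms/req")  # 2.7 vs 0.06
```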
Varnish: How does it work?
First request
First response “Lemme
“Yo,
Next response “wut
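The first-request / next-response flow can be modeled in a few lines: a miss goes to the backend, the response is stored for as long as its `Cache-Control: max-age` allows, and later requests for the same URL are answered from the store. This is a toy sketch, not how Varnish is implemented; `TinyHTTPCache` is a hypothetical class, and it ignores `Vary`, request methods, and much more. The Cookie bypass mirrors Varnish's default of not caching requests that carry a `Cookie` header:

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]


class TinyHTTPCache:
    """Toy caching reverse proxy keyed by URL (hypothetical, for illustration)."""

    def __init__(self, backend: Callable[[str, Dict[str, str]], Response],
                 clock: Callable[[], float] = time.monotonic):
        self.backend = backend
        self.clock = clock
        self.store: Dict[str, Tuple[float, Response]] = {}  # url -> (expires_at, response)

    def handle(self, url: str, headers: Dict[str, str]) -> Response:
        if "Cookie" in headers:
            return self.backend(url, headers)  # pass straight through, never cached
        cached = self.store.get(url)
        if cached is not None and self.clock() < cached[0]:
            return cached[1]  # hit: the backend never sees this request
        response = self.backend(url, headers)  # miss: ask Django
        ttl = self._max_age(response[1].get("Cache-Control", ""))
        if ttl:
            self.store[url] = (self.clock() + ttl, response)
        return response

    @staticmethod
    def _max_age(value: str) -> Optional[int]:
        match = re.search(r"(?:^|,)\s*max-age=(\d+)", value)
        return int(match.group(1)) if match else None


calls = []


def django_backend(url, headers):
    calls.append(url)  # count how often the "app" is actually hit
    return 200, {"Cache-Control": "max-age=5"}, b"Hello world"


now = [0.0]  # fake clock so the example is deterministic
cache = TinyHTTPCache(django_backend, clock=lambda: now[0])
cache.handle("/hello/", {})  # first request: miss, goes to the backend
cache.handle("/hello/", {})  # next request: served from the cache
now[0] = 6.0
cache.handle("/hello/", {})  # TTL expired: backend again
print(len(calls))  # 2
```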
Caching: Pro Moves™
Augment with JavaScript
- Update your UI optimistically
- Leverage cookies to store non-critical data
- Defer fetching user-specific data until needed
Short TTLs are good
- Most things can be cached for at least 5s
- At 10k requests/s, a 5s TTL absorbs 49,999 of every 50,000 requests for a page
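The 49,999 figure is rate × TTL minus the one request per window that misses and goes to the backend. A small sketch of that arithmetic, assuming steady traffic to a single cached URL on a single cache node; `absorbed_requests` is a hypothetical name:

```python
def absorbed_requests(rate_per_s: float, ttl_s: float) -> int:
    """Requests served from cache during one TTL window for a single URL.

    Assumes steady traffic and one cache node: only the first request in
    each window misses and reaches the backend.
    """
    total = int(rate_per_s * ttl_s)
    return max(total - 1, 0)


hits = absorbed_requests(10_000, 5)
print(hits)                    # 49999
print(f"{hits / 50_000:.3%}")  # 99.998%
```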
Let’s meet: John and Jane Doe.
John and Jane are different users.
John logs into Disqus.
Jane logs into Disqus.
Jane sees John’s stuff.
Jane should not see John’s stuff.
We really want to prevent this from ever happening.
Cookies
How do users even work?
$ curl -vd "username=foo&password=bar" https://disqus.com/profile/login/
> POST /profile/login/ HTTP/1.1
> User-Agent: curl/7.24.0
> Host: disqus.com
>
< HTTP/1.1 302 FOUND
< Server: nginx
< Date: Fri, 30 Aug 2013 21:34:36 GMT
< Vary: Cookie
< Set-Cookie: sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288; Domain=.disqus.com; expires=Sun, 29-Sep-2013 21:34:36 GMT; httponly; Max-Age=2592000; Path=/
Set-Cookie: the response hands the client a cookie
sessionid=...: the Session Id
Session Id
- Unique id that represents a logged-in user
- Managed by django.contrib.sessions / django.contrib.auth
- Could potentially cache per session id
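Caching per session id means the session cookie becomes part of the cache key, so John's page is stored under John's key and Jane can never be handed it. A minimal sketch using the stdlib cookie parser; `session_cache_key` is a hypothetical helper, and `sessionid` is Django's default session cookie name:

```python
from http.cookies import SimpleCookie
from typing import Optional


def session_cache_key(url: str, cookie_header: Optional[str],
                      cookie_name: str = "sessionid") -> str:
    """Build a cache key that separates anonymous users and each session.

    Hypothetical helper for illustration; not a Django or Varnish API.
    """
    session_id = None
    if cookie_header:
        cookies = SimpleCookie()
        cookies.load(cookie_header)
        if cookie_name in cookies:
            session_id = cookies[cookie_name].value
    return f"{url}|session={session_id or 'anon'}"


print(session_cache_key("/", None))  # /|session=anon
print(session_cache_key("/", "sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288; foo=bar"))
# /|session=f7aa9598-11bb-11e3-9eb1-003048d9a288
```

The trade-off is that each logged-in user gets separate cache entries, so hit rates drop for personalized pages.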
By default, Varnish will not cache any request with a Cookie header at all.
Think about whether your endpoint changes based on a user’s authentication.
If it doesn’t, Varnish can normalize the request by dropping its cookies.
Learn: Varnish Configuration Language (VCL) in 30 seconds
sub vcl_recv {
    // These urls can be stripped of all
    // cookies since they serve the same
    // data for anon and auth'd users
    if (req.url == "/" ||
        req.url ~ "^/embed/comments/") {
        unset req.http.Cookie;
    }
}
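The VCL rule is small enough to restate in Python, which can be handy for unit-testing which URLs will have their cookies stripped before the rule goes live. A sketch that mirrors the logic only (Varnish does not run this); `normalize_request` and `COOKIELESS_URL` are hypothetical names, and VCL's `~` operator is treated as a regex search:

```python
import re
from typing import Dict

# Same pattern as the VCL above: responses identical for anon and auth'd users.
COOKIELESS_URL = re.compile(r"^/embed/comments/")


def normalize_request(url: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Return request headers with Cookie removed for cookie-agnostic URLs."""
    if url == "/" or COOKIELESS_URL.search(url):
        return {k: v for k, v in headers.items() if k.lower() != "cookie"}
    return dict(headers)


print(normalize_request("/embed/comments/", {"Cookie": "sessionid=abc", "Accept": "*/*"}))
# {'Accept': '*/*'}
print(normalize_request("/home/", {"Cookie": "sessionid=abc"}))
# {'Cookie': 'sessionid=abc'}
```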
Basically, caching is hard.
Go make some stuff faster.
We’re hiring people who hate computers. disqus.com/jobs
Questions? I have (some) answers.
github.com/mattrobenolt
@mattrobenolt