HTTP for Great Good

Matt Robenolt
September 05, 2013


Scaling Django with HTTP

DjangoCon US 2013

http://www.youtube.com/watch?v=HAjOQ09I1UY

Transcript

  1. HTTP for Great Good: Scaling Django with HTTP. DjangoCon US, September 5th 2013, Matt Robenolt
  2. Hello < me irl

  3. Site Reliability Engineer

  4. “DJANGO ALL THE THINGS!”

  5. “...but

  6. “...but

  7. The slowest part of a web application is typically not your code.

  8. Between databases and memcaches and Redises and Cassandras and MongoDBs and networks, Django is not the problem.
  9. “...everything

  10. (no text)
  11. A few vanity metrics.

  12. Monthly Unique Visitors: 1,115,080,411

  13. Monthly Page Views: 7,516,761,301

  14. Inbound Traffic: 42k total req/s, 15k app req/s. Not my fault.

  15. 36% of all requests actually hit a Django server (15k / 42k ≈ 36%)
  16. ...what happened to the other 64%?

  17. Let’s talk about HTTP: Hypertext Transfer Protocol

  18. $ curl -v disqus.com

  19. > GET / HTTP/1.1
    > User-Agent: curl/7.24.0
    > Host: disqus.com
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Server: nginx
    < Date: Fri, 30 Aug 2013 06:38:37 GMT
    < Content-Type: text/html; charset=utf-8
    < Content-Length: 10453
    < Last-Modified: Fri, 30 Aug 2013 00:32:14 GMT
    < Vary: Accept-Encoding
    < Expires: Fri, 30 Aug 2013 06:38:36 GMT
    < Cache-Control: no-cache, must-revalidate
  20. Same exchange as slide 19, highlighting the Request: the ">" lines

  21. Method: GET

  22. Path: /

  23. Version: HTTP/1.1

  24. Headers: User-Agent, Host, Accept

  25. Response: the "<" lines

  26. Status: 200 OK

  27. Headers: Server, Date, Content-Type, Content-Length, Last-Modified, Vary, Expires, Cache-Control
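To make the anatomy in slides 19 to 27 concrete, here is a minimal sketch that splits a raw request head and response head into the parts the slides highlight (method, path, version, status, headers). It uses only the Python standard library; the function names are hypothetical, and real HTTP parsing (repeated headers, bodies, malformed input) needs considerably more care.

```python
def _parse_headers(lines):
    """Turn 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in lines:
        if not line:
            continue  # skip the blank line that ends the head
        name, _, value = line.partition(":")
        # Header names are case-insensitive; partitioning on the first colon
        # keeps later colons (e.g. the times in a Date header) in the value.
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_request_head(raw):
    """Split a request head into (method, path, version, headers)."""
    lines = raw.strip("\r\n").split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    return method, path, version, _parse_headers(lines[1:])


def parse_response_head(raw):
    """Split a response head into (version, status, reason, headers)."""
    lines = raw.strip("\r\n").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    return version, int(status), reason, _parse_headers(lines[1:])


request = (
    "GET / HTTP/1.1\r\n"
    "User-Agent: curl/7.24.0\r\n"
    "Host: disqus.com\r\n"
    "Accept: */*\r\n"
)
response = (
    "HTTP/1.1 200 OK\r\n"
    "Server: nginx\r\n"
    "Date: Fri, 30 Aug 2013 06:38:37 GMT\r\n"
    "Cache-Control: no-cache, must-revalidate\r\n"
)

method, path, version, req_headers = parse_request_head(request)
print(method, path, version)              # GET / HTTP/1.1
print(req_headers["host"])                # disqus.com
print(parse_response_head(response)[:3])  # ('HTTP/1.1', 200, 'OK')
```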
  28. Request in Django

    request.method                   # method
    request.get_full_path()          # path
    request.META['HTTP_USER_AGENT']  # headers
    request.META['HTTP_ACCEPT']
  29. Response in Django

    from django.http import HttpResponse

    response = HttpResponse(body)  # body: the response content
    response.status_code = 200     # status
    response['X-Foo'] = 'bar'      # headers

  30. “Cool,

  31. Same exchange as slide 19. Hmm. What can we do with this information?
  32. > GET / HTTP/1.1
    > User-Agent: curl/7.24.0
    > Host: disqus.com
    > Accept: */*
    > If-Modified-Since: Fri, 30 Aug 2013 00:32:14 GMT
    >
    < HTTP/1.1 304 Not Modified
    < Server: nginx
    < Date: Fri, 30 Aug 2013 18:05:38 GMT
    < Last-Modified: Fri, 30 Aug 2013 00:32:14 GMT
    < Vary: Accept-Encoding
    < Expires: Fri, 30 Aug 2013 18:05:37 GMT
    < Cache-Control: no-cache, must-revalidate
  33. 304 Not Modified: no body is sent with the response

  34. 304 Not Modified: the client reuses its cached version

  35. 304 Not Modified: usually cheaper to calculate than the full response
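The server-side decision behind slide 32 is a date comparison. Below is a minimal sketch of it, assuming the resource's last change is known as a timezone-aware UTC datetime; the function name `conditional_get` is hypothetical. In Django, the `last_modified` decorator from `django.views.decorators.http` makes this comparison for you, so this is only an illustration of the logic, not Django's implementation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(last_modified, if_modified_since=None):
    """Return (status, headers) for a GET with an optional If-Modified-Since.

    last_modified: timezone-aware UTC datetime of the resource's last change.
    if_modified_since: the raw header value sent by the client, if any.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # HTTP says to ignore an invalid date
        if since is not None and last_modified <= since:
            # The client's copy is still current: 304, no body.
            return 304, headers
    # Otherwise render the full response as usual.
    return 200, headers


modified = datetime(2013, 8, 30, 0, 32, 14, tzinfo=timezone.utc)
print(conditional_get(modified))  # (200, {'Last-Modified': 'Fri, 30 Aug 2013 00:32:14 GMT'})
print(conditional_get(modified, "Fri, 30 Aug 2013 00:32:14 GMT")[0])  # 304
```

Only the cheap timestamp lookup runs on a 304; the template rendering is skipped entirely, which is why slide 35 calls it more efficient.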

  36. notbad.gif

  37. But we can do better!

  38. Static files have been promoting good practices for a long time.
  39. “Far future headers”

  40. $ curl -v a.disquscdn.com/dotcom/d-6203c8f/css/disqus-web/pages/home.css
    < HTTP/1.1 200 OK
    < Server: nginx
    < Content-Type: text/css; charset=utf-8
    < Last-Modified: Fri, 16 Aug 2013 20:31:05 GMT
    < Expires: Sun, 15 Sep 2013 20:34:21 GMT
    < Cache-Control: max-age=2592000
    < Content-Length: 30749
    < Date: Sun, 18 Aug 2013 03:23:37 GMT
    < Via: 1.1 varnish
    < Age: 110956
    < Connection: keep-alive
    < Vary: Accept-Encoding
  41. Same response as slide 40, highlighting Cache-Control: max-age=2592000 (2,592,000 seconds), i.e. 30 days in the future.
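The numbers in slides 40 and 41 line up, and a short calculation shows how a cache reads them. This sketch assumes a simplified freshness rule (remaining lifetime = max-age minus Age); RFC 7234 also adds the time elapsed since the response was received, which is ignored here. The helper names are hypothetical.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def max_age_seconds(cache_control):
    """Return the max-age directive from a Cache-Control value, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def remaining_lifetime(cache_control, age):
    """Seconds a cache may keep serving this response without asking again.

    Simplified: treats the Age header as the response's current age.
    """
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return 0
    return max(max_age - age, 0)


# Values from the home.css response in slide 40.
max_age = max_age_seconds("max-age=2592000")
print(timedelta(seconds=max_age))                      # 30 days, 0:00:00
print(remaining_lifetime("max-age=2592000", 110956))   # 2481044

# Date minus Age is roughly when the response left the origin;
# adding max-age lands exactly on the Expires header.
generated = parsedate_to_datetime("Sun, 18 Aug 2013 03:23:37 GMT") - timedelta(seconds=110956)
expires = parsedate_to_datetime("Sun, 15 Sep 2013 20:34:21 GMT")
print(generated + timedelta(seconds=max_age) == expires)  # True
```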
  42. Chrome Web Inspector on second visit

  43. No HTTP request: 0ms

  44. No HTTP request: the client knew to just use its local cache

  45. No HTTP request: the computer actually did something right for once

  46. “I SEE WHAT YOU DID THERE” - Hopefully you

  47. Takeaways: clients behave differently depending on the response headers

  48. Takeaways: with static files, these headers usually come with minimal effort

  49. Takeaways: we can and should use these to our advantage to improve UX

  50. The same logic can be applied to dynamic content.

  51. What does this look like in Django?

  52. Last-Modified

    from django.shortcuts import render

    def lol(request):
        response = render(request, 'lol.html')
        response['Last-Modified'] = 'Fri, 16 Aug 2013 20:31:05 GMT'
        return response

    * don’t do this.
  53. Last-Modified

    from django.views.decorators.http import last_modified

    # Post: your model, assumed to have a `modified` DateTimeField.
    def post_last_modified(request, slug):
        return Post.objects.get(slug=slug).modified

    @last_modified(post_last_modified)
    def blog_post_detail(request, slug):
        # Your view
        ...
  54. Cache-Control

    from django.shortcuts import render

    def lol(request):
        response = render(request, 'lol.html')
        response['Cache-Control'] = 'max-age=600'
        return response
  55. Cache-Control

    from django.http import HttpResponse
    from django.views.decorators.cache import cache_control

    @cache_control(max_age=600)  # Cache for 10m
    def home(request):
        return HttpResponse('lol')
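What a decorator like `cache_control` does is small: wrap the view and add a header to whatever it returns. The framework-free sketch below shows that shape. `Response` and `cache_for` are hypothetical stand-ins, not Django classes; Django's real `cache_control` merges directives into any existing header via `patch_cache_control`, whereas this sketch only sets the header when the view has not.

```python
from functools import wraps


class Response:
    """Hypothetical stand-in for HttpResponse: a body plus a header dict."""

    def __init__(self, body, status=200):
        self.body = body
        self.status = status
        self.headers = {}


def cache_for(seconds):
    """Hypothetical decorator in the spirit of Django's cache_control."""
    value = f"max-age={seconds}"

    def decorator(view):
        @wraps(view)  # keep the view's name and docstring
        def wrapper(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            # Respect a Cache-Control header the view set itself.
            response.headers.setdefault("Cache-Control", value)
            return response
        return wrapper
    return decorator


@cache_for(600)  # cache for 10 minutes
def home(request):
    return Response("lol")


print(home(None).headers)  # {'Cache-Control': 'max-age=600'}
```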
  56. “OMG!

  57. (no text)
  58. Well... not really.

  59. How many requests will {{user}} make to the same page within 10 minutes?
  60. 1? 2? 3?

  61. ...out of 42k requests per second.

  62. Even with caching, your app is doing a lot of work.
  63. Parsing HTTP.

  64. WSGI.

  65. Django middleware stack.

  66. Do some stuff.

  67. Render a template?

  68. Back out through the Django middleware.

  69. Transform an HttpResponse into a real HTTP response.

  70. ...at 42k requests per second.

  71. “You’re gonna have a bad time.” (me)

  72. Until now, “client” has been a user’s browser.

  73. “If only we could utilize this Cache-Control stuff better...”

  74. Introducing

  75. $ apt-get install varnish

  76. $ brew install varnish

  77. tl;dr: Varnish sits between Django and your users

  78. tl;dr: it caches HTTP responses and respects proper HTTP headers

  79. tl;dr: its sole purpose in life is to be a cache, so it’s really fast.
  80. Stand back, science is happening.

  81. Stand back, science is happening. benchmarking

  82. Simple, non-scientific “Hello World”

  83. “Hello World”

    from django.http import HttpResponse
    from django.views.decorators.cache import cache_control

    @cache_control(max_age=5)
    def hello(request):
        return HttpResponse('Hello world')
  84. $ httperf --server 127.0.0.1 --port 8000 --uri /hello/ --rate 150 --num-conn 10 --num-call 500 --hog
    Request rate: 369.6 req/s (2.7 ms/req)
    Django + gunicorn (* on my MacBook Air)
  85. $ httperf --server 127.0.0.1 --port 8888 --uri /hello/ --rate 150 --num-conn 10 --num-call 10000 --hog
    Request rate: 15633.4 req/s (0.1 ms/req)
    Varnish (* on my MacBook Air)
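A quick check of the two httperf results. This only redoes the arithmetic on the reported rates; as the slides say, the benchmark is non-scientific, and the two runs used different `--num-call` values.

```python
django_rps = 369.6     # Django + gunicorn, from slide 84
varnish_rps = 15633.4  # Varnish, from slide 85

# Milliseconds per request is 1000 / (requests per second).
print(round(1000 / django_rps, 1))      # 2.7
print(round(1000 / varnish_rps, 2))     # 0.06 (httperf rounds this to 0.1)
print(round(varnish_rps / django_rps))  # 42, i.e. roughly 42x the throughput
```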
  86. Varnish: How does it work?

  87. First request

  88. First response “Lemme

  89. “Yo,

  90. Next response “wut
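The first-request/next-request flow above can be modelled with a toy shared cache. This is a hypothetical sketch, not Varnish: it keys only on the URL, ignores `Vary`, never evicts, and takes its TTL from a `max_age` value the (fake) backend returns. The injectable clock exists only so the example is deterministic.

```python
import time


class TinyHTTPCache:
    """Hypothetical in-memory shared cache illustrating hit/miss flow."""

    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend  # callable: url -> (body, max_age_seconds)
        self.clock = clock
        self.store = {}         # url -> (body, expires_at)
        self.backend_calls = 0

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[1] > now:
            # Next request: served from cache, the backend never sees it.
            return entry[0], "HIT"
        # First request (or expired entry): ask the backend and keep the answer.
        self.backend_calls += 1
        body, max_age = self.backend(url)
        if max_age > 0:
            self.store[url] = (body, now + max_age)
        return body, "MISS"


now = [0.0]  # fake clock, in seconds
cache = TinyHTTPCache(lambda url: ("Hello world", 5), clock=lambda: now[0])
print(cache.get("/hello/"))  # ('Hello world', 'MISS')
print(cache.get("/hello/"))  # ('Hello world', 'HIT')
now[0] = 6.0                 # past the 5s max-age
print(cache.get("/hello/"))  # ('Hello world', 'MISS')
print(cache.backend_calls)   # 2
```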

  91. Caching: ProMoves™

  92. Augment with JavaScript: update your UI optimistically

  93. Augment with JavaScript: leverage cookies to store non-critical data

  94. Augment with JavaScript: defer fetching user-specific data until needed

  95. Short TTLs are good: most things can be cached for at least 5s

  96. Short TTLs are good: at 10k requests/s, a 5s TTL absorbs 49,999 of every 50,000 requests
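The 49,999 figure is straightforward arithmetic. The sketch assumes a steady request rate for a single URL and that exactly one request per TTL window (the miss that refills the cache) reaches the backend; the helper names are hypothetical.

```python
def requests_absorbed(rate_per_second, ttl_seconds):
    """Requests answered from cache in one TTL window for a single URL."""
    # Everything in the window except the one miss that refills the cache.
    return rate_per_second * ttl_seconds - 1


def backend_rate(ttl_seconds):
    """Requests per second that still reach the backend for that URL."""
    return 1 / ttl_seconds


print(requests_absorbed(10_000, 5))  # 49999
print(backend_rate(5))               # 0.2
```

In other words, the backend's load for that URL stops depending on traffic at all: it is one request per TTL, whether the page gets 10 or 10,000 requests per second.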
  97. Let’s meet: John and Jane Doe.

  98. John and Jane are different users.

  99. John logs into Disqus.

  100. Jane logs into Disqus.

  101. Jane sees John’s stuff.

  102. Jane sees John’s stuff. ^ not

  103. We really want to prevent this from ever happening.

  104. Cookies

  105. How do users even work?

  106. $ curl -vd "username=foo&password=bar" https://disqus.com/profile/login/
    > POST /profile/login/ HTTP/1.1
    > User-Agent: curl/7.24.0
    > Host: disqus.com
    >
    < HTTP/1.1 302 FOUND
    < Server: nginx
    < Date: Fri, 30 Aug 2013 21:34:36 GMT
    < Vary: Cookie
    < Set-Cookie: sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288; Domain=.disqus.com; expires=Sun, 29-Sep-2013 21:34:36 GMT; httponly; Max-Age=2592000; Path=/
  107. Same exchange as slide 106, highlighting the Set-Cookie header

  108. Same exchange, highlighting the Session Id: sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288
  109. Session Id: a unique id that represents a logged-in user

  110. Session Id: django.contrib.sessions / django.contrib.auth

  111. Session Id: could potentially cache per session id
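Caching "per session id" falls out of how shared caches treat `Vary`: each header named in `Vary` becomes part of the cache key. This hypothetical sketch builds such a key; the session ids are made up, and the request headers are assumed to use lower-cased names. Note that `Vary: Cookie` varies on the whole Cookie header, not just `sessionid`.

```python
def cache_key(url, request_headers, vary=""):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = sorted({h.strip().lower() for h in vary.split(",") if h.strip()})
    parts = [url] + [f"{name}={request_headers.get(name, '')}" for name in names]
    return "|".join(parts)


# Hypothetical session ids for John and Jane.
john = {"cookie": "sessionid=john-session", "accept-encoding": "gzip"}
jane = {"cookie": "sessionid=jane-session", "accept-encoding": "gzip"}

# Vary: Cookie -> every session gets its own entry, so Jane never sees John's page.
print(cache_key("/", john, "Cookie") == cache_key("/", jane, "Cookie"))  # False
# Vary: Accept-Encoding only -> John and Jane share one entry.
print(cache_key("/", john, "Accept-Encoding") == cache_key("/", jane, "Accept-Encoding"))  # True
```

The trade-off is visible in the first case: a per-session key is safe, but each logged-in user fills their own entry, so the hit rate drops toward the "1? 2? 3?" requests per user from earlier.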

  112. By default, Varnish will not cache any request with a Cookie header at all.

  113. Think about whether your endpoint changes based on the user’s authentication.
  114. If it doesn’t, Varnish can normalize it.

  115. (no text)
  116. Learn: Varnish Configuration Language (VCL) in 30 seconds

  117. sub vcl_recv {
        // These URLs can be stripped of all
        // cookies since they serve the same
        // data for anon and auth'd users
        if (req.url == "/" || req.url ~ "^/embed/comments/") {
            unset req.http.Cookie;
        }
    }
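Varnish runs the VCL itself; still, it can help to reason about (or unit-test) which URLs the rule covers. Here is the same rule restated in Python as an illustration only; the function name is hypothetical. Like `req.url == "/"` in VCL, the exact match means `/?page=2` keeps its cookies.

```python
import re

# Mirrors the vcl_recv rule: URLs that serve the same data to
# anonymous and logged-in users, so their cookies can be dropped.
COOKIE_FREE_PATTERN = re.compile(r"^/embed/comments/")


def strip_cookies_if_safe(url, headers):
    """Return a copy of headers, minus Cookie for URLs that ignore auth."""
    if url == "/" or COOKIE_FREE_PATTERN.search(url):
        return {k: v for k, v in headers.items() if k.lower() != "cookie"}
    return dict(headers)


headers = {"Host": "disqus.com", "Cookie": "sessionid=abc"}
print(strip_cookies_if_safe("/embed/comments/?f=example", headers))  # {'Host': 'disqus.com'}
print(strip_cookies_if_safe("/home/", headers))  # Cookie kept
```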
  118. Basically, caching is hard.

  119. Go make some stuff faster.

  120. We’re hiring people who hate computers. disqus.com/jobs

  121. Questions? I have (some) answers. github.com/mattrobenolt @mattrobenolt