HTTP for Great Good

Matt Robenolt
September 05, 2013

Scaling Django with HTTP

DjangoCon US 2013

http://www.youtube.com/watch?v=HAjOQ09I1UY

Transcript

  1. HTTP for Great Good
    Scaling Django with HTTP
    DjangoCon US
    September 5th 2013
    Matt Robenolt

  2. Hello
    < me irl

  3. Site Reliability Engineer

  4. “DJANGO ALL THE THINGS!”

  5. “...but

  6. “...but

  7. The slowest part of a web
    application is typically not
    your code.

  8. Between databases and
    memcaches and Redises
    and Cassandras and
    MongoDBs and networks,
    Django is not the problem.

  9. “...everything

  11. A few vanity metrics.

  12. Monthly Unique Visitors
    1,115,080,411

  13. Monthly Page Views
    7,516,761,301

  14. Inbound Traffic
    42k total req/s
    15k app req/s
    not my fault

  15. 36% of all requests
    actually hit a Django server
    15k/42k = 36%

  16. ...what happened to the
    other 64%?

  17. Let’s talk about HTTP.
    Hypertext Transfer Protocol

  18. $ curl -v disqus.com

  19. > GET / HTTP/1.1
    > User-Agent: curl/7.24.0
    > Host: disqus.com
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Server: nginx
    < Date: Fri, 30 Aug 2013 06:38:37 GMT
    < Content-Type: text/html; charset=utf-8
    < Content-Length: 10453
    < Last-Modified: Fri, 30 Aug 2013 00:32:14 GMT
    < Vary: Accept-Encoding
    < Expires: Fri, 30 Aug 2013 06:38:36 GMT
    < Cache-Control: no-cache, must-revalidate

  20. Request
    The lines prefixed with ">": sent by the client

  21. Method
    GET

  22. Path
    /

  23. Version
    HTTP/1.1

  24. Headers
    User-Agent, Host, Accept

  25. Response
    The lines prefixed with "<": sent by the server

  26. Status
    200 OK

  27. Headers
    Server, Date, Content-Type, Content-Length, Last-Modified, Vary, Expires, Cache-Control

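As a rough illustration of those parts, here is a minimal Python sketch that splits the raw request from slide 19 into method, path, version and headers. `parse_request_head` is a hypothetical helper, not part of Django or any library; it assumes a well-formed request with CRLF line endings and ignores repeated headers, so treat it as a teaching aid rather than a real parser. In a Django deployment the web server and WSGI layer do this work for you.

```python
def parse_request_head(raw):
    """Split a raw HTTP/1.1 request head into (method, path, version, headers).

    Header names are lower-cased so lookups are case-insensitive.
    """
    lines = raw.split("\r\n")
    # Request line: METHOD SP PATH SP VERSION
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw = (
    "GET / HTTP/1.1\r\n"
    "User-Agent: curl/7.24.0\r\n"
    "Host: disqus.com\r\n"
    "Accept: */*\r\n"
    "\r\n"
)
method, path, version, headers = parse_request_head(raw)
print(method, path, version, headers["host"])
```

Django hands you the same values already parsed, as `request.method`, `request.get_full_path()` and entries in `request.META`.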
  28. Request in Django
    request.method                   # method
    request.get_full_path()          # path (including any query string)
    request.META['HTTP_USER_AGENT']  # User-Agent header
    request.META['HTTP_ACCEPT']      # Accept header

  29. Response in Django
    from django.http import HttpResponse
    response = HttpResponse(body)
    response.status_code = 200  # status
    response['X-Foo'] = 'bar'   # a response header

  30. “Cool,

  31. > GET / HTTP/1.1
    > User-Agent: curl/7.24.0
    > Host: disqus.com
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Server: nginx
    < Date: Fri, 30 Aug 2013 06:38:37 GMT
    < Content-Type: text/html; charset=utf-8
    < Content-Length: 10453
    < Last-Modified: Fri, 30 Aug 2013 00:32:14 GMT
    < Vary: Accept-Encoding
    < Expires: Fri, 30 Aug 2013 06:38:36 GMT
    < Cache-Control: no-cache, must-revalidate
    Hmm. What can we do with this
    information?

  32. > GET / HTTP/1.1
    > User-Agent: curl/7.24.0
    > Host: disqus.com
    > Accept: */*
    > If-Modified-Since: Fri, 30 Aug 2013 00:32:14 GMT
    >
    < HTTP/1.1 304 Not Modified
    < Server: nginx
    < Date: Fri, 30 Aug 2013 18:05:38 GMT
    < Last-Modified: Fri, 30 Aug 2013 00:32:14 GMT
    < Vary: Accept-Encoding
    < Expires: Fri, 30 Aug 2013 18:05:37 GMT
    < Cache-Control: no-cache, must-revalidate

  33. 304 Not Modified
    No body is sent with the response

  34. 304 Not Modified
    Client reuses its cached version

  35. 304 Not Modified
    Usually cheaper to compute than rendering the full response

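The decision behind that 304 is small. A minimal sketch, assuming the resource's last change is known as a timezone-aware UTC `datetime`; `conditional_status` is a hypothetical helper. It only handles `If-Modified-Since` (no `ETag` / `If-None-Match`). In Django, the `last_modified` decorator performs this kind of check for you.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified: timezone-aware datetime of the resource's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    """
    if if_modified_since is None:
        return 200  # client has no cached copy: send the full response
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: fall back to a full response
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # unchanged: no body, the client reuses its copy
    return 200


modified = datetime(2013, 8, 30, 0, 32, 14, tzinfo=timezone.utc)
print(conditional_status(modified, "Fri, 30 Aug 2013 00:32:14 GMT"))  # 304
print(conditional_status(modified, "Thu, 29 Aug 2013 00:00:00 GMT"))  # 200
```

The expensive part (rendering the page) only happens on the 200 path; the 304 path needs nothing more than a timestamp.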

  36. notbad.gif

  37. But we can do better!

  38. Static files have been
    promoting good practices
    for a long time.

  39. “Far future headers”

  40. $ curl -v a.disquscdn.com/dotcom/d-6203c8f/css/disqus-web/pages/home.css
    < HTTP/1.1 200 OK
    < Server: nginx
    < Content-Type: text/css; charset=utf-8
    < Last-Modified: Fri, 16 Aug 2013 20:31:05 GMT
    < Expires: Sun, 15 Sep 2013 20:34:21 GMT
    < Cache-Control: max-age=2592000
    < Content-Length: 30749
    < Date: Sun, 18 Aug 2013 03:23:37 GMT
    < Via: 1.1 varnish
    < Age: 110956
    < Connection: keep-alive
    < Vary: Accept-Encoding

  41. Cache-Control: max-age=2592000
    30 days in the future.

  42. Chrome Web Inspector
    on second visit

  43. No HTTP request
    0ms

  44. No HTTP request
    The client knew to just use its local cache

  45. No HTTP request
    Computer actually did something right for once

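Why does the second visit skip the network entirely? Roughly, a cache treats a stored response as fresh while its age is below its `max-age`. A simplified Python sketch of that arithmetic, using the headers from the CSS response on slide 40 (`max-age=2592000`, `Age: 110956`). `remaining_freshness` is hypothetical and deliberately ignores `Expires`, `no-cache`, `s-maxage` and the other inputs a real browser or proxy weighs.

```python
import re


def remaining_freshness(cache_control, age=0):
    """Seconds a cached response stays fresh, or None if there is no max-age.

    Simplified model: freshness lifetime = max-age, current age = Age header.
    """
    match = re.search(r"(?:^|,)\s*max-age=(\d+)", cache_control, re.IGNORECASE)
    if match is None:
        return None  # no max-age: this sketch cannot call the response fresh
    return int(match.group(1)) - age


left = remaining_freshness("max-age=2592000", age=110956)
print(left)      # 2481044 seconds, roughly 28.7 days
print(left > 0)  # True: reuse the local copy, no HTTP request at all
print(remaining_freshness("no-cache, must-revalidate"))  # None
```

The homepage response from slide 19 (`no-cache, must-revalidate`, no `max-age`) gets `None` here, which is why the browser has to ask again and can at best get a 304.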

  46. “I SEE WHAT YOU DID THERE”
    - Hopefully you

  47. Takeaways
    Clients behave differently depending on the response headers

  48. Takeaways
    With static files, these usually come with minimal effort

  49. Takeaways
    We can and should utilize these to our advantage to improve UX

  50. Same logic can be applied
    to dynamic content.

  51. What’s this look like in
    Django?

  52. Last-Modified
    from django.shortcuts import render
    def lol(request):
        response = render(request, 'lol.html')
        # Hard-coded timestamp, set by hand
        response['Last-Modified'] = 'Fri, 16 Aug 2013 20:31:05 GMT'
        return response
    * don’t do this.

  53. Last-Modified
    from django.views.decorators.http import last_modified
    def post_last_modified(request, slug):
        # Gets the same arguments as the view; returns a datetime
        return Post.objects.get(slug=slug).modified
    @last_modified(post_last_modified)
    def blog_post_detail(request, slug):
        ...  # Your view

  54. Cache-Control
    from django.shortcuts import render
    def lol(request):
        response = render(request, 'lol.html')
        response['Cache-Control'] = 'max-age=600'  # 600 seconds = 10 minutes
        return response

  55. Cache-Control
    from django.http import HttpResponse
    from django.views.decorators.cache import cache_control
    @cache_control(max_age=600)  # Cache for 10m
    def home(request):
        return HttpResponse('lol')

  56. “OMG!

  58. Well... not really.

  59. How many requests will
    {{user}} make to the same
    page within 10 minutes?

  60. 1? 2? 3?

  61. ...out of 42k requests per
    second.

  62. Even with caching, your
    app is doing a lot of work.

  63. Parsing HTTP.

  64. WSGI.

  65. Django middleware stack.

  66. Do some stuff.

  67. Render a template?

  68. Back out through the
    Django middleware.

  69. Transform an
    HttpResponse into a real
    HTTP response.

  70. ...at 42k requests per
    second.

  71. You’re gonna have a bad time.
    - me

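To make that layering concrete, here is a sketch in plain WSGI (standard library only, no Django): `hello_app` plays the view and `cache_control_middleware` is a hypothetical wrapper that adds a header on the way back out, the way a middleware layer might. Django's own middleware uses a different interface, so this only illustrates the shape of the stack every request walks through.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """The view: build a body plus the status line and headers."""
    body = b"Hello world"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def cache_control_middleware(app, max_age):
    """Hypothetical middleware: adds Cache-Control to every response."""
    def wrapped(environ, start_response):
        def add_header(status, headers, exc_info=None):
            headers = list(headers) + [("Cache-Control", f"max-age={max_age}")]
            return start_response(status, headers, exc_info)
        return app(environ, add_header)
    return wrapped


app = cache_control_middleware(hello_app, max_age=5)

# Call the stack directly; normally a server such as gunicorn parses HTTP
# and does this part.
environ = {}
setup_testing_defaults(environ)  # fills in REQUEST_METHOD, PATH_INFO, ...
captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = dict(headers)
    return lambda data: None  # the WSGI write() callable, unused here


body = b"".join(app(environ, start_response))
print(captured["status"], captured["headers"]["Cache-Control"], body)
```

Each layer runs on every request that reaches the app; only a cache sitting in front of it lets a request skip all of them.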

  72. Until now, “client” has been
    a user’s browser.

  73. “If only we could utilize
    this Cache-Control stuff
    better...”

  74. Introducing Varnish

  75. $ apt-get install varnish

  76. $ brew install varnish

  77. tl;dr
    Varnish sits between Django and your users
    Internet

  78. tl;dr
    Caches HTTP responses and respects proper HTTP headers

  79. tl;dr
    Its sole purpose in life is to be a cache, so it’s really fast.

  80. Stand back, science is
    happening.

  81. Stand back, science is
    happening.
    benchmarking

  82. Simple, non-scientific
    “Hello World”

  83. “Hello World”
    from django.http import HttpResponse
    from django.views.decorators.cache import cache_control
    @cache_control(max_age=5)
    def hello(request):
        return HttpResponse('Hello world')

  84. Django + gunicorn
    $ httperf --server 127.0.0.1 --port 8000 --uri /hello/ --rate 150 --num-conn 10 --num-call 500 --hog
    Request rate: 369.6 req/s (2.7 ms/req)
    * on my MacBook Air

  85. Varnish
    $ httperf --server 127.0.0.1 --port 8888 --uri /hello/ --rate 150 --num-conn 10 --num-call 10000 --hog
    Request rate: 15633.4 req/s (0.1 ms/req)
    * on my MacBook Air

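A quick sanity check on the two benchmark numbers, assuming httperf's ms/req figure is simply 1000 divided by the request rate (the Django line bears this out: 1000 / 369.6 is about 2.7). Under that assumption the Varnish line's `0.1 ms/req` is about 0.064 ms rounded to one decimal place, and Varnish served roughly 42 times as many requests per second.

```python
django_rps = 369.6     # Django + gunicorn, slide 84
varnish_rps = 15633.4  # Varnish, slide 85


def ms_per_request(requests_per_second):
    """Average time per request in milliseconds."""
    return 1000.0 / requests_per_second


print(round(ms_per_request(django_rps), 1))   # 2.7
print(round(ms_per_request(varnish_rps), 3))  # 0.064
print(round(varnish_rps / django_rps, 1))     # 42.3
```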

  86. Varnish: How does it work?

  87. First request

  88. First response
    “Lemme

  89. “Yo,

  90. Next response
    “wut

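The first request / first response / next response sequence can be modeled in a few lines. This is a toy, not how Varnish is implemented: `TinyHTTPCache` is a hypothetical in-memory cache keyed only on URL that stores 200 responses for as long as their `max-age` says, and ignores `Vary`, cookies and everything else Varnish handles. The clock is injectable so expiry can be shown without sleeping.

```python
import re
import time


class TinyHTTPCache:
    """Toy caching proxy: answers repeats from memory until max-age runs out."""

    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend  # callable: url -> (status, headers, body)
        self.clock = clock
        self.store = {}         # url -> (expires_at, response)

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and now < entry[0]:
            return "HIT", entry[1]  # fresh copy: the backend never sees this
        response = self.backend(url)  # first request, or the copy went stale
        status, headers, _body = response
        match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
        if status == 200 and match:
            self.store[url] = (now + int(match.group(1)), response)
        return "MISS", response


calls = []


def django_backend(url):
    """Stand-in for the @cache_control(max_age=5) hello view."""
    calls.append(url)
    return 200, {"Cache-Control": "max-age=5"}, b"Hello world"


fake_now = [0.0]
cache = TinyHTTPCache(django_backend, clock=lambda: fake_now[0])
results = [cache.get("/hello/")[0], cache.get("/hello/")[0]]  # MISS, then HIT
fake_now[0] = 6.0  # six seconds later the 5s TTL has run out
results.append(cache.get("/hello/")[0])  # MISS again: refetched
print(results, len(calls))
```

Three requests, two trips to Django: the hit in the middle is the cheap path that accounts for the benchmark gap above.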

  91. Caching: ProMoves™

  92. Augment with JavaScript
    Update your UI optimistically

  93. Augment with JavaScript
    Leverage cookies to store non-critical data

  94. Augment with JavaScript
    Defer fetching user-specific data until needed

  95. Short TTLs are good
    Most things can be cached for at least 5s

  96. Short TTLs are good
    At 10k requests/s, a 5s TTL absorbs 49,999 requests

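The arithmetic behind that figure, as a hedged sketch: it assumes steady traffic to a single URL, one cache node, and exactly one miss per TTL window (the request that refills the cache). `absorbed_requests` is a hypothetical helper.

```python
def absorbed_requests(requests_per_second, ttl_seconds):
    """Requests answered from cache in one TTL window, assuming one miss."""
    total = requests_per_second * ttl_seconds
    return total - 1  # everything except the single miss that refills the cache


absorbed = absorbed_requests(10_000, 5)  # 10,000 req/s * 5 s = 50,000 requests
hit_ratio = absorbed / (10_000 * 5)
print(absorbed)            # 49999
print(f"{hit_ratio:.3%}")  # 99.998%
```

Under the same assumptions, even a 1s TTL means the backend sees about one request per second for that URL instead of 10,000.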

  97. Let’s meet:
    John and Jane Doe.

  98. John and Jane
    are different users.

  99. John logs into Disqus.

  100. Jane logs into Disqus.

  101. Jane sees John’s stuff.

  102. Jane sees John’s stuff.
    ^
    not

  103. We really want to prevent
    this from ever happening.

  104. Cookies

  105. How do users even
    work?

  106. $ curl -vd "username=foo&password=bar" https://disqus.com/profile/login/
    > POST /profile/login/ HTTP/1.1
    > User-Agent: curl/7.24.0
    > Host: disqus.com
    >
    < HTTP/1.1 302 FOUND
    < Server: nginx
    < Date: Fri, 30 Aug 2013 21:34:36 GMT
    < Vary: Cookie
    < Set-Cookie: sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288; Domain=.disqus.com; expires=Sun, 29-Sep-2013 21:34:36 GMT; httponly; Max-Age=2592000; Path=/

  107. Set-Cookie
    Tells the client to store a cookie and send it back on later requests

  108. Session Id
    sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288

  109. Session Id
    Unique id that represents a logged-in user

  110. Session Id
    django.contrib.sessions / django.contrib.auth

  111. Session Id
    Could potentially cache per session id

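Caching per session id amounts to putting the session id into the cache key. A minimal Python sketch using the standard library's `http.cookies.SimpleCookie`; `cache_key` is a hypothetical helper, and the cookie name `sessionid` is taken from the login response on slide 106 (it is also Django's default `SESSION_COOKIE_NAME`). Anonymous requests share one entry per URL while each session gets its own, so Jane can never be handed John's page. The trade-off, assumed here rather than measured, is a much lower hit rate for logged-in traffic.

```python
from http.cookies import CookieError, SimpleCookie


def cache_key(url, cookie_header, session_cookie="sessionid"):
    """Cache key: (url, session id), with None for anonymous requests."""
    session_id = None
    if cookie_header:
        cookies = SimpleCookie()
        try:
            cookies.load(cookie_header)
        except CookieError:
            cookies = SimpleCookie()  # malformed header: treat as anonymous
        if session_cookie in cookies:
            session_id = cookies[session_cookie].value
    return (url, session_id)


john = "sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288"
jane = "theme=dark; sessionid=jane-example-session-id"  # hypothetical cookies
print(cache_key("/", john))
print(cache_key("/", jane))
print(cache_key("/", None))  # shared entry for anonymous visitors
```

Keying on the session id alone, rather than the whole `Cookie` header, keeps unrelated cookies such as the hypothetical `theme` above from splitting the cache further.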

  112. By default, Varnish will not
    cache any request with a
    Cookie header at all.

  113. Think about whether your
    endpoint changes based on
    a user’s authentication.

  114. If it doesn’t, Varnish can
    normalize the request by
    stripping its cookies.

  116. Learn: Varnish
    Configuration Language
    (VCL) in 30 seconds

  117. sub vcl_recv {
        // These URLs serve the same data to anonymous and
        // authenticated users, so all cookies can be stripped
        if (req.url == "/" || req.url ~ "^/embed/comments/") {
            unset req.http.Cookie;
        }
    }

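To see what that VCL does together with the default behavior from slide 112, here is a Python model of just those two rules. `recv` and `COOKIE_FREE` are hypothetical names; the real built-in `vcl_recv` also looks at the request method, `Authorization` and more, and header names are matched case-sensitively here for brevity.

```python
import re

# URLs that serve the same data to anonymous and logged-in users (slide 117).
COOKIE_FREE = [re.compile(r"^/$"), re.compile(r"^/embed/comments/")]


def recv(url, headers):
    """Return (headers, action): "lookup" may use the cache, "pass" may not."""
    headers = dict(headers)  # don't mutate the caller's dict
    if any(pattern.search(url) for pattern in COOKIE_FREE):
        headers.pop("Cookie", None)  # like: unset req.http.Cookie;
    # Default rule (slide 112): a request that still has a Cookie is not cached.
    action = "pass" if "Cookie" in headers else "lookup"
    return headers, action


logged_in = {"Cookie": "sessionid=f7aa9598-11bb-11e3-9eb1-003048d9a288"}
print(recv("/embed/comments/", logged_in)[1])  # lookup: cookie stripped
print(recv("/profile/", logged_in)[1])         # pass: per-user page
print(recv("/profile/", {})[1])                # lookup: anonymous
```

Stripping the cookie is only safe because those URLs really do return the same bytes for everyone; apply it to a page that renders per-user data and Jane is back to seeing John's stuff.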

  118. Basically, caching is hard.

  119. Go make some stuff faster.

  120. We’re hiring people who hate
    computers. disqus.com/jobs

  121. Questions? I have (some) answers.
    github.com/mattrobenolt
    @mattrobenolt