Demystifying web cache by Kristian Lyngstøl

This talk will discuss caching from app server to web browser. Subjects like s-maxage vs. max-age, little-known obscurities around the Vary header and more will be covered. The talk focuses on using simple, safe strategies that don't lead to information leakage even when things go wrong. There will be some Varnish-specific tips and tricks.

Kristian has been breaking things and fixing them again for most of his life. He's a C, Java, AWK, Perl, Python and misc programmer and spends his day working with Varnish Cache. He wrote most of the Varnish Book used for professional Varnish training because someone had to do it.

OWASP Montreal Hangout - February 28th

Full video of the presentation: http://www.youtube.com/watch?v=5Sy7n5J7b1U

https://www.owasp.org/index.php/Montr%C3%A9al

OWASP Montréal

February 28, 2013

Transcript

  1. None
  2. Demystifying cache Kristian Lyngstøl Product Specialist Varnish Software AS Montreal, March 2013

  3. Agenda - The types of caches involved - The benefits of a cache - HTTP - Reverse proxy specifics

  4. Not: L1/L2 cache. Disk cache, etc.

  5. Browser cache Intermediary caches Reverse proxy

  6. Browser - Intermediary cache - Reverse proxy - Origin server. Outside your direct control / Within your direct control. The Internet

  7. Pro: Reduced server load Con: Increased complexity

  8. Pro: Increased robustness Con: Increased complexity == harder problems

  9. Pro: Easier scaling Con: Complex architecture

  10. Pro: Reduced latency by geographic distribution Con: Increased latency on cache misses

  11. HTTP: Written for cache.

  12. HTTP The original REST interface Addresses browser cache, intermediary caches but NOT reverse proxies.

  13. Vary: Browser: GET /foo, Accept-Encoding: gzip Server: Here's /foo. It's compressed. Vary: Accept-Encoding. Vary is a way for a server to signal that there might be different variants of the content depending on a specific header.

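The way a cache uses Vary can be sketched as a cache-key calculation: the URL plus the value of every request header the response named in Vary. This is a minimal illustration, not how any particular cache implements it; `variant_key` and its arguments are hypothetical names.

```python
def variant_key(url, request_headers, vary):
    """Build a cache key from the URL plus each request header named in Vary."""
    # Header names are case-insensitive, so compare them in lower case.
    headers = {name.lower(): value for name, value in request_headers.items()}
    parts = [url]
    for name in vary.split(","):
        name = name.strip().lower()
        if name == "*":
            # "Vary: *" means no stored response can be reused.
            return None
        if name:
            # A missing request header is treated as an empty value.
            parts.append(f"{name}={headers.get(name, '')}")
    return "|".join(parts)


gzip_key = variant_key("/foo", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
plain_key = variant_key("/foo", {}, "Accept-Encoding")
```

Here `gzip_key` and `plain_key` differ, so the compressed and uncompressed variants of /foo are stored as separate objects.
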
  14. Vary: examples - Vary: Accept-Encoding Compression. - Vary: User-Agent Mobile content? - Vary: Cookie - Vary: Accept-Language - All of the above Most common by far: Vary: Accept-Encoding

  15. Vary-challenges Content is provided either in gzip or uncompressed form, yet “Accept-Encoding” can contain any number of potential algorithms. Content varies depending on a specific cookie, but there is no way to tell which one by using just Vary (all or nothing).

  16. “Accept-Encoding: gzip, deflate” “Accept-Encoding: deflate, gzip” Two different things!
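One common way around this is to normalize the header before it reaches the cache, so that every request falls into one of two buckets: gzip or uncompressed. The sketch below assumes the server only ever produces those two variants; `normalize_accept_encoding` is a hypothetical helper, and wildcards such as `*` are deliberately ignored to keep it short.

```python
def normalize_accept_encoding(value):
    """Collapse an Accept-Encoding value to "gzip" or "" (uncompressed)."""
    for item in (value or "").split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() != "gzip":
            continue
        # Honour an explicit "q=0", which means "gzip is not acceptable".
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # unreadable quality value: fall back to uncompressed
        if q > 0:
            return "gzip"
    return ""
```

With this in place, "gzip, deflate" and "deflate, gzip" both normalize to "gzip" and share a single cached variant.
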

  17. Without a cache, Vary has almost no side effects. Thus often ignored.

  18. What we should do: Vary: Cookie, User-Agent

  19. Technique number 0: FAIL SAFE You will fail at some point. Make sure you fail in an acceptable manner. Is it better to disable a cache or artificially inflate the cache than to deliver user-specific content to the wrong user?

  20. Hypothetical example - Site allows students to register for a summer event. - Last minute change. - At launch, the cache is forced to cache content with “Set-Cookie” headers present. - Students end up taking over each other's sessions. - Lawsuit to distribute blame lasts for years.

  21. The by far most common cache mistake is ignoring cookies.

  22. “If content has cookies, clean out all but THESE cookies” “If content still has cookies, either DO NOT cache, or add the entire cookie string to the hash key, then cache.”

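The two rules above can be sketched in a few lines. This is an illustration only: the allow-list `KEEP_COOKIES`, the cookie name `sessionid`, and both function names are assumptions made for the example, not something taken from the talk.

```python
KEEP_COOKIES = {"sessionid"}  # hypothetical allow-list of cookies the backend reads


def clean_cookies(cookie_header, keep=KEEP_COOKIES):
    """Drop every cookie except the ones on the allow-list."""
    kept = []
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name in keep:
            kept.append(f"{name}={value}")
    return "; ".join(kept)


def cache_key(url, cookie_header, cache_with_cookies=False):
    """Return a cache key, or None when the request must not be cached."""
    remaining = clean_cookies(cookie_header or "")
    if not remaining:
        return url  # no relevant cookies left: safe to share
    if cache_with_cookies:
        # Add the entire remaining cookie string to the hash key.
        return f"{url}|cookie={remaining}"
    return None  # fail safe: do not cache
```

Analytics-style cookies are stripped and the page is shared; a request that still carries a session cookie is either passed through uncached or cached per cookie string.
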
  23. Never. Ever. Cache. Set-Cookie. (unless, of course, you have a good reason!)

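As a minimal sketch of that rule in fail-safe form, a cache can refuse to store any response that carries Set-Cookie, and also refuse when the headers cannot be inspected at all. `may_cache` is a hypothetical name, and the headers are assumed to arrive as a dict or any iterable of header names.

```python
def may_cache(response_headers):
    """Return False for any response that sets a cookie, or that cannot be checked."""
    try:
        names = {str(name).lower() for name in response_headers}
    except TypeError:
        # Headers missing or unreadable: fail safe and do not cache.
        return False
    return "set-cookie" not in names
```
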
  24. Example: Kenneth (36). Kenneth (36) became famous when his tax returns were incorrectly cached and thousands of users got to see his tax returns instead of their own....

  25. Some would argue that the site going down might have been better.

  26. Cache-Control header

  27. max-age: For browsers, also used by intermediary caches.

  28. s-maxage: For caches, used by both intermediary caches AND reverse proxies. Tip: Use it for reverse proxies, then remove it before exposing it to intermediary caches.

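The tip can be sketched as two small steps: pick the reverse proxy's TTL from s-maxage (falling back to max-age), then strip s-maxage from the header that is sent downstream. The function names are hypothetical, and the parsing is simplified; it does not handle commas inside quoted directive values.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument} dict."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def proxy_ttl(value):
    """TTL in seconds for the reverse proxy: s-maxage wins over max-age."""
    directives = parse_cache_control(value)
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(int(directives[name]), 0)
            except ValueError:
                return 0  # unreadable value: fail safe, do not cache
    return 0


def strip_s_maxage(value):
    """Remove s-maxage so caches further downstream only see max-age."""
    kept = [item.strip() for item in value.split(",")
            if item.strip() and item.strip().partition("=")[0].lower() != "s-maxage"]
    return ", ".join(kept)
```

For "s-maxage=3600, max-age=60" the proxy keeps the object for an hour while browsers and intermediary caches only see "max-age=60".
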
  29. must-revalidate: You can cache, but revalidate before using it. private: Only browser cache. No shared cache. public: Can be cached. no-cache: Similar to must-revalidate.

  30. Emerging: Surrogate-Control, a “Cache-Control” for surrogate caches (aka: reverse proxies) only.

  31. Conditional requests “Give me /pictures/cats, but only if it's newer than X” If-Modified-Since:

  32. Conditional requests “Give me /pictures/cats, but only if it doesn't match ETag FOO.” If-None-Match: FOO

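On the server side, an If-None-Match check might look like the sketch below. It assumes the request headers arrive as a dict keyed by the exact name "If-None-Match" and that the current ETag is already known; `conditional_get` is a hypothetical name, and the comparison is the weak one, so a `W/` prefix is ignored.

```python
def conditional_get(request_headers, current_etag, body):
    """Return (status, headers, body) for a GET that may carry If-None-Match."""

    def opaque(tag):
        # Weak comparison: ignore a leading "W/" on either side.
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [opaque(tag)
                  for tag in request_headers.get("If-None-Match", "").split(",")]
    if "*" in candidates or opaque(current_etag) in candidates:
        # The client's copy is still current: send headers only.
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, body
```

A client holding the current version gets a 304 with an empty body; any other client gets the full 200 response.
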
  33. ETags Essentially a Unique ID for an asset/resource/url. Not only useful for caches. “UPDATE this resource, but only if the old version matches what I had.”

  34. GET /foobar HTTP/1.1 ETag: FOO-version1 (time passes) DELETE /foobar HTTP/1.1 If-Match: FOO-version1

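The exchange above can be sketched as a conditional delete against an in-memory store. Everything here is an assumption made for illustration: the `store` dict mapping paths to `(etag, body)` pairs, the function name, and the choice of status codes (412 when the precondition fails, 204 on success).

```python
def conditional_delete(store, path, if_match):
    """Delete store[path] only if the client's ETag still matches."""
    if path not in store:
        return 404
    etag, _body = store[path]
    if if_match is not None and if_match != "*" and if_match != etag:
        # The resource changed since the client fetched it.
        return 412
    del store[path]
    return 204


store = {"/foobar": ("FOO-version2", b"newer content")}
stale_result = conditional_delete(store, "/foobar", "FOO-version1")
```

Because the resource moved on to FOO-version2 while "time passed", the stale request is rejected and the resource survives.
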
  35. Handling high loads == handling bugs, DoS attacks and more.

  36. Example: Counting comments 20-ish news sites, each with comment section. To display “X comments on this article” on the frontpage, a resource contained a list of all articles and the counters. “This updates constantly, impossible to cache!”

  37. What about Expires and Pragma? Pragma: Not defined anywhere. Do not use. Expires: Troublesome at best. Usually set to some developer's birthday or that 1997-date that seems to have originated from an example on php.net.

  38. Common Expires values - Expires: Thu, 19 Nov 1981 08:52:00 GMT - Expires: Sun, 19 Nov 1978 05:00:00 GMT - Expires: Sat, 26 Jul 1997 05:00:00 GMT - Expires: Mon, 26 Jul 1997 05:00:00 GMT (Mon, 26 Jul 1997 does not exist)

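The "does not exist" remark is easy to check: compute the real weekday for the date and compare it with the name in the header. A small sketch, assuming the `Day, DD Mon YYYY HH:MM:SS GMT` layout used above; `weekday_matches` is a hypothetical helper.

```python
from datetime import date

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def weekday_matches(http_date):
    """True if the weekday name agrees with the calendar date that follows it."""
    day_name, rest = http_date.split(", ", 1)
    day, month, year = rest.split()[:3]
    actual = date(int(year), MONTHS[month], int(day))
    # date.weekday() counts from Monday = 0, matching the DAYS list.
    return DAYS[actual.weekday()] == day_name
```

26 July 1997 was a Saturday, so the "Sat" variant passes the check and the "Mon" variant fails it.
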
  39. Contact information Kristian Lyngstøl kristian@bohemians.org @kristianlyng http://kly.no