Demystifying web cache by Kristian Lyngstøl

This talk will discuss caching from app server to web browser. Subjects like s-maxage vs. max-age, little-known obscurities around the Vary header and more will be covered. The talk focuses on using simple, safe strategies that don't lead to information leakage even when things go wrong. There will be some Varnish-specific tips and tricks.

Kristian has been breaking things and fixing them again for most of his life. He's a C, Java, AWK, Perl, Python and misc programmer and spends his days working with Varnish Cache. He wrote most of the Varnish Book used for professional Varnish training because someone had to do it.

OWASP Montreal Hangout - February 28th

Full video of the presentation: http://www.youtube.com/watch?v=5Sy7n5J7b1U

https://www.owasp.org/index.php/Montr%C3%A9al

OWASP Montréal

February 28, 2013

Transcript

  2. Demystifying cache
    Kristian Lyngstøl
    Product Specialist
    Varnish Software AS
    Montreal, March 2013

  3. Agenda
    - The types of caches involved
    - The benefits of a cache
    - HTTP
    - Reverse proxy specifics

  4. Not: L1/L2 cache. Disk cache, etc.

  5. Browser cache
    Intermediary caches
    Reverse proxy

  6. Browser -> Intermediary cache -> Reverse proxy -> Origin server
    Outside your direct control (the Internet): browser, intermediary cache
    Within your direct control: reverse proxy, origin server

  7. Pro: Reduced server load
    Con: Increased complexity

  8. Pro: Increased robustness
    Con: Increased complexity == harder problems

  9. Pro: Easier scaling
    Con: Complex architecture

  10. Pro: Reduced latency by geographic distribution
    Con: Increased latency on cache misses

  11. HTTP: Written for cache.

  12. HTTP
    The original REST interface
    Addresses browser caches and intermediary caches, but
    NOT reverse proxies.

  13. Vary:
    Browser: GET /foo, Accept-Encoding: gzip
    Server: Here's /foo. It's compressed. Vary: Accept-Encoding.
    Vary is a way for a server to signal that there
    might be different variants of the content
    depending on a specific header.
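
A cache that honours Vary has to fold the named request headers into its lookup key. The following is a minimal Python sketch of that idea, not something shown in the talk: `variant_key` is a hypothetical helper, and headers are assumed to arrive as plain dicts.

```python
def variant_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so compare them in lowercase.
    headers = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # A request header that is absent is keyed as an empty string.
    return (url,) + tuple((n, headers.get(n, "")) for n in names)


gzip_key = variant_key("/foo", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
plain_key = variant_key("/foo", {}, "Accept-Encoding")
```

Here `gzip_key` and `plain_key` differ, so the compressed and uncompressed variants of /foo are stored side by side.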

  14. Vary: examples
    - Vary: Accept-Encoding (compression)
    - Vary: User-Agent (mobile content?)
    - Vary: Cookie
    - Vary: Accept-Language
    - All of the above
    Most common by far: Vary: Accept-Encoding

  15. Vary-challenges
    Content is provided either in gzip or
    uncompressed form, yet “Accept-Encoding”
    can contain any number of potential
    algorithms.
    Content varies depending on one specific cookie,
    but there is no way to say which one using just Vary
    (it is all or nothing).

  16. “Accept-Encoding: gzip, deflate”
    “Accept-Encoding: deflate, gzip”
    Two different things!
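
One common way around this is to normalize Accept-Encoding before it reaches the cache key. Below is a minimal Python sketch, assuming the backend only ever serves gzip or uncompressed content; `normalize_accept_encoding` is a hypothetical name, not something from the talk.

```python
def normalize_accept_encoding(value):
    """Collapse an Accept-Encoding header to "gzip" or "" (uncompressed)."""
    for item in value.split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() != "gzip":
            continue
        # Honour an explicit "q=0", which means "gzip is not acceptable".
        weight = params.strip().lower().replace(" ", "")
        if weight.startswith("q="):
            try:
                if float(weight[2:]) == 0:
                    return ""
            except ValueError:
                return ""  # fail safe: unparsable weight, serve uncompressed
        return "gzip"
    return ""
```

With this in place, both header values on the slide map to the same string, so they share a single cached variant.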

  17. Without a cache, Vary has almost no side effects.
    Thus often ignored.

  18. What we should do:
    Vary: Cookie, User-Agent

  19. Technique number 0: FAIL SAFE
    You will fail at some point. Make sure you fail in an
    acceptable manner.
    Isn't it better to disable a cache, or to artificially inflate
    it, than to deliver user-specific content to the
    wrong user?

  20. Hypothetical example
    - Site allows students to register for a summer
    event.
    - Last minute change.
    - At launch, the cache is forced to cache
    content with “Set-Cookie” headers present.
    - Students end up taking over each other's
    sessions.
    - Lawsuit to distribute blame lasts for years.

  21. By far the most common cache mistake is ignoring
    cookies.

  22. “If content has cookies, clean out all but THESE
    cookies”
    “If content still has cookies, either DO NOT cache,
    or add the entire cookie string to the hash key,
    then cache.”
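
The two rules above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the speaker's implementation: the allow-list `KEEP` and both function names are hypothetical, and this sketch takes the second option (the remaining cookie string goes into the hash key).

```python
KEEP = {"sessionid"}  # hypothetical allow-list: the only cookies the backend reads


def clean_cookies(cookie_header, keep=KEEP):
    """Drop every cookie that is not on the allow-list."""
    kept = []
    for pair in cookie_header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep and name in keep:
            kept.append(f"{name}={value}")
    # Sort so that cookie order does not create extra cache variants.
    return "; ".join(sorted(kept))


def cache_key(url, cookie_header):
    """Hash key: the URL plus whatever cookies survive the cleaning."""
    return (url, clean_cookies(cookie_header))
```

Requests that differ only in tracking cookies now share one cache entry, while requests with different session cookies never do.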

  23. Never. Ever. Cache. Set-Cookie. (unless, of course,
    you have a good reason!)
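
As a fail-safe default, the rule might look like the Python sketch below. `may_cache` and its `allow_set_cookie` override are hypothetical names; response headers are assumed to be a plain dict.

```python
def may_cache(response_headers, allow_set_cookie=False):
    """Refuse to cache any response that carries Set-Cookie, unless overridden."""
    names = {name.lower() for name in response_headers}
    if "set-cookie" in names and not allow_set_cookie:
        return False
    return True
```

The override has to be requested explicitly, so forgetting about it leaves the safe behaviour in place.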

  24. Example: Kenneth (36)
    Kenneth (36) became famous when his tax returns
    were incorrectly cached and thousands of users got
    to see his tax returns instead of their own...

  25. Some would argue that the site going down might
    have been better.

  26. Cache-Control header

  27. max-age: For browsers, also used by intermediary
    caches.

  28. s-maxage: For caches, used by both intermediary
    caches AND reverse proxies.
    Tip: Use it for reverse proxies, then remove it before
    exposing it to intermediary caches.
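
A rough Python sketch of the tip: the reverse proxy takes its TTL from s-maxage (falling back to max-age), then strips s-maxage before passing the response on. All three function names are hypothetical, and the parsing is deliberately simplistic (it does not handle commas inside quoted values).

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    return directives


def shared_ttl(value):
    """TTL in seconds for a shared cache: s-maxage wins over max-age."""
    directives = parse_cache_control(value)
    for name in ("s-maxage", "max-age"):
        if name in directives:
            try:
                return max(int(directives[name]), 0)
            except ValueError:
                return 0  # fail safe: unparsable value means "do not cache"
    return 0


def strip_s_maxage(value):
    """Remove s-maxage so downstream intermediary caches never see it."""
    parts = [p.strip() for p in value.split(",") if p.strip()]
    return ", ".join(p for p in parts if p.lower().partition("=")[0] != "s-maxage")
```

For `"max-age=60, s-maxage=3600"` the proxy would cache for an hour, while browsers and intermediary caches only see `max-age=60`.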

  29. must-revalidate: You can cache, but once the content
    is stale, revalidate before using it.
    private: Only browser cache. No shared cache.
    public: Can be cached.
    no-cache: You can store it, but revalidate before every use.
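
How a shared cache might act on these directives, as a small Python sketch. `shared_cache_action` and its return labels are hypothetical; per the HTTP caching specification, no-cache requires revalidation before every reuse, while must-revalidate only applies once the stored response is stale. `no-store` is not on the slide and is included here only as a fail-safe assumption.

```python
def shared_cache_action(cache_control):
    """Map a Cache-Control header to what a shared cache should do."""
    names = {p.strip().lower().partition("=")[0] for p in cache_control.split(",")}
    # private: browser cache only. no-store (not on the slide): store nothing.
    if "private" in names or "no-store" in names:
        return "bypass"
    if "no-cache" in names:
        return "revalidate-always"
    if "must-revalidate" in names:
        return "revalidate-when-stale"
    return "cache"
```

The most restrictive directive wins, so a header like `public, private` is treated as private.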

  30. Emerging: Surrogate-Control, a “Cache-Control” for
    surrogate caches (aka: reverse proxies) only.

  31. Conditional requests
    “Give me /pictures/cats, but only if it's newer than
    X”
    If-Modified-Since:

  32. Conditional requests
    “Give me /pictures/cats, but only if it doesn't match
    ETag FOO.”
    If-None-Match: FOO
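
Both kinds of conditional request can be answered with the same small decision, sketched here in Python. `conditional_status` is a hypothetical helper; it assumes the server knows the resource's current ETag and a timezone-aware last-modified time, and it compares ETags by exact string match only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's copy is still good, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = {tag.strip() for tag in if_none_match.split(",")}
        return 304 if "*" in tags or etag in tags else 200
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            threshold = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            return 200  # unparsable date: just send the full response
        if threshold.tzinfo is None:
            threshold = threshold.replace(tzinfo=timezone.utc)
        if last_modified <= threshold:
            return 304
    return 200


modified = datetime(2013, 2, 28, 12, 0, tzinfo=timezone.utc)
status = conditional_status({"If-None-Match": '"FOO"'}, '"FOO"', modified)
```

A 304 tells the browser or cache to keep using the copy it already has, so no body is sent.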

  33. ETags
    Essentially a unique ID for a version of an
    asset/resource/URL.
    Not only useful for caches.
    “UPDATE this resource, but only if the old
    version matches what I had.”

  34. GET /foobar HTTP/1.1
    ETag: FOO-version1 (returned in the response)
    (time passes)
    DELETE /foobar HTTP/1.1
    If-Match: FOO-version1
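
The exchange above is optimistic concurrency control. A toy Python sketch of the server side, assuming an in-memory store and ETags derived from a hash of the body (the class, its methods and the hashing choice are all hypothetical):

```python
import hashlib


class ResourceStore:
    """Toy in-memory resource store that enforces If-Match on DELETE."""

    def __init__(self):
        self._bodies = {}

    @staticmethod
    def _etag(body):
        # Assumption: the ETag is a truncated hash of the current body.
        return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

    def put(self, path, body):
        self._bodies[path] = body
        return self._etag(body)

    def delete(self, path, if_match):
        if path not in self._bodies:
            return 404
        if if_match != self._etag(self._bodies[path]):
            return 412  # Precondition Failed: the resource changed meanwhile
        del self._bodies[path]
        return 204


store = ResourceStore()
v1 = store.put("/foobar", b"version1")
v2 = store.put("/foobar", b"version2")  # someone else updates it in between
```

A DELETE that still carries `v1` is rejected with 412, because the resource is no longer the version the client saw.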

  35. Handling high loads == handling bugs, DoS attacks
    and more.

  36. Example: Counting comments
    20-ish news sites, each with a comment section. To
    display “X comments on this article” on the
    front page, a resource contained a list of all articles
    and their counters.
    “This updates constantly, impossible to cache!”

  37. What about Expires and Pragma?
    Pragma: Not defined anywhere as a response header. Do not use.
    Expires: Troublesome at best. Usually set to
    some developer's birthday or that 1997-date
    that seems to have originated from an
    example on php.net.

  38. Common Expires values
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Expires: Sun, 19 Nov 1978 05:00:00 GMT
    Expires: Sat, 26 Jul 1997 05:00:00 GMT
    Expires: Mon, 26 Jul 1997 05:00:00 GMT
    (Mon, 26 Jul 1997 does not exist)
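
Whether the weekday in such a date matches the calendar can be checked mechanically. A small Python sketch using only the standard library (`weekday_matches` is a hypothetical helper name):

```python
from email.utils import parsedate_to_datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_matches(http_date):
    """True if the weekday name in an HTTP date agrees with its calendar date."""
    try:
        parsed = parsedate_to_datetime(http_date)
    except (TypeError, ValueError):
        return False  # not a parsable date at all
    # The parser ignores the day name, so compare it against the real weekday.
    return DAYS[parsed.weekday()] == http_date[:3]
```

Of the four values on the slide, only the "Mon, 26 Jul 1997" one fails this check: that date was a Saturday.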

  39. Contact information
    Kristian Lyngstøl
    [email protected]
    @kristianlyng
    http://kly.no
