Cache Strategies for Web Apps

Given at ZendCon, October 29, 2014

Glen Campbell

Transcript

  1. Cache strategies for
    web apps
    Glen Campbell @glenc

  2. Yes, I picked the dullest title ever

  3.

  4. “A web cache is a mechanism for the
    temporary storage (caching) of web
    documents, such as HTML pages and
    images, to reduce bandwidth usage, server
    load, and perceived lag. A web cache stores
    copies of documents passing through it;
    subsequent requests may be satisfied from
    the cache if certain conditions are met.”
    –Wikipedia

  5. What is the most common type of web
    cache?

  6. REST
    • Client-server
    • Stateless
    • Cacheable
    • Layered system
    • Code on demand (optional)
    • Uniform interface

  7. Example: local
    HTTP/1.1 200 OK
    Date: Wed, 29 Oct 2014 15:04:20 GMT
    Server: Apache/2.2.15 (CentOS)
    Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
    Accept-Ranges: bytes
    Content-Length: 212
    Cache-Control: max-age=31536000
    Expires: Thu, 29 Oct 2015 15:04:20 GMT
    Vary: Accept-Encoding
    Connection: close
    Content-Type: text/html; charset=UTF-8

  8. Example: hotel
    HTTP/1.0 200 OK
    Date: Wed, 29 Oct 2014 15:05:32 GMT
    Server: Apache/2.2.15 (CentOS)
    Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
    Accept-Ranges: bytes
    Content-Length: 212
    Cache-Control: max-age=31536000
    Expires: Thu, 29 Oct 2015 15:05:32 GMT
    Vary: Accept-Encoding
    Content-Type: text/html; charset=UTF-8
    X-Cache: MISS from localhost
    X-Cache-Lookup: MISS from localhost:3128
    Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
    Connection: close

  9. What changed?
    $ diff local hotel
    1,2c1,2
    < HTTP/1.1 200 OK
    < Date: Wed, 29 Oct 2014 15:04:20 GMT
    ---
    > HTTP/1.0 200 OK
    > Date: Wed, 29 Oct 2014 15:05:32 GMT
    8c8
    < Expires: Thu, 29 Oct 2015 15:04:20 GMT
    ---
    > Expires: Thu, 29 Oct 2015 15:05:32 GMT
    10d9
    < Connection: close
    11a11,14
    > X-Cache: MISS from localhost
    > X-Cache-Lookup: MISS from localhost:3128
    > Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
    > Connection: close
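The comparison on this slide can also be done in code. Below is a minimal PHP sketch, assuming each response is available as raw header text like the `local` and `hotel` captures; `parseHeaders()` and `headersAddedBy()` are hypothetical helpers, not PHP built-ins. Because it compares header names (case-insensitively, as HTTP requires) rather than lines, the moved Connection: header is not reported, and changed values such as Date: and Expires: are deliberately ignored; only the headers the Squid proxy added remain.

```php
<?php
// Parse a raw header block ("Name: value" per line) into name => value.
// The status line (e.g. "HTTP/1.0 200 OK") has no colon and is skipped.
// Names are lowercased because HTTP header names are case-insensitive.
// Repeated headers are not merged; the last one wins.
function parseHeaders(string $raw): array
{
    $headers = [];
    foreach (preg_split('/\r?\n/', trim($raw)) as $line) {
        $pos = strpos($line, ':');
        if ($pos === false) {
            continue; // status line or blank line
        }
        $name = strtolower(trim(substr($line, 0, $pos)));
        $headers[$name] = trim(substr($line, $pos + 1));
    }
    return $headers;
}

// Headers present in the proxied response but absent from the direct one,
// i.e. the ones an intermediary such as Squid added.
function headersAddedBy(string $direct, string $viaProxy): array
{
    return array_diff_key(parseHeaders($viaProxy), parseHeaders($direct));
}

$local = "HTTP/1.1 200 OK\nCache-Control: max-age=31536000\nConnection: close";
$hotel = "HTTP/1.0 200 OK\nCache-Control: max-age=31536000\n"
       . "X-Cache: MISS from localhost\n"
       . "Via: 1.1 localhost:3128 (squid/2.7.STABLE3)\nConnection: close";

// Prints the x-cache and via entries only.
print_r(headersAddedBy($local, $hotel));
```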

  10. HTTP/1.1

  11. Relevant HTTP/1.1 Headers
    • Age:
    • Authorization:
    • Cache-Control:
    • Connection:
    • ETag:
    • Expires:
    • If-Match:
    • If-None-Match:
    • If-Range:
    • Pragma:
    • Vary:
    • Warning:

  12. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
    or
    http://bit.ly/1p0KHQr

  13. Cache-Control: Requests
    • no-cache
    • no-store
    • max-age={seconds}
    • max-stale={seconds}
    • min-fresh={seconds}
    • no-transform
    • only-if-cached
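A cache has to combine these request directives with the stored response's age and freshness lifetime. The following is a simplified PHP sketch of that decision, loosely following the HTTP/1.1 freshness rules in the specification linked above; `satisfiesRequest()` is a hypothetical name, and it deliberately ignores no-store, only-if-cached and response-side directives such as must-revalidate.

```php
<?php
// Decide whether a stored response may be served, without revalidation,
// to a request carrying the Cache-Control directives in $req.
// $age and $lifetime are in seconds. $req maps directive => value, with
// null for directives given without an argument (e.g. bare max-stale).
function satisfiesRequest(int $age, int $lifetime, array $req): bool
{
    if (array_key_exists('no-cache', $req)) {
        return false; // client insists on revalidation with the origin
    }
    if (isset($req['max-age']) && $age > (int) $req['max-age']) {
        return false; // response is older than the client will accept
    }
    if (isset($req['min-fresh']) && $lifetime - $age < (int) $req['min-fresh']) {
        return false; // would not stay fresh for long enough
    }
    if ($age < $lifetime) {
        return true; // still fresh
    }
    if (!array_key_exists('max-stale', $req)) {
        return false; // stale, and the client did not opt in to stale data
    }
    // max-stale without a value accepts any amount of staleness.
    $staleness = $age - $lifetime;
    return $req['max-stale'] === null || $staleness <= (int) $req['max-stale'];
}
```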

  14. Cache-Control: Responses
    • public
    • private
    • no-cache
    • no-store
    • no-transform
    • must-revalidate
    • proxy-revalidate
    • max-age={seconds}
    • s-maxage={seconds}
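On the response side, a cache first has to split the header into directives. A minimal PHP sketch under a simplifying assumption (no quoted arguments that themselves contain commas); `parseCacheControl()` and `ttlFor()` are hypothetical helpers. `ttlFor()` illustrates one rule from the specification: a shared cache uses s-maxage in preference to max-age, while a private (browser) cache ignores s-maxage.

```php
<?php
// Parse a Cache-Control value such as 'public, max-age=600, s-maxage=3600'
// into ['public' => null, 'max-age' => '600', 's-maxage' => '3600'].
// Directive names are case-insensitive, so they are lowercased; quotes
// around an argument are stripped. Not handled: private="a,b".
function parseCacheControl(string $value): array
{
    $directives = [];
    foreach (explode(',', $value) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        $pair = explode('=', $part, 2);
        $name = strtolower(trim($pair[0]));
        $directives[$name] = isset($pair[1]) ? trim($pair[1], " \t\"") : null;
    }
    return $directives;
}

// Shared caches (proxies, CDNs) prefer s-maxage over max-age; private
// caches ignore s-maxage. Returns null when neither directive is present.
function ttlFor(array $cc, bool $sharedCache): ?int
{
    if ($sharedCache && isset($cc['s-maxage'])) {
        return (int) $cc['s-maxage'];
    }
    return isset($cc['max-age']) ? (int) $cc['max-age'] : null;
}
```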

  15. Adding headers in PHP
    void header ( string $string [, bool $replace = true [, int $http_response_code ]] )

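  16. Cache-Control: in PHP
    // Make caches revalidate before reusing the response
    header('Cache-Control: no-cache');
    // Or: let caches reuse the response for 10 minutes (600 seconds).
    // With $replace = true (the default), this call replaces the one above.
    header('Cache-Control: max-age=600');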
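Because header() replaces an earlier header of the same name by default ($replace = true), directives should be combined into a single Cache-Control: line. A small sketch, assuming you also want a matching Expires: header for HTTP/1.0 caches; `cacheHeaders()` is a hypothetical helper that returns the lines instead of sending them, so it can be tested outside a web request.

```php
<?php
// Build the header lines for a response that caches may keep for
// $maxAge seconds. Expires is derived from the same lifetime so that
// HTTP/1.0 caches, which do not understand Cache-Control, agree with
// HTTP/1.1 ones. $now is passed in (normally time()) to keep this testable.
function cacheHeaders(int $maxAge, int $now, bool $public = true): array
{
    $scope = $public ? 'public' : 'private';
    return [
        "Cache-Control: {$scope}, max-age={$maxAge}",
        'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAge) . ' GMT',
    ];
}

// In a real request handler, before any output is sent:
// foreach (cacheHeaders(600, time()) as $line) {
//     header($line);
// }
```

For instance, cacheHeaders(31536000, 1414595060), where 1414595060 is the Date: of the local example (Wed, 29 Oct 2014 15:04:20 GMT), produces Expires: Thu, 29 Oct 2015 15:04:20 GMT, the same value that response carried.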

  17. ETag:
    • Don’t use them
    • Generation is not specified by the HTTP
    standard, and is often not consistent across a
    cluster.
    • Error-prone and can be used to track users who
    refuse cookies.
    • Turn them off; don’t use them

  18. Expires:
    • Indicates when the resource is stale.
    • Specifies a date/time rather than delta seconds
    (Cache-Control: max-age=S)
    • Mostly used for compatibility with HTTP 1.0;
    Cache-Control: is more semantically rich.
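When both headers are present, as in the local example earlier, HTTP/1.1 gives max-age precedence over Expires; Expires minus Date is only the fallback. A minimal PHP sketch of that rule, assuming the headers have already been parsed into a lowercase name => value array; `freshnessLifetime()` is a hypothetical name, and s-maxage handling is left out.

```php
<?php
// Freshness lifetime in seconds: max-age wins over Expires; otherwise
// Expires minus Date. Returns null when neither is usable (a cache might
// then fall back to its own heuristic).
function freshnessLifetime(array $headers): ?int
{
    $cc = $headers['cache-control'] ?? '';
    // Match max-age as a whole directive, optionally quoted.
    if (preg_match('/(?:^|,)\s*max-age\s*=\s*"?(\d+)"?/i', $cc, $m)) {
        return (int) $m[1];
    }
    if (isset($headers['expires'], $headers['date'])) {
        $expires = strtotime($headers['expires']);
        $date = strtotime($headers['date']);
        if ($expires !== false && $date !== false) {
            return max(0, $expires - $date);
        }
    }
    return null;
}
```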

  19. Extensions
    • Cache-Control: max-age={s}, stale-while-revalidate={s}
    • Cache-Control: max-age={s}, stale-if-error={s}
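These extensions are defined in RFC 5861. The sketch below shows how a cache might apply them, assuming it knows the response's age and whether a synchronous revalidation with the origin succeeded; `cacheDecision()` and its return labels are hypothetical, chosen only to name the possible outcomes.

```php
<?php
// Classify a cached response using the RFC 5861 extensions.
// $age:            seconds since the response was generated
// $maxAge:         Cache-Control max-age
// $swr:            stale-while-revalidate window in seconds (0 if absent)
// $sie:            stale-if-error window in seconds (0 if absent)
// $revalidationOk: whether a blocking revalidation with the origin succeeded
function cacheDecision(int $age, int $maxAge, int $swr, int $sie, bool $revalidationOk): string
{
    if ($age < $maxAge) {
        return 'fresh';                  // serve straight from the cache
    }
    $staleness = $age - $maxAge;
    if ($staleness <= $swr) {
        return 'stale-while-revalidate'; // serve stale now, refresh in background
    }
    if ($revalidationOk) {
        return 'revalidated';            // serve the response the origin confirmed
    }
    // Origin errored: stale content is still allowed within stale-if-error.
    return $staleness <= $sie ? 'stale-if-error' : 'error';
}
```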

  20. ?

  21. Is data cacheable?
    • Highly cacheable data: news stories, blog posts,
    aggregated data such as ratings or reviews
    (“likes”).
    • Uncacheable: secure, private, personal data
    such as user login information, credit card info,
    etc., and data that must change rapidly, such as
    stock quotes or health monitoring systems.

  22. Cache Architectures

  23. Example 1. No cache
    [Diagram: Web Server, Service]

  24. Example 2. Shared Cache
    [Diagram: two Web Servers, one Cache (Proxy), Service]

  25. Example 3. Distributed Cache
    [Diagram: four Web Servers, two Cache (Proxy) nodes linked by ICP, Service]

  26. Example 4. Local+Remote Cache
    [Diagram: as Example 3, with a (local cache) on each of the four Web Servers]
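Example 4 amounts to a two-level lookup. A minimal PHP sketch, assuming plain in-memory containers are acceptable stand-ins for each tier (a real deployment might use an in-process store locally and the shared proxy or a memcached pool remotely); `TwoTierCache` is a hypothetical class, and TTLs, eviction and ICP are omitted.

```php
<?php
final class TwoTierCache
{
    /** Per-server local tier. */
    private array $local = [];

    // $remote is shared by every "web server"; $service fetches from the backend.
    public function __construct(private ArrayObject $remote, private Closure $service)
    {
    }

    public function get(string $key): mixed
    {
        if (array_key_exists($key, $this->local)) {
            return $this->local[$key];        // 1. local hit
        }
        if ($this->remote->offsetExists($key)) {
            $value = $this->remote[$key];     // 2. remote (shared) hit
        } else {
            $value = ($this->service)($key);  // 3. miss: ask the service...
            $this->remote[$key] = $value;     //    ...and share the result
        }
        $this->local[$key] = $value;          // keep a local copy either way
        return $value;
    }
}

$calls = 0;
$service = function (string $key) use (&$calls): string {
    $calls++;                                 // count backend requests
    return strtoupper($key);
};
$shared = new ArrayObject();
$web1 = new TwoTierCache($shared, $service);
$web2 = new TwoTierCache($shared, $service);

$web1->get('story:42'); // miss everywhere: one service call
$web2->get('story:42'); // remote hit: no new service call
$web2->get('story:42'); // local hit
```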

  27. HTTP Proxies

  28. Squid
    • Old, venerable; the reference implementation for
    the HTTP standard
    • Single-threaded
    • Can be tricky to configure (a multitude of
    options) but very high-performance
    • Implements ICP (Internet Cache Protocol) for
    distributed and hierarchical caches

  29. Varnish
    • More modern implementation than Squid; relies
    on virtual memory and multi-threaded access
    • Easier to set up and configure than Squid
    • Does not support ICP or cache hierarchies

  30. nginx
    • Reverse proxy and web server; does not need a
    separate web server process
    • Great for static content, according to users
    • Uses asynchronous sockets; typically one worker
    process per CPU core

  31. Manual Caching

  32. DIY caching
    • Tools let you build your own cache system.
    • Not transparent, but you can build transparency on top.
    • Most are simple key/value stores
    • Requires writing code

  33. DIY cache example
    • Object retrieval interface fetches data from
    service.
    • Internal methods query the data store
    (memcached, Redis) first and use stored data if
    possible.
    • If data is not in the cache, fetch it from the
    backend service and store it in the cache.
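The steps above can be sketched in PHP as a cache-aside repository. Everything here is hypothetical: `KeyValueStore` stands in for a memcached or Redis client wrapper, `ArrayStore` is an in-memory substitute so the sketch runs without a server, and `fetchFromBackend()` fakes the service call.

```php
<?php
// Minimal store interface; a memcached- or Redis-backed class could implement it.
interface KeyValueStore
{
    public function get(string $key): ?string; // null on miss
    public function set(string $key, string $value, int $ttl): void;
}

// In-memory stand-in with per-entry expiry, for running without a server.
final class ArrayStore implements KeyValueStore
{
    private array $items = []; // key => [value, expiresAt]

    public function get(string $key): ?string
    {
        if (!isset($this->items[$key])) {
            return null;
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($expiresAt <= time()) {
            unset($this->items[$key]); // expired: treat as a miss
            return null;
        }
        return $value;
    }

    public function set(string $key, string $value, int $ttl): void
    {
        $this->items[$key] = [$value, time() + $ttl];
    }
}

// Object retrieval interface: callers ask for a story; caching is internal.
final class StoryRepository
{
    public int $backendCalls = 0;

    public function __construct(private KeyValueStore $cache, private int $ttl = 300)
    {
    }

    public function find(int $id): array
    {
        $key = "story:{$id}";
        $cached = $this->cache->get($key);
        if ($cached !== null) {
            return json_decode($cached, true);  // hit: use the stored data
        }
        $story = $this->fetchFromBackend($id);  // miss: ask the service...
        $this->cache->set($key, json_encode($story), $this->ttl); // ...then store it
        return $story;
    }

    // Hypothetical backend call; a real one would query a service or database.
    private function fetchFromBackend(int $id): array
    {
        $this->backendCalls++;
        return ['id' => $id, 'title' => "Story {$id}"];
    }
}

$repo = new StoryRepository(new ArrayStore());
$repo->find(7); // miss: goes to the backend
$repo->find(7); // hit: served from the store
```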

  34. Upsides for DIY caching
    • Provides a very clean programmatic interface
    (transparent at the application level)
    • Can be tailored to specific solutions where you
    understand the data.
    • Often very high performance

  35. Downsides to DIY caching
    • Requires code to be written, tested, etc.
    • Requires code maintenance if the underlying
    data model is changed.
    • Not standardized like HTTP for specifying the age
    and freshness of data (i.e., not a generic solution,
    but a custom one)
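Because a DIY cache has no Age: or max-age to lean on, it has to carry that metadata itself, and a data-model change can leave old entries in an incompatible shape. One common mitigation, sketched below under the assumption that you control every cache key, is to put a schema version in the key and store a timestamp and TTL with each value; all names here are hypothetical.

```php
<?php
// Hypothetical constant: bump it whenever the cached data's shape changes,
// so entries written by older code are never read again (they age out).
const STORY_SCHEMA_VERSION = 2;

function storyKey(int $id): string
{
    return sprintf('story:v%d:%d', STORY_SCHEMA_VERSION, $id);
}

// Store a value with the metadata HTTP would carry in headers:
// when it was stored and how long it stays fresh.
function wrapEntry(mixed $value, int $ttl, int $now): array
{
    return ['value' => $value, 'stored_at' => $now, 'ttl' => $ttl];
}

// Seconds since the entry was stored (the DIY equivalent of Age:).
function entryAge(array $entry, int $now): int
{
    return max(0, $now - $entry['stored_at']);
}

function entryIsFresh(array $entry, int $now): bool
{
    return entryAge($entry, $now) < $entry['ttl'];
}
```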

  36. Edge Caching

  37. What is an “edge cache?”
    • A content delivery network (CDN) that holds
    static content on the “edges” of the Internet
    • Akamai is the biggest, but there are others:
    Limelight, Microsoft Azure, Amazon CloudFront
    • Stores static content in multiple data centers
    • Content like JavaScript, CSS, images, and other
    media

  38. How does a CDN work?
    • Primary site (www.example.com) serves the
    HTML page.
    • <style>, <img>, etc. tags reference
    static content on the CDN.
    • Users’ browsers load (and often store) the
    static content locally, because it’s served with a
    Cache-Control: max-age=32767 header.
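Pointing those tags at the CDN is usually a matter of rewriting asset URLs. A minimal sketch, assuming a hypothetical CDN hostname cdn.example.com; the optional version query string is one common way (not the only one) to give a changed file a new URL, so a long max-age does not leave users with stale copies.

```php
<?php
// Hypothetical CDN host; substitute the hostname your CDN assigns.
const CDN_BASE = 'https://cdn.example.com';

// Rewrite a site-relative asset path so the browser fetches it from the
// CDN instead of the primary site. An optional $version is appended as a
// query string, so changing it produces a new URL for the same file.
function cdnUrl(string $path, ?string $version = null): string
{
    $url = CDN_BASE . '/' . ltrim($path, '/');
    return $version === null ? $url : $url . '?v=' . rawurlencode($version);
}
```

In a template this might look like <img src="<?= htmlspecialchars(cdnUrl('/img/logo.png', '20141029')) ?>">, where the version string is whatever your build or deploy step chooses.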

  39. Q&A

  40. @glenc

    http://www.glencampbell.co
    http://developer.rackspace.com


    Free Cloud!
