
Cache Strategies for Web Apps


Given at Zendcon, October 29, 2014

Glen Campbell



Transcript

  1. “A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.” (Wikipedia)
  2. REST
    • Client-server
    • Stateless
    • Cacheable
    • Layered system
    • Code on demand (optional)
    • Uniform interface
  3. Example: local
    HTTP/1.1 200 OK
    Date: Wed, 29 Oct 2014 15:04:20 GMT
    Server: Apache/2.2.15 (CentOS)
    Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
    Accept-Ranges: bytes
    Content-Length: 212
    Cache-Control: max-age=31536000
    Expires: Thu, 29 Oct 2015 15:04:20 GMT
    Vary: Accept-Encoding
    Connection: close
    Content-Type: text/html; charset=UTF-8
  4. Example: hotel
    HTTP/1.0 200 OK
    Date: Wed, 29 Oct 2014 15:05:32 GMT
    Server: Apache/2.2.15 (CentOS)
    Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
    Accept-Ranges: bytes
    Content-Length: 212
    Cache-Control: max-age=31536000
    Expires: Thu, 29 Oct 2015 15:05:32 GMT
    Vary: Accept-Encoding
    Content-Type: text/html; charset=UTF-8
    X-Cache: MISS from localhost
    X-Cache-Lookup: MISS from localhost:3128
    Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
    Connection: close
  5. What changed?
    $ diff local hotel
    1,2c1,2
    < HTTP/1.1 200 OK
    < Date: Wed, 29 Oct 2014 15:04:20 GMT
    ---
    > HTTP/1.0 200 OK
    > Date: Wed, 29 Oct 2014 15:05:32 GMT
    8c8
    < Expires: Thu, 29 Oct 2015 15:04:20 GMT
    ---
    > Expires: Thu, 29 Oct 2015 15:05:32 GMT
    10d9
    < Connection: close
    11a11,14
    > X-Cache: MISS from localhost
    > X-Cache-Lookup: MISS from localhost:3128
    > Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
    > Connection: close
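
As a rough illustration of what the diff shows, the sketch below parses two response heads and lists the header names that appear only in the proxied one. The function names `parseHeaders` and `addedHeaders` are invented for this example, the two responses are trimmed copies of the `local` and `hotel` output, and PHP 7.1 or later is assumed.

```php
<?php
// Parse a raw HTTP response head into [lower-cased header name => value].
// Names are lower-cased because HTTP header names are case-insensitive.
function parseHeaders(string $raw): array
{
    $headers = [];
    $lines = preg_split('/\r\n|\n/', trim($raw));
    array_shift($lines); // drop the status line, e.g. "HTTP/1.1 200 OK"
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip anything that is not "Name: value"
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

// Names of headers present in $proxied but absent from $direct.
function addedHeaders(array $direct, array $proxied): array
{
    return array_keys(array_diff_key($proxied, $direct));
}

$local = <<<'EOT'
HTTP/1.1 200 OK
Date: Wed, 29 Oct 2014 15:04:20 GMT
Cache-Control: max-age=31536000
Connection: close
Content-Type: text/html; charset=UTF-8
EOT;

$hotel = <<<'EOT'
HTTP/1.0 200 OK
Date: Wed, 29 Oct 2014 15:05:32 GMT
Cache-Control: max-age=31536000
Content-Type: text/html; charset=UTF-8
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
Connection: close
EOT;

// Lists x-cache, x-cache-lookup and via: the headers the Squid proxy added.
print_r(addedHeaders(parseHeaders($local), parseHeaders($hotel)));
```

The `Via:` header is the standard signal that a response passed through an intermediary; the `X-Cache` headers are Squid's own additions.
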
  6. Relevant HTTP/1.1 Headers
    • Age:
    • Authorization:
    • Cache-Control:
    • Connection:
    • ETag:
    • Expires:
    • If-Match:
    • If-None-Match:
    • If-Range:
    • Pragma:
    • Vary:
    • Warning:
  7. Cache-Control: Responses
    • public
    • private
    • no-cache
    • no-store
    • no-transform
    • must-revalidate
    • proxy-revalidate
    • max-age={seconds}
    • s-maxage={seconds}
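
One possible way to assemble these directives into a header line is sketched below. The helper name `cacheControl` and the lifetimes are assumptions for illustration, not part of the talk; the function does no validation of directive names.

```php
<?php
// Build a Cache-Control response header line from a list of directives.
// Flag directives are plain strings; valued directives (max-age, s-maxage)
// are given as name => seconds.
function cacheControl(array $directives): string
{
    $parts = [];
    foreach ($directives as $key => $value) {
        if (is_int($key)) {
            $parts[] = $value;                    // e.g. "public"
        } else {
            $parts[] = $key . '=' . (int) $value; // e.g. "max-age=3600"
        }
    }
    return 'Cache-Control: ' . implode(', ', $parts);
}

// Shared caches may keep the response for a day, browsers for an hour.
echo cacheControl(['public', 'max-age' => 3600, 's-maxage' => 86400]), "\n";

// Nothing may store the response at all.
echo cacheControl(['private', 'no-store']), "\n";
```
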
  8. Adding headers in PHP
    void header ( string $string [, bool $replace = true [, int $http_response_code ]] )
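
A minimal sketch of how `header()` and its `$replace` argument might be used for caching headers follows. The wrapper `sendCacheHeaders` and its injectable `$emit` parameter are hypothetical; they exist only so the example can also run from the command line, where there is no HTTP response. PHP 7.1 or later is assumed.

```php
<?php
// Emit caching headers for a response that may be cached for $maxAge seconds.
// $emit defaults to PHP's built-in header(); passing another callable lets
// the function be exercised without a web server.
function sendCacheHeaders(int $maxAge, ?callable $emit = null): void
{
    $emit = $emit ?? 'header';
    // $replace = true (the default) overwrites an earlier header of the same name.
    $emit('Cache-Control: public, max-age=' . $maxAge, true);
    $emit('Vary: Accept-Encoding', true);
    // $replace = false adds a second header line instead of overwriting.
    $emit('Vary: Accept-Language', false);
}

// In a web request, call it before any output is sent:
//     sendCacheHeaders(3600);

// Here the header lines are recorded instead of sent.
$sent = [];
sendCacheHeaders(3600, function (string $line, bool $replace = true) use (&$sent) {
    $sent[] = $line;
});
print_r($sent);
```
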
  9. ETag:
    • Don’t use them
    • Generation is not specified by the HTTP standard, and is often not consistent across a cluster.
    • Error-prone and can be used to track users who refuse cookies.
    • Turn them off; don’t use them
  10. Expires:
    • Indicates when the resource is stale.
    • Specifies a date/time rather than delta seconds (Cache-Control: max-age=S)
    • Mostly used for compatibility with HTTP 1.0; Cache-Control: is more semantically rich.
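
To make the relationship between the two headers concrete, the sketch below derives an `Expires:` date from a `max-age` value. The helper names `httpDate` and `freshnessHeaders` are assumptions for this example; the timestamp and lifetime are taken from the `local` response shown earlier.

```php
<?php
// Format a Unix timestamp as an HTTP-date, which is always expressed in GMT.
function httpDate(int $timestamp): string
{
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

// Return both headers describing the same freshness lifetime.
function freshnessHeaders(int $now, int $maxAge): array
{
    return [
        'Cache-Control: max-age=' . $maxAge,    // delta seconds (HTTP/1.1)
        'Expires: ' . httpDate($now + $maxAge), // absolute time (HTTP/1.0 compatible)
    ];
}

// The Date: of the "local" example, Wed, 29 Oct 2014 15:04:20 GMT.
$now = gmmktime(15, 4, 20, 10, 29, 2014);

// max-age=31536000 is 365 days, so Expires lands on Thu, 29 Oct 2015 15:04:20 GMT.
print_r(freshnessHeaders($now, 31536000));
```
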
  11. ?

  12. Is data cacheable?
    • Highly cacheable data: news stories, blog posts, aggregated data such as ratings or reviews (“likes”).
    • Uncacheable: secure, private, personal data such as user login information, credit card info, etc.; also data that must change rapidly, such as stock quotes or health monitoring systems.
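
The classification above could be turned into a policy lookup along these lines. This is only a sketch: the category names, the one-hour lifetime, and the choice of `no-cache` for rapidly changing data are all assumptions made for illustration.

```php
<?php
// Map a (hypothetical) content category to a Cache-Control value.
function cachePolicyFor(string $category): string
{
    switch ($category) {
        case 'article':   // news stories, blog posts
        case 'aggregate': // ratings, review counts
            return 'public, max-age=3600';
        case 'personal':  // login details, payment data
            return 'private, no-store';
        case 'realtime':  // stock quotes, health monitoring
            return 'no-cache'; // may be stored, but must be revalidated on every use
        default:
            return 'no-store'; // unknown category: err on the side of not caching
    }
}

foreach (['article', 'personal', 'realtime'] as $category) {
    echo $category, ' => Cache-Control: ', cachePolicyFor($category), "\n";
}
```
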
  13. Example 3. Distributed Cache
    [Diagram labels: Web Server (×4), Cache (Proxy) (×2), Service, ICP]
  14. Example 4. Local+Remote Cache
    [Diagram labels: Web Server (×4), each with a (local cache); Cache (Proxy) (×2), Service, ICP]
  15. Squid
    • Old, venerable; the reference implementation for the HTTP standard
    • Single-threaded
    • Can be tricky to configure (a multitude of options) but very high-performance
    • Implements ICP (Internet Cache Protocol) for distributed and hierarchical caches
  16. Varnish
    • More modern implementation than Squid; relies on virtual memory and multi-threaded access
    • Easier to set up and configure than Squid
    • Does not support ICP or cache hierarchies
  17. nginx
    • reverse proxy and webserver; does not need a separate web server process
    • great for static content, according to users
    • uses asynchronous sockets; one process per core architecture
  18. DIY caching
    • Tools let you build your own cache system.
    • Not transparent, but can build transparency.
    • Most are simple key/value stores
    • Requires writing code
  19. DIY cache example
    • Object retrieval interface fetches data from service.
    • Internal methods query the data store (memcached, Redis) first and use stored data if possible.
    • If data is not in the cache, fetch it from the backend service and store it in the cache.
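
The three steps above can be sketched as follows. `ArrayStore` is an in-memory stand-in for memcached or Redis so the example runs without a server, and `CachedRepository`, the `object:` key prefix, and the fake backend closure are all hypothetical names chosen for illustration.

```php
<?php
// Minimal key/value store standing in for memcached or Redis.
// A real deployment would put a client for one of those behind the same two methods.
class ArrayStore
{
    private $items = [];

    public function get(string $key)
    {
        return array_key_exists($key, $this->items) ? $this->items[$key] : null;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

// Object retrieval interface: callers ask for an object and never see the cache.
class CachedRepository
{
    private $store;
    private $fetchFromService;
    public $backendCalls = 0; // counts trips to the backend, for demonstration

    public function __construct(ArrayStore $store, callable $fetchFromService)
    {
        $this->store = $store;
        $this->fetchFromService = $fetchFromService;
    }

    public function find(string $id)
    {
        $key = 'object:' . $id;
        $cached = $this->store->get($key);
        if ($cached !== null) {
            return $cached; // cache hit: use the stored data
        }
        // Cache miss: fetch from the backend service, then store the result.
        $this->backendCalls++;
        $fresh = ($this->fetchFromService)($id);
        $this->store->set($key, $fresh);
        return $fresh;
    }
}

$repo = new CachedRepository(new ArrayStore(), function (string $id) {
    return ['id' => $id, 'title' => 'Post ' . $id]; // pretend backend response
});

$repo->find('42'); // miss: goes to the backend
$repo->find('42'); // hit: served from the store
echo $repo->backendCalls, "\n"; // prints 1
```

This sketch never expires entries; a production cache would also pass a time-to-live to the store.
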
  20. Upsides for DIY caching
    • Provides a very clean programmatic interface (transparent at the application level)
    • Can be tailored to specific solutions where you understand the data.
    • Often very high performance
  21. Downsides to DIY caching
    • Requires code to be written, tested, etc.
    • Requires code maintenance if the underlying data model is changed.
    • Not standardized like HTTP for specifying age, freshness of data (i.e., not a generic solution, but a custom one)
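
To illustrate the last point, a DIY cache has to invent its own bookkeeping for age and freshness. One possible shape is sketched below; the field names `stored_at` and `ttl` and the three helper functions are assumptions, loosely mirroring what `Age:` and `max-age` express in HTTP.

```php
<?php
// Wrap a value with the time it was stored and how long it stays fresh.
function wrapEntry($value, int $ttl, int $now): array
{
    return ['value' => $value, 'stored_at' => $now, 'ttl' => $ttl];
}

// Seconds since the entry was stored (the DIY analogue of the Age: header).
function entryAge(array $entry, int $now): int
{
    return $now - $entry['stored_at'];
}

// An entry is fresh while its age is below its time-to-live.
function isFresh(array $entry, int $now): bool
{
    return entryAge($entry, $now) < $entry['ttl'];
}

// Stored at t=1000 with a 60 second lifetime.
$entry = wrapEntry('rating: 4.5', 60, 1000);

var_dump(isFresh($entry, 1030)); // bool(true): 30 seconds old
var_dump(isFresh($entry, 1060)); // bool(false): lifetime used up
```

Every consumer of such a cache has to agree on this convention, which is exactly the maintenance cost the slide describes.
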
  22. What is an “edge cache”?
    • A content delivery network (CDN) that holds static content on the “edges” of the Internet
    • Akamai is the biggest, but there are others: Limelight, Microsoft Azure, Amazon CloudFront
    • Stores static content in multiple data centers
    • Content like JavaScript, CSS, images, and other media
  23. How does a CDN work?
    • Primary site (www.example.com) serves the HTML page.
    • <script> <style> <img> etc. tags reference static content on the CDN
    • The user’s browser loads (and often stores) the static content locally, because it’s served with a Cache-Control: max-age=32767 header.
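
On the primary site, the page template might point its static assets at the CDN with a helper like the one below. This is a sketch under stated assumptions: `cdn.example.com` is a placeholder hostname, and `cdnUrl` and `assetTag` are hypothetical helpers, not part of any CDN's API.

```php
<?php
// Hypothetical CDN host; substitute the hostname the CDN provider assigns.
const CDN_BASE = 'https://cdn.example.com';

// Turn a site-relative asset path into a CDN URL.
function cdnUrl(string $path): string
{
    return CDN_BASE . '/' . ltrim($path, '/');
}

// Build the HTML tag that references the asset on the CDN,
// choosing the tag from the file extension.
function assetTag(string $path): string
{
    $url = htmlspecialchars(cdnUrl($path), ENT_QUOTES);
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'js':
            return '<script src="' . $url . '"></script>';
        case 'css':
            return '<link rel="stylesheet" href="' . $url . '">';
        default:
            return '<img src="' . $url . '" alt="">';
    }
}

echo assetTag('/js/app.js'), "\n";
echo assetTag('/css/site.css'), "\n";
echo assetTag('/img/logo.png'), "\n";
```

The HTML itself still comes from www.example.com; only the referenced assets are fetched from the CDN and cached by the browser.
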
  24. Q&A