Slide 1

Slide 1 text

Cache strategies for web apps Glen Campbell @glenc

Slide 2

Slide 2 text

Yes, I picked the dullest title ever

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

–Wikipedia “A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. A web cache stores copies of documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.”

Slide 5

Slide 5 text

What is the most common type of web cache?

Slide 6

Slide 6 text

REST • Client-server • Stateless • Cacheable • Layered system • Code on demand (optional) • Uniform interface

Slide 7

Slide 7 text

Example: local
HTTP/1.1 200 OK
Date: Wed, 29 Oct 2014 15:04:20 GMT
Server: Apache/2.2.15 (CentOS)
Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
Accept-Ranges: bytes
Content-Length: 212
Cache-Control: max-age=31536000
Expires: Thu, 29 Oct 2015 15:04:20 GMT
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=UTF-8

Slide 8

Slide 8 text

Example: hotel
HTTP/1.0 200 OK
Date: Wed, 29 Oct 2014 15:05:32 GMT
Server: Apache/2.2.15 (CentOS)
Last-Modified: Wed, 29 Oct 2014 14:54:23 GMT
Accept-Ranges: bytes
Content-Length: 212
Cache-Control: max-age=31536000
Expires: Thu, 29 Oct 2015 15:05:32 GMT
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
X-Cache: MISS from localhost
X-Cache-Lookup: MISS from localhost:3128
Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
Connection: close

Slide 9

Slide 9 text

What changed?
$ diff local hotel
1,2c1,2
< HTTP/1.1 200 OK
< Date: Wed, 29 Oct 2014 15:04:20 GMT
---
> HTTP/1.0 200 OK
> Date: Wed, 29 Oct 2014 15:05:32 GMT
8c8
< Expires: Thu, 29 Oct 2015 15:04:20 GMT
---
> Expires: Thu, 29 Oct 2015 15:05:32 GMT
10d9
< Connection: close
11a11,14
> X-Cache: MISS from localhost
> X-Cache-Lookup: MISS from localhost:3128
> Via: 1.1 localhost:3128 (squid/2.7.STABLE3)
> Connection: close
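The diff shows an intermediary adding Via and X-Cache headers to the response. A minimal sketch of spotting that from PHP, assuming the raw header block is already available as a string. The function name detectProxyHeaders is hypothetical, and X-Cache / X-Cache-Lookup are non-standard headers that Squid happens to add, so their absence proves nothing.

```php
<?php
// Hypothetical helper: pick out headers that suggest an intermediary cache.
function detectProxyHeaders(string $rawHeaders): array
{
    $interesting = ['via', 'x-cache', 'x-cache-lookup', 'age'];
    $found = [];
    foreach (preg_split('/\r?\n/', trim($rawHeaders)) as $line) {
        // Skip the status line and anything without a "Name: value" shape.
        if (strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        $key = strtolower(trim($name));
        if (in_array($key, $interesting, true)) {
            $found[$key] = trim($value);
        }
    }
    return $found;
}
```

An empty result only means none of these particular headers were present; a proxy is free to stay silent.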

Slide 10

Slide 10 text

HTTP 1.1

Slide 11

Slide 11 text

Relevant HTTP 1.1 Headers • Age: • Authorization: • Cache-Control: • Connection: • ETag: • Expires: • If-Match: • If-None-Match: • If-Range: • Pragma: • Vary: • Warning:

Slide 12

Slide 12 text

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
 
 or
 
 http://bit.ly/1p0KHQr

Slide 13

Slide 13 text

Cache-Control: Requests • no-cache • no-store • max-age={seconds} • max-stale={seconds} • min-fresh={seconds} • no-transform • only-if-cached

Slide 14

Slide 14 text

Cache-Control: Responses • public • private • no-cache • no-store • no-transform • must-revalidate • proxy-revalidate • max-age={seconds} • s-maxage={seconds}
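To work with these directives in code, it can help to split a Cache-Control value into its parts. The sketch below is an assumption-laden illustration: parseCacheControl is a hypothetical name, and the parser deliberately ignores quoted values that contain commas (such as private="field1, field2").

```php
<?php
// Hypothetical parser: turn a Cache-Control value into [directive => value].
// Directives without a value (e.g. no-store) map to true.
function parseCacheControl(string $value): array
{
    $directives = [];
    foreach (explode(',', $value) as $part) {
        $part = trim($part);
        if ($part === '') {
            continue;
        }
        $pieces = explode('=', $part, 2);
        $name = strtolower(trim($pieces[0]));
        $directives[$name] = isset($pieces[1])
            ? trim($pieces[1], " \t\"")   // strip whitespace and optional quotes
            : true;
    }
    return $directives;
}
```

Values come back as strings, so a caller would cast max-age to an integer before doing arithmetic with it.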

Slide 15

Slide 15 text

Adding headers in PHP
void header ( string $string [, bool $replace = true [, int $http_response_code ]] )

Slide 16

Slide 16 text

Cache-Control: in PHP
header('Cache-Control: no-cache');
header('Cache-Control: max-age=600');
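When several directives need to go out together, they belong in one comma-separated header. A small sketch, assuming directives are kept in an array; cacheControlHeader is a hypothetical helper, not part of PHP. Two things about header() itself are worth keeping in mind: it must be called before any output is sent, and with the default $replace = true a later call with the same header name replaces the earlier one.

```php
<?php
// Hypothetical helper: build a Cache-Control header line from directives.
// Keys with a true value become bare directives; others become name=value.
function cacheControlHeader(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $value) {
        $parts[] = ($value === true) ? $name : $name . '=' . $value;
    }
    return 'Cache-Control: ' . implode(', ', $parts);
}

// In a web request the result would be passed to header(), for example:
// header(cacheControlHeader(['public' => true, 'max-age' => 600]));
```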

Slide 17

Slide 17 text

ETag: • Don’t use them • Generation is not specified by the HTTP standard, and is often not consistent across a cluster. • Error-prone and can be used to track users who refuse cookies. • Turn them off; don’t use them

Slide 18

Slide 18 text

Expires: • Indicates when the resource is stale. • Specifies a date/time rather than delta seconds (Cache-Control: max-age=S) • Mostly used for compatibility with HTTP 1.0; Cache-Control: is more semantically rich.
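For HTTP 1.0 compatibility, an Expires value can be derived from the same number of seconds used for max-age. This is a minimal sketch; expiresHeader is a hypothetical name, and the current time is passed in as a Unix timestamp so the result is easy to check.

```php
<?php
// Hypothetical helper: derive the Expires line that matches a max-age.
// $now is a Unix timestamp; HTTP dates are always expressed in GMT.
function expiresHeader(int $now, int $maxAgeSeconds): string
{
    return 'Expires: ' . gmdate('D, d M Y H:i:s', $now + $maxAgeSeconds) . ' GMT';
}
```

In a live request, expiresHeader(time(), 600) would pair with Cache-Control: max-age=600.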

Slide 19

Slide 19 text

Extensions • Cache-Control: max-age={s}, stale-while-revalidate={s} • Cache-Control: max-age={s}, stale-if-error={s}
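RFC 5861 names these two extensions stale-while-revalidate and stale-if-error. The sketch below shows roughly how a cache might use them; it is a simplification under stated assumptions, with a hypothetical function name and hypothetical result labels, and it treats "origin failed" as a flag supplied by the caller.

```php
<?php
// Hypothetical decision helper for a cache honoring the RFC 5861 extensions.
// All durations are in seconds; $originFailed says whether revalidation failed.
function cacheDecision(int $age, int $maxAge, int $staleWhileRevalidate, int $staleIfError, bool $originFailed): string
{
    if ($age < $maxAge) {
        return 'serve-fresh';                  // still within max-age
    }
    $staleFor = $age - $maxAge;                // how long the entry has been stale
    if ($originFailed) {
        return $staleFor <= $staleIfError ? 'serve-stale-on-error' : 'return-error';
    }
    if ($staleFor <= $staleWhileRevalidate) {
        return 'serve-stale-and-revalidate';   // answer now, refresh in background
    }
    return 'revalidate-first';                 // too stale: wait for the origin
}
```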

Slide 20

Slide 20 text

?

Slide 21

Slide 21 text

Is data cacheable? • Highly cacheable data: news stories, blog posts, aggregated data such as ratings or reviews (“likes”). • Uncacheable: secure, private, personal data such as user login information, credit card info, etc. Also data that changes rapidly and must always be current, such as stock quotes or health monitoring systems.
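One way to make this classification explicit is to map each kind of content to a Cache-Control value in a single place. The categories, lifetimes, and function name below are illustrative assumptions, not recommendations from the talk.

```php
<?php
// Hypothetical mapping from a content category to a Cache-Control value.
// The categories and lifetimes are illustrative only.
function cachePolicyFor(string $category): string
{
    $policies = [
        'news'    => 'public, max-age=600',   // highly cacheable
        'ratings' => 'public, max-age=60',    // aggregated, slightly stale is fine
        'account' => 'private, no-store',     // personal data: never store
        'quotes'  => 'no-cache',              // must be revalidated every time
    ];
    // Unknown categories fall back to the most conservative choice.
    return $policies[$category] ?? 'no-store';
}
```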

Slide 22

Slide 22 text

Cache Architectures

Slide 23

Slide 23 text

Example 1. No cache [diagram labels: Web Server, Service]

Slide 24

Slide 24 text

Example 2. Shared Cache [diagram labels: Web Server ×2, Cache (Proxy), Service]

Slide 25

Slide 25 text

Example 3. Distributed Cache [diagram labels: Web Server ×4, Cache (Proxy) ×2, ICP, Service]

Slide 26

Slide 26 text

Example 4. Local+Remote Cache [diagram labels: Web Server ×4, each with a (local cache); Cache (Proxy) ×2, ICP, Service]
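The local+remote arrangement amounts to a lookup order: the server's own cache first, then the shared cache, then the service. A minimal sketch of that order, assuming plain PHP arrays as stand-ins for both cache tiers and ignoring expiry entirely; tieredGet is a hypothetical name.

```php
<?php
// Hypothetical two-tier lookup: per-server local cache, then shared cache,
// then the backing service. Plain arrays stand in for real cache stores.
function tieredGet(string $key, array &$local, array &$shared, callable $service)
{
    if (array_key_exists($key, $local)) {
        return $local[$key];                 // fastest: this server's own cache
    }
    if (array_key_exists($key, $shared)) {
        $local[$key] = $shared[$key];        // promote into the local tier
        return $local[$key];
    }
    $value = $service($key);                 // slowest: ask the service
    $shared[$key] = $value;                  // populate both tiers on the way back
    $local[$key] = $value;
    return $value;
}
```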

Slide 27

Slide 27 text

HTTP Proxies

Slide 28

Slide 28 text

Squid • Old, venerable; widely regarded as a reference implementation of an HTTP caching proxy • Single-threaded • Can be tricky to configure (a multitude of options) but very high-performance • Implements ICP (Internet Cache Protocol) for distributed and hierarchical caches

Slide 29

Slide 29 text

Varnish • More modern implementation than Squid; relies on virtual memory and multi-threaded access • Easier to set up and configure than Squid • Does not support ICP or cache hierarchies

Slide 30

Slide 30 text

nginx • reverse proxy and webserver - does not need a separate web server process • great for static content, according to users • uses asynchronous sockets; one process per core architecture

Slide 31

Slide 31 text

Manual Caching

Slide 32

Slide 32 text

DIY caching • Tools let you build your own cache system. • Not transparent by default, but you can build a transparent layer on top. • Most are simple key/value stores • Requires writing code

Slide 33

Slide 33 text

DIY cache example • Object retrieval interface fetches data from service. • Internal methods query the data store (memcached, Redis) first and use stored data if possible. • If data is not in the cache, fetch it from the backend service and store it in the cache.
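The steps above can be sketched in a few lines. This is an illustration under assumptions, not the speaker's implementation: ArrayStore is a hypothetical in-memory stand-in for memcached or Redis, fetchObject is a hypothetical retrieval function, and the clock is passed in explicitly so expiry is easy to follow. A null result from the backend is never cached in this version.

```php
<?php
// Hypothetical in-memory store standing in for memcached or Redis.
class ArrayStore
{
    private $items = [];

    public function get(string $key, int $now)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now) {
            return null;                       // miss or expired
        }
        return $this->items[$key]['value'];
    }

    public function set(string $key, $value, int $ttl, int $now): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
    }
}

// Retrieval interface: callers never see whether the data came from the cache.
function fetchObject(string $key, ArrayStore $store, callable $backend, int $ttl, int $now)
{
    $cached = $store->get($key, $now);
    if ($cached !== null) {
        return $cached;                        // use stored data if possible
    }
    $value = $backend($key);                   // otherwise ask the backend service
    $store->set($key, $value, $ttl, $now);     // and remember the answer
    return $value;
}
```

With a real store the same shape applies; only the get and set calls change.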

Slide 34

Slide 34 text

Upsides for DIY caching • Provides a very clean programmatic interface (transparent at the application level) • Can be tailored to specific solutions where you understand the data. • Often very high performance

Slide 35

Slide 35 text

Downsides to DIY caching • Requires code to be written, tested, etc. • Requires code maintenance if the underlying data model is changed. • Not standardized like HTTP for specifying age, freshness of data (i.e., not a generic solution, but a custom one)

Slide 36

Slide 36 text

Edge Caching

Slide 37

Slide 37 text

What is an “edge cache”? • A content delivery network (CDN) that holds static content on the “edges” of the Internet • Akamai is the biggest, but there are others: Limelight, Microsoft Azure, Amazon CloudFront • Stores static content in multiple data centers • Content like JavaScript, CSS, images, and other media

Slide 38

Slide 38 text

How does a CDN work? • Primary site (www.example.com) serves the HTML page. • <link>, <script>, <img>, and similar tags reference static content on the CDN. • The user’s browser loads (and often stores) the static content locally, because it’s served with a Cache-Control: max-age=32767 header.
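On the primary site, pointing tags at the CDN usually comes down to generating asset URLs with a different host. A small sketch, assuming a placeholder host cdn.example.com and hypothetical helper names; a real CDN setup would supply its own hostname.

```php
<?php
// Hypothetical helper: point a static asset path at a CDN host.
// cdn.example.com is a placeholder, not a real endpoint.
function cdnUrl(string $path, string $cdnHost = 'cdn.example.com'): string
{
    return 'https://' . $cdnHost . '/' . ltrim($path, '/');
}

// Hypothetical helper: build an <img> tag whose source lives on the CDN.
function cdnImageTag(string $path, string $alt): string
{
    return '<img src="' . htmlspecialchars(cdnUrl($path), ENT_QUOTES)
        . '" alt="' . htmlspecialchars($alt, ENT_QUOTES) . '">';
}
```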

Slide 39

Slide 39 text

Q&A

Slide 40

Slide 40 text

[email protected]
 @glenc
 http://www.glencampbell.co http://developer.rackspace.com
 
 Free Cloud!