Caching on the Bleeding Edge

Samantha Quiñones

November 19, 2015

Transcript

  1. SAMANTHA QUIÑONES ABOUT ME ▸ Software Engineer & Data Nerd

    since 1997 ▸ Doing “media stuff” since 2012 ▸ Principal @ AOL since 2014 ▸ @ieatkillerbees ▸ http://samanthaquinones.com
  2. A BRIEF HISTORY OF CACHING PRIMARY STORAGE ▸ Storage that

    is addressable by the CPU ▸ Examples: ▸ Williams tube ▸ Magnetic Core memory ▸ Magnetic Drum memory ▸ Solid state SRAM & DRAM
  3. A BRIEF HISTORY OF CACHING SECONDARY STORAGE ▸ Storage addressable

    through a controller or channel. ▸ Examples: ▸ Magnetic Tape ▸ HDD ▸ SSD ▸ Flash memory
  4. A BRIEF HISTORY OF CACHING FRITZ-RUDOLPH GÜNTSCH ▸ Virtual memory

    pioneer ▸ Logical Design of a Digital Computer with Multiple Asynchronous Rotating Drums and Automatic High Speed Memory Operation (1956) ▸ Automated organization and swapping of pages
  5. A BRIEF HISTORY OF CACHING TRANSLATION LOOKASIDE BUFFERS ▸ Introduced

    in the IBM System 360 Model 67 (1967) and the GE 645 (1969) ▸ Fast caching of virtual memory address translations
  6. A BRIEF HISTORY OF CACHING CACHE MILESTONES ▸ 1969 -

    IBM System 360 Model 85 introduces CPU cache ▸ 1982 - Motorola 68010 features on-board instruction cache ▸ 1987 - Motorola 68030 features a 256-byte data cache ▸ 1989 - Intel 486 features split cache with on-die L1 and on mainboard L2 ▸ 1995 - Intel Pentium Pro has L1 and L2 on die ▸ 2013 - Intel Haswell MA has 3 caches with shared L4 for CPU & on-board GPU
  7. PROGRAMS TEND TO REUSE DATA AND INSTRUCTIONS THEY HAVE USED

    RECENTLY. Hennessy and Patterson Computer Architecture: A Quantitative Approach THEORY OF CACHING
  8. THEORY OF CACHING LOCALITY OF REFERENCE ▸ Temporal Locality: When

    data is referenced, it tends to be referenced again soon. ▸ Spatial Locality: When data is referenced, nearby data tends to be referenced soon.
  9. THEORY OF CACHING SOME TERMINOLOGY ▸ Cache Hit - Data

    was found in cache ▸ Cache Miss - Data was not found in cache
  10. void add1(int A[N][N], int B[N][N], int C[N][N]) {

        /* inner loop walks down a column of the row-major array (stride N) */
        int i, j;
        for (j = 0; j < N; j++) {
            for (i = 0; i < N; i++) {
                C[i][j] = A[i][j] + B[i][j];
            }
        }
    }

    void add2(int A[N][N], int B[N][N], int C[N][N]) {
        /* inner loop walks along a row (stride 1), so consecutive accesses share cache lines */
        int i, j;
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                C[i][j] = A[i][j] + B[i][j];
            }
        }
    }

    > $ ./matrix
    add1 164.947510 nanoseconds per access
    add2 34.484863 nanoseconds per access

    4x faster?!
  11. for (j=0; j<N; j++) { for (i=0; i<N; i++) { … } }

    [Animation: current position moving over the row-major layout A[0][0] A[0][1] A[0][2] A[0][3] A[0][4] A[0][5] A[0][6] / A[1][0] A[1][1] A[1][2] A[1][3] A[1][4] A[1][5] A[1][6]]
  12. for (i=0; i<N; i++) { for (j=0; j<N; j++) { … } }

    [Animation: current position sequence A[0][0] A[1][0] A[2][0] A[0][1] A[0][2]]
  13. THEORY OF CACHING LEVERAGING SPATIAL LOCALITY ▸ Understanding the geography

    of data (arrays are allocated in row-major order) ▸ When A[i][j] is referenced, nearby memory addresses are brought into cache ▸ Code optimized to use cache has the potential to be MUCH faster. ▸ This is often what internals devs are talking about when considering whether or not some object is “in cache”
  14. THEORY OF CACHING CACHES AT THE HEART OF COMPUTER ARCHITECTURE (CORE I7)

    Level   Size        Latency
    L1      32 KB       ~1 ns
    L2      256 KB      ~3 ns
    L3      2 MB/core   ~10 ns
  15. THEORY OF CACHING LEVERAGING TEMPORAL LOCALITY ▸ Amortizing the cost

    of data access over multiple temporally proximate operations ▸ Algorithmic prediction of most likely-needed data
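
A minimal in-process sketch of that amortization, assuming a hypothetical expensiveBackendFetch() data source:

    <?php
    // Cache an expensive lookup in a static array so that temporally
    // clustered calls within one request pay the cost only once.
    function getUserProfile($userId)
    {
        static $memo = [];

        if (!isset($memo[$userId])) {
            $memo[$userId] = expensiveBackendFetch($userId); // hypothetical helper
        }

        return $memo[$userId];   // later, temporally close calls are nearly free
    }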
  16. THEORY OF CACHING STRATEGIES & ALGORITHMS ▸ Clairvoyant Algorithm (Bélády’s)

    - Theoretical algorithm that always discards the object which will not be accessed for the longest time. ▸ LRU - Least Recently Used objects are discarded first. Typically uses 2 bits per object to record “age” ▸ PLRU - LRU algorithm that sacrifices miss ratio for lower latency and lower power requirements. Typical implementations use 1 bit per object. Common in CPU caches. ▸ MRU - Most Recently Used objects are discarded first. Useful for cyclical or random access patterns over large sets of data. ▸ Random Replacement - Objects are discarded at random. Used in RISC platforms such as ARM as no housekeeping bits are required.
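
A minimal userland sketch of the LRU policy described above (illustrative only; hardware and cache-server implementations differ):

    <?php
    // PHP arrays preserve insertion order, so "most recently used" can be
    // modeled by deleting and re-inserting a key on every access.
    class LruCache
    {
        private $capacity;
        private $items = [];

        public function __construct($capacity)
        {
            $this->capacity = $capacity;
        }

        public function get($key)
        {
            if (!array_key_exists($key, $this->items)) {
                return null;                    // cache miss
            }
            $value = $this->items[$key];
            unset($this->items[$key]);          // re-insert to mark as most recently used
            $this->items[$key] = $value;
            return $value;                      // cache hit
        }

        public function set($key, $value)
        {
            unset($this->items[$key]);
            $this->items[$key] = $value;
            if (count($this->items) > $this->capacity) {
                reset($this->items);                    // oldest entry is first
                unset($this->items[key($this->items)]); // evict least recently used
            }
        }
    }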
  17. THERE ARE ONLY TWO HARD THINGS IN COMPUTER SCIENCE: CACHE

    INVALIDATION AND NAMING THINGS. Phil Karlton
  18. …CACHE…[HAS] THE POTENTIAL TO PARTIALLY… ELIMINATE… INTERACTIONS, IMPROVING EFFICIENCY, SCALABILITY,

    AND… PERFORMANCE BY REDUCING THE AVERAGE LATENCY OF A SERIES OF INTERACTIONS. Roy “I invented REST” Fielding
  19. IF 80% OF TIME IS SPENT ACCESSING 20% OF DATA,

    OPTIMIZE FOR ACCESSING THAT 20% CACHING FOR APPLICATION DEVELOPERS
  20. DATA TIER CACHES CACHE ACCESS PATTERN

    START → RETRIEVE DATA → DATA IN CACHE? → (on a miss: DB FETCH → WRITE TO CACHE) → RETURN DATA → END

    <?php

    class DataRepository
    {
        private $cache;
        private $db;

        public function getDataByKey($key)
        {
            $data = $this->cache->get($key);          // check the cache first
            if (null === $data) {
                $data = $this->db->fetchByKey($key);  // miss: fall back to the database
                $this->cache->set($key, $data);       // write it back for the next caller
            }
            return $data;
        }
    }
  21. BACK-END CACHES CACHING LOW-LEVEL DOMAIN OBJECTS

    class FooRepository extends Repository
    {
        private $cache;
        private $db;

        public function queryFoos($parameters_for_a_complex_query)
        {
            // Run the expensive query for IDs only, then hydrate objects from cache
            $ids  = $this->db->queryIds($parameters_for_a_complex_query);
            $foos = $this->cache->getMulti($ids);

            // Fetch (and cache) whatever the cache did not have
            $missing_foos = $this->getMissingIds($ids, array_keys($foos));
            foreach ($missing_foos as $missing_foo_id) {
                $missing_foo = $this->db->getFooById($missing_foo_id);
                $this->cache->set($missing_foo_id, $missing_foo);
                $foos[$missing_foo_id] = $missing_foo;
            }

            return $foos;
        }
    }
  22. MEMCACHED MEMCACHED ▸ Mem Cache Dee ▸ Originally written in

    Perl by Brad Fitzpatrick to speed up LiveJournal ▸ Re-written in C by Anatoly Vorobey ▸ Volatile, sharded, shared-nothing architecture
  23. MEMCACHED CONNECTING TO MEMCACHED

    $m = new Memcached();
    $m->addServers([
        ['memcache1.private.net', 11211, 67],   // hostname/IP, port, weight
        ['memcache2.private.net', 11211, 33],
    ]);

    ALL CLIENTS MUST KNOW ABOUT ALL SERVERS
  24. MEMCACHED SETS

    $m->set($key, $value, $ttl);
    $m->setByKey($server_key, $key, $value, $ttl);
    $m->setMulti($items, $ttl);
    $m->setMultiByKey($server_key, $items, $ttl);

    MEMCACHED CLIENT COMPUTES HASH OF KEY → HASH IS USED TO COMPUTE SERVER AFFINITY → SERIALIZED OBJECT IS STORED ON APPROPRIATE SERVER (illustrative sketch below)
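
A rough sketch of the key-to-server affinity step above, assuming a simple modulo distribution; real clients typically use consistent hashing (covered a few slides later), and pickServer and the host names here are illustrative:

    <?php
    // Illustrative only: hash the key and pick a server from the pool.
    function pickServer($key, array $servers)
    {
        $hash  = crc32($key);              // cheap, deterministic hash of the key
        $index = $hash % count($servers);  // naive modulo distribution
        return $servers[$index];
    }

    $servers = [
        ['memcache1.private.net', 11211],
        ['memcache2.private.net', 11211],
    ];
    list($host, $port) = pickServer('user:42:profile', $servers);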
  25. MEMCACHED GETS

    $m->get($key);
    $m->getByKey($server_key, $key);
    $m->getMulti($keys);
    $m->getMultiByKey($server_key, $keys);

    MEMCACHED CLIENT COMPUTES HASH OF KEY → HASH IS USED TO COMPUTE SERVER AFFINITY → SERIALIZED OBJECT IS REQUESTED FROM APPROPRIATE SERVER
  26. MEMCACHED DELAYED GETS (PHP)

    <?php
    $m = new Memcached();
    $m->addServer('localhost', 11211);

    for ($i = 0; $i < 3; $i++) {
        $m->set($i, "foo_$i");
    }

    $m->getDelayed([0, 1, 2]);        // issue the request without blocking
    echo "Doing other things now!\n"; // keep working while results arrive
    var_dump($m->fetchAll());         // collect the results when they are needed

    > $ php memcached.php
    Doing other things now!
    array(1) {
      [0] =>
      array(2) {
        'key' => string(1) "1"
        'value' => string(5) "foo_1"
      }
    }
  27. MEMCACHED READ-THROUGH CACHE CALLBACKS (PHP)

    $m = new Memcached();
    $m->addServer('localhost', 11211);

    for ($i = 0; $i < 3; $i++) {
        $m->set($i, "foo_$i");
    }

    // Key 4 was never set, so the read-through callback runs on the miss
    // ($db stands in for whatever data source is available in scope).
    $m->get('4', function ($m, $key, &$value) use ($db) {
        // cache miss!
        $value = $db->getByKey($key);
        return true;    // returning true stores $value in memcached
    });
  28. CONSISTENT HASHING CONSISTENT HASHING (Image © Mathias Meyer, from

    http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html) ▸ Nodes claim multiple partitions of the hash key space (e.g. the 2^160 key space of SHA-1) ▸ Keys are hashed and assigned to the appropriate node ▸ Adding/removing a node only requires remapping K/n keys on average (a minimal sketch follows)
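
A minimal consistent-hash ring sketch of the idea above, assuming crc32 as the hash and a small number of virtual nodes per server; real clients use tuned hashes and many more points, and the class and host names here are illustrative:

    <?php
    class HashRing
    {
        private $ring = [];   // position on the ring => server name

        public function __construct(array $servers, $replicas = 64)
        {
            foreach ($servers as $server) {
                // each server claims several points ("virtual nodes") on the ring
                for ($i = 0; $i < $replicas; $i++) {
                    $this->ring[crc32("$server#$i")] = $server;
                }
            }
            ksort($this->ring);
        }

        public function lookup($key)
        {
            $point = crc32($key);
            foreach ($this->ring as $position => $server) {
                if ($position >= $point) {
                    return $server;            // first node clockwise from the key
                }
            }
            return reset($this->ring);         // wrapped around: first node on the ring
        }
    }

    $ring   = new HashRing(['memcache1.private.net', 'memcache2.private.net']);
    $server = $ring->lookup('user:42:profile');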
  29. REDIS REDIS ▸ REmote DIctionary Server ▸ Started by Salvatore

    Sanfilippo in 2009 ▸ Modeled as a Data Structure Server ▸ Hashes ▸ Sets (Sorted and Unsorted) ▸ Lists ▸ Strings ▸ High-performance in-memory key-value store with optional persistence
  30. REDIS

    <?php
    $redis = new Redis();
    $redis->pconnect('127.0.0.1', 6379);

    $redis->set('string_val', 'I love redis');
    $redis->mset(['string_1' => 'Caching is fun', 'string_2' => 'Pumpkins!']);

    $redis->get('string_val');
    $redis->mget(['string_1', 'string_2']);
  31. REDIS REDIS 3.X ▸ Redis Cluster ▸ Cluster bus with

    multi-master architecture ▸ Combines HA with sharding, allowing clusters to continue to operate with dead nodes
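
As a rough sketch of how an application might talk to Redis Cluster using the phpredis RedisCluster class (the seed host names are illustrative):

    <?php
    // Seed the client with a few known nodes; it discovers the rest of the
    // cluster and routes each key to the master that owns its hash slot.
    $cluster = new RedisCluster(null, [
        'redis-node1.private.net:6379',
        'redis-node2.private.net:6379',
        'redis-node3.private.net:6379',
    ]);

    $cluster->set('string_val', 'I love redis');   // routed by hash slot
    $cluster->get('string_val');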
  32. REDIS REDIS CLUSTER

    [Diagram: three Redis masters, each owning a subset of the hash slots (HS 0–8), with every master's slot range replicated to a set of Redis slaves]
  33. REDIS TWEMPROXY ▸ https://github.com/twitter/twemproxy ▸ Proxy for Redis (and memcached)

    ▸ Handles connection pipelining to minimize load on cache nodes ▸ Handles hashing (excellent for normalizing access from differing clients)
  34. REDIS AOL CACHELINK ▸ https://github.com/aol/cachelink-service ▸ Set & clear pipelining

    proxy ▸ Supports arbitrary key->key associations on set & clear
  35. REDIS REDIS & MEMCACHED ▸ Memcached is a fast, volatile

    key-value store ▸ Redis, among other things, is a fast, volatile key-value store ▸ Redis supports persistence (and thus can be used for “less” volatile data such as sessions) ▸ Redis allows storing native data structures without serialization (see the sketch below) ▸ Redis implementations generally use consistent hashing to distribute keys
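
A short sketch of the native data structures point, using phpredis; the key names are illustrative:

    <?php
    $redis = new Redis();
    $redis->pconnect('127.0.0.1', 6379);

    // Hash: store an object's fields without serializing the whole object
    $redis->hSet('user:42', 'name', 'Samantha');
    $redis->hSet('user:42', 'tz', 'America/New_York');
    $name = $redis->hGet('user:42', 'name');

    // Sorted set: a leaderboard kept ordered by score on the server
    $redis->zAdd('leaderboard', 1500, 'player:7');
    $redis->zAdd('leaderboard', 2200, 'player:3');
    $top = $redis->zRevRange('leaderboard', 0, 9);   // top ten members

    // List: a simple work queue
    $redis->rPush('queue:emails', 'welcome:42');
    $job = $redis->lPop('queue:emails');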
  36. APPLICATION TIER CACHES PHP AND THE APP TIER ▸ Persistent

    in-memory caches are difficult in PHP because PHP does not persist ▸ APC (defunct) ▸ APCu (beta) ▸ Given the speed of modern distributed caches, use cases are limited
  37. APPLICATION TIER CACHES AMORTIZING EXPENSIVE OPERATIONS

    function fib($n) {
        if ($n < 0) {
            return NULL;
        } elseif ($n === 0) {
            return 0;
        } elseif ($n === 1 || $n === 2) {
            return 1;
        } else {
            return fib($n - 1) + fib($n - 2);
        }
    }

    $n  = 32;
    $t1 = microtime(true);

    $fib = apc_fetch("fib_$n");
    if (false === $fib) {            // first request pays the full cost
        $fib = fib($n);
        apc_store("fib_$n", $fib);   // later requests read the cached result
    }

    $t2 = microtime(true);
    echo "Computed FIB($n) as $fib in " . ($t2 - $t1) . " seconds\n";

    > $ php ./fib.php
    Computed FIB(32) as 2178309 in 4.203 seconds
    > $ php ./fib.php
    Computed FIB(32) as 2178309 in 0.001 seconds
  38. [Architecture diagram of the stack — gateway, app servers, RDBMS cluster — annotated with where the presentation tier, application tier, and data tier caches sit]
  39. PRESENTATION TIER CACHES STATIC FILE CACHES ▸ Useful for static

    content that changes slowly (blogs) ▸ Static site generators are an extreme example
  40. PRESENTATION TIER CACHES STATIC FILE CACHES ▸ Application renders finalized

    output on request or based on a publish event (sketch below) ▸ Output is written to shared storage ▸ Web servers deliver content from shared storage. Application servers are isolated from traffic. ▸ Breaks down completely with large/complex applications [Diagram: app servers write rendered output to shared high-speed storage, which the origin servers serve directly]
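
A rough sketch of the publish-event flow above; the path, post ID, and $renderer are illustrative stand-ins:

    <?php
    // Render the final HTML once and write it where the web servers can
    // serve it without touching the application.
    function publishPost($postId, $renderer)
    {
        $html = $renderer->render($postId);               // hypothetical render step
        $path = "/mnt/shared-cache/posts/{$postId}.html"; // shared storage mount

        // Write atomically so web servers never see a half-written file
        $tmp = $path . '.tmp';
        file_put_contents($tmp, $html);
        rename($tmp, $path);
    }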
  41. PRESENTATION TIER CACHES - NGINX NGINX ▸ Created by Igor

    Sysoev in 2002 ▸ Streamlined web server optimized for highly concurrent, low-overhead, http content delivery. ▸ Particularly optimized for static file delivery ▸ Designed to proxy over HTTP, WSGI, FastCGI (can be used as a load balancer) ▸ Can be configured to generate and maintain a file-based cache of output from external origins (over network/gateway protocols)
  42. NGINX NGINX CACHING

    proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=cache_zone:10m inactive=60m;
    proxy_cache_key "$scheme$request_method$host$request_uri";

    server {
        listen 80 default_server;

        root /var/www/;
        index index.html index.htm;

        server_name aol.com www.aol.com;
        charset utf-8;

        location / {
            proxy_cache cache_zone;
            add_header X-Cache-State $upstream_cache_status;
            include proxy_params;
            proxy_pass http://10.0.0.2:9000;
        }
    }
  43. NGINX

    [Diagram: two NGINX head nodes, each with static files and a local cache, proxying to an NGINX origin that talks to the application over FastCGI]
  44. VARNISH VARNISH ▸ Open source caching reverse proxy ▸ Developed

    by Poul-Henning Kamp for Verdens Gang ▸ Uses memory heap allocation to minimize IO ▸ Optimizations are focused on eliminating system calls ▸ Algorithms to deliver requests to threads most likely to have objects cached in L1/L2
  45. VARNISH VARNISH CONFIG LANGUAGE

    sub vcl_recv {
        # Happens before we check if we have this in cache already.
        #
        # Typically you clean up the request here, removing cookies you don't need,
        # rewriting the request, etc.

        if (req.method == "PURGE") {
            if (!client.ip ~ purge) {
                return (synth(403, "Forbidden"));
            }
            return (purge);
        }

        set req.backend_hint = vdir.backend();

        if (req.url ~ "wp-admin|wp-login") {
            return (pass);
        }

        unset req.http.cookie;
    }
  46. CDNS CONTENT DELIVERY NETWORKS ▸ Distributed cache services ▸ Designed

    to minimize the distance data needs to travel to get to a user ▸ Spatial locality on a global scale
  47. PRESENTATION TIER CACHING CACHE CONTROL & HTTP ▸ Gartner estimates

    4.9 Billion devices currently connected to the Internet ▸ The network of CDNs, proxies, gateways, and browsers constitute the single largest distributed cache ever created ▸ They (mostly) speak a common language!
  48. CACHE CONTROL CACHE CONTROL TO MAJOR TOM ▸ Cache-Control directives

    are designed to allow origins to communicate cache parameters to clients and proxies ▸ Directives dictate who should or shouldn’t cache, how long objects should be considered fresh, and how revalidation should work
  49. CACHE CONTROL PRIVACY DIRECTIVES ▸ private / public - Informs

    intermediary caches if a response is specific to the end user or not (THIS IS NOT A SECURITY FEATURE) ▸ no-cache [=<header>] - Without a header, tells caches that they must revalidate each request (by comparing hashes). With a header provided, this tells caches that they may store the object as long as they strip out the specified header. ▸ no-store - Directs caches to never store this object under any circumstances.
  50. CACHE CONTROL EXPIRATION DIRECTIVES ▸ max-age <seconds> - Tells caches

    for how long an object can be considered fresh ▸ s-maxage <seconds> - max-age for shared caches (CDNs/caching reverse proxies). These caches will generally respect s-maxage over max-age ▸ must-revalidate - Tells caches that they must revalidate (compare hashes) on any request and never serve stale data, even if otherwise configured to serve stale content. ▸ proxy-revalidate - must-revalidate for shared caches (a short PHP sketch follows)
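
A small sketch of emitting these directives from a PHP origin; the values and renderPage() are illustrative:

    <?php
    // Browsers may treat the page as fresh for 5 minutes; shared caches
    // (CDNs / reverse proxies) may keep it for an hour via s-maxage.
    header('Cache-Control: public, max-age=300, s-maxage=3600');

    echo renderPage();   // hypothetical render step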
  51. CACHE CONTROL OTHER DIRECTIVES ▸ no-transform - Instructs caches not

    to perform any data transformations (e.g. compressing or transcoding images)
  52. CACHE CONTROL NON-CACHE CONTROL HEADERS ▸ expires <datetime> - Expiry

    date/time for an object. Largely superseded by max-age ▸ etag - “Entity tag,” usually a hash of the object or a hash of the object’s last-modified time, used to check freshness (revalidation sketch below) ▸ vary <header> - Informs caches that they can store one version of content per distinct value of <header>. For example, cache one version per User-Agent ▸ pragma - Deprecated
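
A minimal sketch of ETag-based revalidation at the origin; renderPage() and the simple string comparison against If-None-Match are illustrative simplifications:

    <?php
    $body = renderPage();                  // hypothetical render step
    $etag = '"' . md5($body) . '"';        // entity tag: hash of the response body

    header('Cache-Control: public, max-age=60');
    header('ETag: ' . $etag);

    // If the client (or an intermediary cache) already holds this version,
    // answer 304 Not Modified and skip sending the body again.
    if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
        trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
        http_response_code(304);
        exit;
    }

    echo $body;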
  53. CACHE CONTROL EXAMPLES

    HTTP/1.1 200 OK
    Cache-Control: public, max-age=3600
    Content-Length: 219
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2015 16:16:46 GMT

    HTTP/1.1 200 OK
    Cache-Control: public, max-age=86400, must-revalidate
    Etag: "6d82cbb050ddc7fa9cbb659014546e59"
    Content-Length: 552
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2015 16:42:21 GMT

    HTTP/1.1 200 OK
    Cache-Control: private, no-cache
    Content-Length: 772
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2015 16:44:21 GMT
  54. CURRENT CACHING EXPERIMENTATION PROBLEM STATEMENT ▸ Our edge is getting

    edgier - much of our growth is happening in developing markets ▸ User agent diversity is increasing dramatically (mobile dominance) ▸ Content collection definitions are less and less deterministic, requiring more flexibility in search and query ops
  55. NEXT STEPS WHAT IF ▸ We can enable a search-oriented

    interface which allows complex queries while… ▸ Eliminating external (user) load on our RDBMS infrastructure and… ▸ Providing content managers with localized (near-edge), on-demand cache invalidation
  56. [Architecture diagram from slide 38 — gateway, app servers, RDBMS cluster — with a new layer added: DOCUMENT CACHES alongside the presentation, application, and data tier caches]
  57. DOCUMENT CACHES ELASTICSEARCH ▸ A full-text search database ▸ A

    high performance NOSQL document store that features ▸ High-availability via clustering ▸ Rack/Datacentre-aware sharding ▸ Expressive & dynamic query DSL
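
As a sketch of querying such a document cache over Elasticsearch's REST search API (the host, index, and field names are illustrative):

    <?php
    // A simple match query against the posts index
    $query = json_encode([
        'query' => ['match' => ['title' => 'caching']],
        'size'  => 10,
    ]);

    $ch = curl_init('http://es-cluster.private.net:9200/posts/_search');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $query,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);

    $results = json_decode(curl_exec($ch), true);
    curl_close($ch);

    foreach ($results['hits']['hits'] as $hit) {
        echo $hit['_source']['title'], "\n";
    }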
  58. [Diagram: regional Elasticsearch clusters (US-EAST, US-WEST, EU-FRA, AP-NRT) fed by the content management service over a global messaging bus]

    function (event, callback) {
        var index = 'posts';
        var type  = 'post';
        var id    = event['id'];

        if (!id) {
            return callback('Invalid post object received');
        }

        indexRecord(index, type, id, event, callback);
    },
  59. [Diagram: a centralized RDBMS, content management API, and CMS feed, over a global messaging bus, a searchable document cache used by the content rendering API/application with its content and data caches. Callouts: exposed to end-user load; can be located near the edge; self-contained origin; slowest components; minimal load; can be centralized]
  60. CACHING SUMMARY WHAT IS CACHING ▸ Amortizing the most expensive

    operations in your application ▸ Optimizing the most common operations in your application ▸ Minimizing the distance between where data lives and where data is used
  61. CACHING SUMMARY WHEN NOT TO USE CACHE ▸ When you

    don’t care about your users’ experience ▸ When you have infinite money to waste on compute time ▸ When you don’t care how much carbon you pump into the atmosphere
  62. CACHING RESOURCE LINKS TO THINGS ▸ http://memcached.org/ ▸ http://redis.io ▸

    http://varnish-cache.org ▸ https://github.com/twitter/twemproxy ▸ https://github.com/aol/cachelink-service ▸ http://elastic.co