Caching on the Bleeding Edge

Samantha Quiñones

November 19, 2015

Transcript

  1. SAMANTHA QUIÑONES ABOUT ME ▸ Software Engineer & Data Nerd

    since 1997 ▸ Doing “media stuff” since 2012 ▸ Principal @ AOL since 2014 ▸ @ieatkillerbees ▸ http://samanthaquinones.com
  2. A BRIEF HISTORY OF CACHING PRIMARY STORAGE ▸ Storage that

    is addressable by the CPU ▸ Examples: ▸ Williams tube ▸ Magnetic Core memory ▸ Magnetic Drum memory ▸ Solid state SRAM & DRAM
  3. A BRIEF HISTORY OF CACHING SECONDARY STORAGE ▸ Storage addressable

    through a controller or channel. ▸ Examples: ▸ Magnetic Tape ▸ HDD ▸ SSD ▸ Flash memory
  4. A BRIEF HISTORY OF CACHING FRITZ-RUDOLPH GÜNTSCH ▸ Virtual memory

    pioneer ▸ Logical Design of a Digital Computer with Multiple Asynchronous Rotating Drums and Automatic High Speed Memory Operation (1956) ▸ Automated organization and swapping of pages
  5. A BRIEF HISTORY OF CACHING TRANSLATION LOOKASIDE BUFFERS ▸ Introduced

    in the IBM System 360 Model 67 (1967) and the GE 645 (1969) ▸ Fast caching of virtual memory address translations
  6. A BRIEF HISTORY OF CACHING CACHE MILESTONES ▸ 1969 -

    IBM System 360 Model 85 introduces CPU cache ▸ 1982 - Motorola 68010 features on-board instruction cache ▸ 1987 - Motorola 68030 features a 256-byte data cache ▸ 1989 - Intel 486 features split cache with on-die L1 and on mainboard L2 ▸ 1995 - Intel Pentium Pro has L1 and L2 on die ▸ 2013 - Intel Haswell MA has 3 caches with shared L4 for CPU & on-board GPU
  7. PROGRAMS TEND TO REUSE DATA AND INSTRUCTIONS THEY HAVE USED

    RECENTLY. Hennessy and Patterson Computer Architecture: A Quantitative Approach THEORY OF CACHING
  8. THEORY OF CACHING LOCALITY OF REFERENCE ▸ Temporal Locality: When

    data is referenced, it tends to be referenced again soon. ▸ Spatial Locality: When data is referenced, nearby data tends to be referenced soon.
  9. THEORY OF CACHING SOME TERMINOLOGY ▸ Cache Hit - Data

    was found in cache ▸ Cache Miss - Data was not found in cache
  10. void add1(int A[N][N], int B[N][N], int C[N][N]) {

        /* inner loop walks down a column of the row-major array (stride N) */
        int i, j;
        for (j = 0; j < N; j++) {
            for (i = 0; i < N; i++) {
                C[i][j] = A[i][j] + B[i][j];
            }
        }
    }

    void add2(int A[N][N], int B[N][N], int C[N][N]) {
        /* inner loop walks along a row (stride 1), so consecutive accesses share cache lines */
        int i, j;
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                C[i][j] = A[i][j] + B[i][j];
            }
        }
    }

    > $ ./matrix
    add1 164.947510 nanoseconds per access
    add2 34.484863 nanoseconds per access

    4x faster?!
  11. for (j=0; j<N; j++) { for (i=0; i<N; i++) { … } }

    [Animation: current position moving over the row-major layout A[0][0] A[0][1] A[0][2] A[0][3] A[0][4] A[0][5] A[0][6] / A[1][0] A[1][1] A[1][2] A[1][3] A[1][4] A[1][5] A[1][6]]
  12. for (i=0; i<N; i++) { for (j=0; j<N; j++) { … } }

    [Animation: current position sequence A[0][0] A[1][0] A[2][0] A[0][1] A[0][2]]
  13. THEORY OF CACHING LEVERAGING SPATIAL LOCALITY ▸ Understanding the geography

    of data (arrays are allocated in row-major order) ▸ When A[i][j] is referenced, nearby memory addresses are brought into cache ▸ Code optimized to use cache has the potential to be MUCH faster. ▸ This is often what internals devs are talking about when considering whether or not some object is “in cache”
  14. THEORY OF CACHING CACHES AT THE HEART OF COMPUTER ARCHITECTURE (CORE I7)

    Level   Size        Latency
    L1      32 KB       ~1 ns
    L2      256 KB      ~3 ns
    L3      2 MB/core   ~10 ns
  15. THEORY OF CACHING LEVERAGING TEMPORAL LOCALITY ▸ Amortizing the cost

    of data access over multiple temporally proximate operations ▸ Algorithmic prediction of most likely-needed data
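
A minimal in-process sketch of that amortization, assuming a hypothetical expensiveBackendFetch() data source:

    <?php
    // Cache an expensive lookup in a static array so that temporally
    // clustered calls within one request pay the cost only once.
    function getUserProfile($userId)
    {
        static $memo = [];

        if (!isset($memo[$userId])) {
            $memo[$userId] = expensiveBackendFetch($userId); // hypothetical helper
        }

        return $memo[$userId];   // later, temporally close calls are nearly free
    }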
  16. THEORY OF CACHING STRATEGIES & ALGORITHMS ▸ Clairvoyant Algorithm (Bélády’s)

    - Theoretical algorithm that always discards the object which will not be accessed for the longest time. ▸ LRU - Least Recently Used objects are discarded first. Typically uses 2 bits per object to record “age” ▸ PLRU - LRU algorithm that sacrifices miss ratio for lower latency and lower power requirements. Typical implementations use 1 bit per object. Common in CPU caches. ▸ MRU - Most Recently Used objects are discarded first. Useful for cyclical or random access patterns over large sets of data. ▸ Random Replacement - Objects are discarded at random. Used in RISC platforms such as ARM as no housekeeping bits are required.
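
A minimal userland sketch of the LRU policy described above (illustrative only; hardware and cache-server implementations differ):

    <?php
    // PHP arrays preserve insertion order, so "most recently used" can be
    // modeled by deleting and re-inserting a key on every access.
    class LruCache
    {
        private $capacity;
        private $items = [];

        public function __construct($capacity)
        {
            $this->capacity = $capacity;
        }

        public function get($key)
        {
            if (!array_key_exists($key, $this->items)) {
                return null;                    // cache miss
            }
            $value = $this->items[$key];
            unset($this->items[$key]);          // re-insert to mark as most recently used
            $this->items[$key] = $value;
            return $value;                      // cache hit
        }

        public function set($key, $value)
        {
            unset($this->items[$key]);
            $this->items[$key] = $value;
            if (count($this->items) > $this->capacity) {
                reset($this->items);                    // oldest entry is first
                unset($this->items[key($this->items)]); // evict least recently used
            }
        }
    }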
  17. THERE ARE ONLY TWO HARD THINGS IN COMPUTER SCIENCE: CACHE

    INVALIDATION AND NAMING THINGS. Phil Karlton
  18. …CACHE…[HAS] THE POTENTIAL TO PARTIALLY… ELIMINATE… INTERACTIONS, IMPROVING EFFICIENCY, SCALABILITY,

    AND… PERFORMANCE BY REDUCING THE AVERAGE LATENCY OF A SERIES OF INTERACTIONS. Roy “I invented REST” Fielding
  19. IF 80% OF TIME IS SPENT ACCESSING 20% OF DATA,

    OPTIMIZE FOR ACCESSING THAT 20% CACHING FOR APPLICATION DEVELOPERS
  20. DATA TIER CACHES CACHE ACCESS PATTERN

    START → RETRIEVE DATA → DATA IN CACHE? → (on a miss: DB FETCH → WRITE TO CACHE) → RETURN DATA → END

    <?php

    class DataRepository
    {
        private $cache;
        private $db;

        public function getDataByKey($key)
        {
            $data = $this->cache->get($key);          // check the cache first
            if (null === $data) {
                $data = $this->db->fetchByKey($key);  // miss: fall back to the database
                $this->cache->set($key, $data);       // write it back for the next caller
            }
            return $data;
        }
    }
  21. BACK-END CACHES CACHING LOW-LEVEL DOMAIN OBJECTS

    class FooRepository extends Repository
    {
        private $cache;
        private $db;

        public function queryFoos($parameters_for_a_complex_query)
        {
            // Run the expensive query for IDs only, then hydrate objects from cache
            $ids  = $this->db->queryIds($parameters_for_a_complex_query);
            $foos = $this->cache->getMulti($ids);

            // Fetch (and cache) whatever the cache did not have
            $missing_foos = $this->getMissingIds($ids, array_keys($foos));
            foreach ($missing_foos as $missing_foo_id) {
                $missing_foo = $this->db->getFooById($missing_foo_id);
                $this->cache->set($missing_foo_id, $missing_foo);
                $foos[$missing_foo_id] = $missing_foo;
            }

            return $foos;
        }
    }
  22. MEMCACHED MEMCACHED ▸ Mem Cache Dee ▸ Originally written in

    Perl by Brad Fitzpatrick to speed up LiveJournal ▸ Re-written in C by Anatoly Vorobey ▸ Volatile, sharded, shared-nothing architecture
  23. MEMCACHED CONNECTING TO MEMCACHED

    $m = new Memcached();
    $m->addServers([
        ['memcache1.private.net', 11211, 67],   // hostname/IP, port, weight
        ['memcache2.private.net', 11211, 33],
    ]);

    ALL CLIENTS MUST KNOW ABOUT ALL SERVERS
  24. MEMCACHED SETS

    $m->set($key, $value, $ttl);
    $m->setByKey($server_key, $key, $value, $ttl);
    $m->setMulti($items, $ttl);
    $m->setMultiByKey($server_key, $items, $ttl);

    MEMCACHED CLIENT COMPUTES HASH OF KEY → HASH IS USED TO COMPUTE SERVER AFFINITY → SERIALIZED OBJECT IS STORED ON APPROPRIATE SERVER (illustrative sketch below)
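
A rough sketch of the key-to-server affinity step above, assuming a simple modulo distribution; real clients typically use consistent hashing (covered a few slides later), and pickServer and the host names here are illustrative:

    <?php
    // Illustrative only: hash the key and pick a server from the pool.
    function pickServer($key, array $servers)
    {
        $hash  = crc32($key);              // cheap, deterministic hash of the key
        $index = $hash % count($servers);  // naive modulo distribution
        return $servers[$index];
    }

    $servers = [
        ['memcache1.private.net', 11211],
        ['memcache2.private.net', 11211],
    ];
    list($host, $port) = pickServer('user:42:profile', $servers);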
  25. MEMCACHED GETS

    $m->get($key);
    $m->getByKey($server_key, $key);
    $m->getMulti($keys);
    $m->getMultiByKey($server_key, $keys);

    MEMCACHED CLIENT COMPUTES HASH OF KEY → HASH IS USED TO COMPUTE SERVER AFFINITY → SERIALIZED OBJECT IS REQUESTED FROM APPROPRIATE SERVER
  26. MEMCACHED DELAYED GETS (PHP)

    <?php
    $m = new Memcached();
    $m->addServer('localhost', 11211);

    for ($i = 0; $i < 3; $i++) {
        $m->set($i, "foo_$i");
    }

    $m->getDelayed([0, 1, 2]);        // issue the request without blocking
    echo "Doing other things now!\n"; // keep working while results arrive
    var_dump($m->fetchAll());         // collect the results when they are needed

    > $ php memcached.php
    Doing other things now!
    array(1) {
      [0] =>
      array(2) {
        'key' => string(1) "1"
        'value' => string(5) "foo_1"
      }
    }
  27. MEMCACHED READ-THROUGH CACHE CALLBACKS (PHP)

    $m = new Memcached();
    $m->addServer('localhost', 11211);

    for ($i = 0; $i < 3; $i++) {
        $m->set($i, "foo_$i");
    }

    // Key 4 was never set, so the read-through callback runs on the miss
    // ($db stands in for whatever data source is available in scope).
    $m->get('4', function ($m, $key, &$value) use ($db) {
        // cache miss!
        $value = $db->getByKey($key);
        return true;    // returning true stores $value in memcached
    });
  28. CONSISTENT HASHING CONSISTENT HASHING (Image © Mathias Meyer, from

    http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html) ▸ Nodes claim multiple partitions of the hash key space (e.g. the 2^160 key space of SHA-1) ▸ Keys are hashed and assigned to the appropriate node ▸ Adding/removing a node only requires remapping K/n keys on average (a minimal sketch follows)
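
A minimal consistent-hash ring sketch of the idea above, assuming crc32 as the hash and a small number of virtual nodes per server; real clients use tuned hashes and many more points, and the class and host names here are illustrative:

    <?php
    class HashRing
    {
        private $ring = [];   // position on the ring => server name

        public function __construct(array $servers, $replicas = 64)
        {
            foreach ($servers as $server) {
                // each server claims several points ("virtual nodes") on the ring
                for ($i = 0; $i < $replicas; $i++) {
                    $this->ring[crc32("$server#$i")] = $server;
                }
            }
            ksort($this->ring);
        }

        public function lookup($key)
        {
            $point = crc32($key);
            foreach ($this->ring as $position => $server) {
                if ($position >= $point) {
                    return $server;            // first node clockwise from the key
                }
            }
            return reset($this->ring);         // wrapped around: first node on the ring
        }
    }

    $ring   = new HashRing(['memcache1.private.net', 'memcache2.private.net']);
    $server = $ring->lookup('user:42:profile');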
  29. REDIS REDIS ▸ REmote DIctionary Server ▸ Started by Salvatore

    Sanfilippo in 2009 ▸ Modeled as a Data Structure Server ▸ Hashes ▸ Sets (Sorted and Unsorted) ▸ Lists ▸ Strings ▸ High-performance in-memory key-value store with optional persistence
  30. REDIS

    <?php
    $redis = new Redis();
    $redis->pconnect('127.0.0.1', 6379);

    $redis->set('string_val', 'I love redis');
    $redis->mset(['string_1' => 'Caching is fun', 'string_2' => 'Pumpkins!']);

    $redis->get('string_val');
    $redis->mget(['string_1', 'string_2']);
  31. REDIS REDIS 3.X ▸ Redis Cluster ▸ Cluster bus with

    multi-master architecture ▸ Combines HA with sharding, allowing clusters to continue to operate with dead nodes
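
As a rough sketch of how an application might talk to Redis Cluster using the phpredis RedisCluster class (the seed host names are illustrative):

    <?php
    // Seed the client with a few known nodes; it discovers the rest of the
    // cluster and routes each key to the master that owns its hash slot.
    $cluster = new RedisCluster(null, [
        'redis-node1.private.net:6379',
        'redis-node2.private.net:6379',
        'redis-node3.private.net:6379',
    ]);

    $cluster->set('string_val', 'I love redis');   // routed by hash slot
    $cluster->get('string_val');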
  32. REDIS REDIS CLUSTER

    [Diagram: three Redis masters, each owning a subset of the hash slots (HS 0–8), with every master's slot range replicated to a set of Redis slaves]
  33. REDIS TWEMPROXY ▸ https://github.com/twitter/twemproxy ▸ Proxy for Redis (and memcached)

    ▸ Handles connection pipelining to minimize load on cache nodes ▸ Handles hashing (excellent for normalizing access from differing clients)
  34. REDIS AOL CACHELINK ▸ https://github.com/aol/cachelink-service ▸ Set & clear pipelining

    proxy ▸ Supports arbitrary key->key associations on set & clear
  35. REDIS REDIS & MEMCACHED ▸ Memcached is a fast, volatile

    key-value store ▸ Redis, among other things, is a fast, volatile key-value store ▸ Redis supports persistence (and thus can be used for “less” volatile data such as sessions) ▸ Redis allows storing native data structures without serialization (see the sketch below) ▸ Redis implementations generally use consistent hashing to distribute keys
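
A short sketch of the native data structures point, using phpredis; the key names are illustrative:

    <?php
    $redis = new Redis();
    $redis->pconnect('127.0.0.1', 6379);

    // Hash: store an object's fields without serializing the whole object
    $redis->hSet('user:42', 'name', 'Samantha');
    $redis->hSet('user:42', 'tz', 'America/New_York');
    $name = $redis->hGet('user:42', 'name');

    // Sorted set: a leaderboard kept ordered by score on the server
    $redis->zAdd('leaderboard', 1500, 'player:7');
    $redis->zAdd('leaderboard', 2200, 'player:3');
    $top = $redis->zRevRange('leaderboard', 0, 9);   // top ten members

    // List: a simple work queue
    $redis->rPush('queue:emails', 'welcome:42');
    $job = $redis->lPop('queue:emails');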
  36. APPLICATION TIER CACHES PHP AND THE APP TIER ▸ Persistent

    in-memory caches are difficult in PHP because PHP does not persist ▸ APC (defunct) ▸ APCu (beta) ▸ Given the speed of modern distributed caches, use cases are limited
  37. APPLICATION TIER CACHES AMORTIZING EXPENSIVE OPERATIONS

    function fib($n) {
        if ($n < 0) {
            return NULL;
        } elseif ($n === 0) {
            return 0;
        } elseif ($n === 1 || $n === 2) {
            return 1;
        } else {
            return fib($n - 1) + fib($n - 2);
        }
    }

    $n  = 32;
    $t1 = microtime(true);

    $fib = apc_fetch("fib_$n");
    if (false === $fib) {            // first request pays the full cost
        $fib = fib($n);
        apc_store("fib_$n", $fib);   // later requests read the cached result
    }

    $t2 = microtime(true);
    echo "Computed FIB($n) as $fib in " . ($t2 - $t1) . " seconds\n";

    > $ php ./fib.php
    Computed FIB(32) as 2178309 in 4.203 seconds
    > $ php ./fib.php
    Computed FIB(32) as 2178309 in 0.001 seconds
  38. [Architecture diagram of the stack — gateway, app servers, RDBMS cluster — annotated with where the presentation tier, application tier, and data tier caches sit]
  39. PRESENTATION TIER CACHES STATIC FILE CACHES ▸ Useful for static

    content that changes slowly (blogs) ▸ Static site generators are an extreme example
  40. PRESENTATION TIER CACHES STATIC FILE CACHES ▸ Application renders finalized

    output on request or based on a publish event (sketch below) ▸ Output is written to shared storage ▸ Web servers deliver content from shared storage. Application servers are isolated from traffic. ▸ Breaks down completely with large/complex applications [Diagram: app servers write rendered output to shared high-speed storage, which the origin servers serve directly]
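
A rough sketch of the publish-event flow above; the path, post ID, and $renderer are illustrative stand-ins:

    <?php
    // Render the final HTML once and write it where the web servers can
    // serve it without touching the application.
    function publishPost($postId, $renderer)
    {
        $html = $renderer->render($postId);               // hypothetical render step
        $path = "/mnt/shared-cache/posts/{$postId}.html"; // shared storage mount

        // Write atomically so web servers never see a half-written file
        $tmp = $path . '.tmp';
        file_put_contents($tmp, $html);
        rename($tmp, $path);
    }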
  41. PRESENTATION TIER CACHES - NGINX NGINX ▸ Created by Igor

    Sysoev in 2002 ▸ Streamlined web server optimized for highly concurrent, low-overhead, http content delivery. ▸ Particularly optimized for static file delivery ▸ Designed to proxy over HTTP, WSGI, FastCGI (can be used as a load balancer) ▸ Can be configured to generate and maintain a file-based cache of output from external origins (over network/gateway protocols)
  42. NGINX NGINX CACHING

    proxy_cache_path /var/nginx/cache levels=1:2 keys_zone=cache_zone:10m inactive=60m;
    proxy_cache_key "$scheme$request_method$host$request_uri";

    server {
        listen 80 default_server;

        root /var/www/;
        index index.html index.htm;

        server_name aol.com www.aol.com;
        charset utf-8;

        location / {
            proxy_cache cache_zone;
            add_header X-Cache-State $upstream_cache_status;
            include proxy_params;
            proxy_pass http://10.0.0.2:9000;
        }
    }
  43. NGINX

    [Diagram: two NGINX head nodes, each with static files and a local cache, proxying to an NGINX origin that talks to the application over FastCGI]
  44. VARNISH VARNISH ▸ Open source caching reverse proxy ▸ Developed

    by Poul-Henning Kamp for Verdens Gang ▸ Uses memory heap allocation to minimize IO ▸ Optimizations are focused on eliminating system calls ▸ Algorithms to deliver requests to threads most likely to have objects cached in L1/L2
  45. VARNISH VARNISH CONFIG LANGUAGE

    sub vcl_recv {
        # Happens before we check if we have this in cache already.
        #
        # Typically you clean up the request here, removing cookies you don't need,
        # rewriting the request, etc.

        if (req.method == "PURGE") {
            if (!client.ip ~ purge) {
                return (synth(403, "Forbidden"));
            }
            return (purge);
        }

        set req.backend_hint = vdir.backend();

        if (req.url ~ "wp-admin|wp-login") {
            return (pass);
        }

        unset req.http.cookie;
    }
  46. CDNS CONTENT DELIVERY NETWORKS ▸ Distributed cache services ▸ Designed

    to minimize the distance data needs to travel to get to a user ▸ Spatial locality on a global scale
  47. PRESENTATION TIER CACHING CACHE CONTROL & HTTP ▸ Gartner estimates

    4.9 Billion devices currently connected to the Internet ▸ The network of CDNs, proxies, gateways, and browsers constitute the single largest distributed cache ever created ▸ They (mostly) speak a common language!
  48. CACHE CONTROL CACHE CONTROL TO MAJOR TOM ▸ Cache-Control directives

    are designed to allow origins to communicate cache parameters to clients and proxies ▸ Directives dictate who should or shouldn’t cache, how long objects should be considered fresh, and how revalidation should work
  49. CACHE CONTROL PRIVACY DIRECTIVES ▸ private / public - Informs

    intermediary caches if a response is specific to the end user or not (THIS IS NOT A SECURITY FEATURE) ▸ no-cache [=<header>] - Without a header, tells caches that they must revalidate each request (by comparing hashes). With a header provided, this tells caches that they may store the object as long as they strip out the specified header. ▸ no-store - Directs caches to never store this object under any circumstances.
  50. CACHE CONTROL EXPIRATION DIRECTIVES ▸ max-age <seconds> - Tells caches

    for how long an object can be considered fresh ▸ s-maxage <seconds> - max-age for shared caches (CDNs/caching reverse proxies). These caches will generally respect s-maxage over max-age ▸ must-revalidate - Tells caches that they must revalidate (compare hashes) on any request and never serve stale data, even if otherwise configured to serve stale content. ▸ proxy-revalidate - must-revalidate for shared caches (a short PHP sketch follows)
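
A small sketch of emitting these directives from a PHP origin; the values and renderPage() are illustrative:

    <?php
    // Browsers may treat the page as fresh for 5 minutes; shared caches
    // (CDNs / reverse proxies) may keep it for an hour via s-maxage.
    header('Cache-Control: public, max-age=300, s-maxage=3600');

    echo renderPage();   // hypothetical render step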
  51. CACHE CONTROL OTHER DIRECTIVES ▸ no-transform - Instructs caches not

    to perform any data transformations (e.g. compressing or transcoding images)
  52. CACHE CONTROL NON-CACHE CONTROL HEADERS ▸ expires <datetime> - Expiry

    date/time for an object. Largely superseded by max-age ▸ etag - “Entity tag,” usually a hash of the object or a hash of the object’s last-modified time, used to check freshness (revalidation sketch below) ▸ vary <header> - Informs caches that they can store one version of content per distinct value of <header>. For example, cache one version per User-Agent ▸ pragma - Deprecated
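
A minimal sketch of ETag-based revalidation at the origin; renderPage() and the simple string comparison against If-None-Match are illustrative simplifications:

    <?php
    $body = renderPage();                  // hypothetical render step
    $etag = '"' . md5($body) . '"';        // entity tag: hash of the response body

    header('Cache-Control: public, max-age=60');
    header('ETag: ' . $etag);

    // If the client (or an intermediary cache) already holds this version,
    // answer 304 Not Modified and skip sending the body again.
    if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
        trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
        http_response_code(304);
        exit;
    }

    echo $body;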
  53. CACHE CONTROL EXAMPLES

    HTTP/1.1 200 OK
    Cache-Control: public, max-age=3600
    Content-Length: 219
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2015 16:16:46 GMT

    HTTP/1.1 200 OK
    Cache-Control: public, max-age=86400, must-revalidate
    Etag: "6d82cbb050ddc7fa9cbb659014546e59"
    Content-Length: 552
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2015 16:42:21 GMT

    HTTP/1.1 200 OK
    Cache-Control: private, no-cache
    Content-Length: 772
    Content-Type: text/html; charset=UTF-8
    Date: Tue, 17 Nov 2015 16:44:21 GMT
  54. CURRENT CACHING EXPERIMENTATION PROBLEM STATEMENT ▸ Our edge is getting

    edgier - much of our growth is happening in developing markets ▸ User agent diversity is increasing dramatically (mobile dominance) ▸ Content collection definitions are less and less deterministic, requiring more flexibility in search and query ops
  55. NEXT STEPS WHAT IF ▸ We can enable a search-oriented

    interface which allows complex queries while… ▸ Eliminating external (user) load on our RDBMS infrastructure and… ▸ Providing content managers with localized (near-edge), on-demand cache invalidation
  56. [Architecture diagram from slide 38 — gateway, app servers, RDBMS cluster — with a new layer added: DOCUMENT CACHES alongside the presentation, application, and data tier caches]
  57. DOCUMENT CACHES ELASTICSEARCH ▸ A full-text search database ▸ A

    high performance NOSQL document store that features ▸ High-availability via clustering ▸ Rack/Datacentre-aware sharding ▸ Expressive & dynamic query DSL
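
As a sketch of querying such a document cache over Elasticsearch's REST search API (the host, index, and field names are illustrative):

    <?php
    // A simple match query against the posts index
    $query = json_encode([
        'query' => ['match' => ['title' => 'caching']],
        'size'  => 10,
    ]);

    $ch = curl_init('http://es-cluster.private.net:9200/posts/_search');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $query,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);

    $results = json_decode(curl_exec($ch), true);
    curl_close($ch);

    foreach ($results['hits']['hits'] as $hit) {
        echo $hit['_source']['title'], "\n";
    }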
  58. [Diagram: regional Elasticsearch clusters (US-EAST, US-WEST, EU-FRA, AP-NRT) fed by the content management service over a global messaging bus]

    function (event, callback) {
        var index = 'posts';
        var type  = 'post';
        var id    = event['id'];

        if (!id) {
            return callback('Invalid post object received');
        }

        indexRecord(index, type, id, event, callback);
    },
  59. [Diagram: a centralized RDBMS, content management API, and CMS feed, over a global messaging bus, a searchable document cache used by the content rendering API/application with its content and data caches. Callouts: exposed to end-user load; can be located near the edge; self-contained origin; slowest components; minimal load; can be centralized]
  60. CACHING SUMMARY WHAT IS CACHING ▸ Amortizing the most expensive

    operations in your application ▸ Optimizing the most common operations in your application ▸ Minimizing the distance between where data lives and where data is used
  61. CACHING SUMMARY WHEN NOT TO USE CACHE ▸ When you

    don’t care about your users’ experience ▸ When you have infinite money to waste on compute time ▸ When you don’t care how much carbon you pump into the atmosphere
  62. CACHING RESOURCE LINKS TO THINGS ▸ http://memcached.org/ ▸ http://redis.io ▸

    http://varnish-cache.org ▸ https://github.com/twitter/twemproxy ▸ https://github.com/aol/cachelink-service ▸ http://elastic.co