C.R.E.A.M. - Cache Rules Everything Around Me - infoShare 2016

C.R.E.A.M. - Cache Rules Everything Around Me - infoShare 2016 https://infoshare.pl

Thijs Feryn

May 19, 2016

Transcript

  1. C.R.E.A.M. Cache (not Cash) Rules Everything Around Me. Thijs Feryn

  2. Hi, I’m Thijs

  3. I’m @ThijsFeryn on Twitter

  4. I’m an Evangelist At

  5. I’m a board member at

  6. Slow websites suck

  7. Web performance is an essential part of the user experience

  8. Infrastructure

  9. Code

  10. Slow database

  11. Browser rendering

  12. User location

  13. Slowdown ~ downtime

  14. Code efficiently

  15. Identify slowest parts

  16. Optimize database

  17. Optimize runtime

  18. After a while you hit the limits

  19. Optimize database: avoid. Optimize runtime: avoid.

  20. Don’t recompute if the data hasn’t changed

  21. Cache

  22. 3 x 2 = ?
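The point of this slide, that a known answer should be looked up instead of recomputed, can be sketched in a few lines. This is a minimal illustration in Python rather than the PHP used later in the deck; `multiply` and the `calls` counter are hypothetical names invented for the example.

```python
import functools

calls = 0  # counts how often the real computation runs


@functools.lru_cache(maxsize=None)
def multiply(a: int, b: int) -> int:
    """Stand-in for an expensive computation (hypothetical example)."""
    global calls
    calls += 1
    return a * b


first = multiply(3, 2)   # computed
second = multiply(3, 2)  # answered from the cache, not recomputed
print(first, second, calls)
```

The second call never reaches the function body, which is the whole idea behind every cache in this talk.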

  23. What can you cache?

  24. What can you cache? Byte code, database output, external services, files from disk, pages

  25. Caching is not a compensation for poor code

  26. Caching is an essential architectural strategy

  27. The goal

  28. Performance != Scalability

  29. Performance: speed

  30. Scalability: constant speed with increasing load

  31. Caching toolkit

  32. ✓Varnish ✓Redis ✓Shared memory ✓ElasticSearch * Caching toolkit

  33. Quick overview

  34. Varnish

  35. Normally User Server

  36. With Varnish User Varnish Server

  38. Stores HTTP output in memory

  39. Respects cache-control headers
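What "respecting cache-control headers" could mean for a shared cache is sketched below. This is a simplified Python illustration, not Varnish's actual logic: the function name `shared_cache_ttl` is hypothetical, only a handful of directives are handled, and returning `None` when no lifetime is given is an assumption made for the sketch (Varnish itself falls back to a default TTL in that case).

```python
import re
from typing import Optional


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Return a TTL in seconds for a shared cache, or None if not cacheable."""
    directives = [d.strip() for d in cache_control.lower().split(",")]
    # Any of these directives tells a shared cache to keep its hands off.
    if any(d in ("no-store", "no-cache", "private") for d in directives):
        return None
    # s-maxage targets shared caches and takes precedence over max-age.
    for name in ("s-maxage", "max-age"):
        for d in directives:
            match = re.fullmatch(name + r"=(\d+)", d)
            if match:
                return int(match.group(1))
    return None  # simplification: no explicit lifetime given


print(shared_cache_ttl("public,must-revalidate,s-maxage=10"))
print(shared_cache_ttl("no-store"))
```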

  40. Varnish Configuration Language

  41. sub vcl_recv {
          if (req.method == "PRI") {
              /* We do not support SPDY or HTTP/2.0 */
              return (synth(405));
          }
          if (req.method != "GET" &&
              req.method != "HEAD" &&
              req.method != "PUT" &&
              req.method != "POST" &&
              req.method != "TRACE" &&
              req.method != "OPTIONS" &&
              req.method != "DELETE") {
              /* Non-RFC2616 or CONNECT which is weird. */
              return (pipe);
          }
          if (req.method != "GET" && req.method != "HEAD") {
              /* We only deal with GET and HEAD by default */
              return (pass);
          }
          if (req.http.Authorization || req.http.Cookie) {
              /* Not cacheable by default */
              return (pass);
          }
          return (hash);
      }

  42. sub vcl_pipe {
          # By default Connection: close is set on all piped requests, to stop
          # connection reuse from sending future requests directly to the
          # (potentially) wrong backend. If you do want this to happen, you can undo
          # it here.
          # unset bereq.http.connection;
          return (pipe);
      }

      sub vcl_pass {
          return (fetch);
      }

      sub vcl_hash {
          hash_data(req.url);
          if (req.http.host) {
              hash_data(req.http.host);
          } else {
              hash_data(server.ip);
          }
          return (lookup);
      }

  43. sub vcl_purge {
          return (synth(200, "Purged"));
      }

      sub vcl_hit {
          if (obj.ttl >= 0s) {
              // A pure unadulterated hit, deliver it
              return (deliver);
          }
          if (obj.ttl + obj.grace > 0s) {
              // Object is in grace, deliver it
              // Automatically triggers a background fetch
              return (deliver);
          }
          // fetch & deliver once we get the result
          return (miss);
      }

      sub vcl_miss {
          return (fetch);
      }

      sub vcl_deliver {
          return (deliver);
      }

  44. sub vcl_backend_fetch {
          return (fetch);
      }

      sub vcl_backend_response {
          if (beresp.ttl <= 0s ||
              beresp.http.Set-Cookie ||
              beresp.http.Surrogate-control ~ "no-store" ||
              (!beresp.http.Surrogate-Control &&
               beresp.http.Cache-Control ~ "no-cache|no-store|private") ||
              beresp.http.Vary == "*") {
              /*
               * Mark as "Hit-For-Pass" for the next 2 minutes
               */
              set beresp.ttl = 120s;
              set beresp.uncacheable = true;
          }
          return (deliver);
      }
  46. Varnish features: ✓ Caching ✓ Proxying ✓ Load balancing ✓ Edge Side Includes ✓ Streaming ✓ Compression ✓ Invalidation ✓ VMODs ✓ Logging tools
  47. HTTP accelerator

  49. Redis: ✓ Key-value store ✓ Fast ✓ Lightweight ✓ Data stored in RAM ✓ ~Memcached ✓ Data types ✓ Data persistence ✓ Replication ✓ Clustering
  50. Redis
      $ redis-cli
      127.0.0.1:6379> ping
      PONG
      127.0.0.1:6379> set mykey somevalue
      OK
      127.0.0.1:6379> get mykey
      "somevalue"
  51. Redis data types: ✓ Strings ✓ Hashes ✓ Lists ✓ Sets ✓ Sorted sets ✓ Geo ✓ …
  52. Redis
      $ redis-cli
      127.0.0.1:6379> hset customer_1234 id 1234
      (integer) 1
      127.0.0.1:6379> hset customer_1234 items_in_cart 2
      (integer) 1
      127.0.0.1:6379> hmset customer_1234 firstname Thijs lastname Feryn
      OK
      127.0.0.1:6379> hgetall customer_1234
      1) "id"
      2) "1234"
      3) "items_in_cart"
      4) "2"
      5) "firstname"
      6) "Thijs"
      7) "lastname"
      8) "Feryn"
      127.0.0.1:6379>

  53. Redis
      $ redis-cli
      127.0.0.1:6379> lpush products_for_customer_1234 5
      (integer) 1
      127.0.0.1:6379> lpush products_for_customer_1234 345
      (integer) 2
      127.0.0.1:6379> lpush products_for_customer_1234 78 12 345
      (integer) 5
      127.0.0.1:6379> llen products_for_customer_1234
      (integer) 5
      127.0.0.1:6379> lindex products_for_customer_1234 1
      "12"
      127.0.0.1:6379> lindex products_for_customer_1234 2
      "78"
      127.0.0.1:6379> rpop products_for_customer_1234
      "5"
      127.0.0.1:6379> rpop products_for_customer_1234
      "345"
      127.0.0.1:6379> rpop products_for_customer_1234
      "78"
      127.0.0.1:6379> rpop products_for_customer_1234
      "12"
      127.0.0.1:6379> rpop products_for_customer_1234
      "345"
      127.0.0.1:6379> rpop products_for_customer_1234
      (nil)
      127.0.0.1:6379>
  54. Shared memory

  56. I’m a PHP guy

  57. APCu

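  58. APCu
      <?php
      // microtime(true) returns a float, so the timestamps can be subtracted
      $start = microtime(true);
      // Try shared memory first; $success tells us whether the key was found
      $hash = apcu_fetch('password', $success);
      $hit = 'hit';
      if (!$success) {
          // Deliberately expensive: bcrypt with cost 15
          $hash = password_hash('azerty1!', PASSWORD_BCRYPT, ['cost' => 15]);
          // Keep the result in shared memory for 15 seconds
          apcu_store('password', $hash, 15);
          $hit = 'miss';
      }
      $end = microtime(true);
      $time = round($end - $start, 2);
      echo "[$time] -> ($hit) $hash\n";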
  59. OPCache

  61. Not really a cache

  62. ✓Full-text search engine ✓Analytics engine ✓NoSQL database ✓Lucene based ✓Built-in

    clustering, replication, sharding ✓RESTful interface ✓JSON output ✓Schemaless ElasticSearch
  63. Fast retrieval Fast search All REST

  64. POST /blog/post/6160
      {
        "language": "en-US",
        "title": "WordPress 4.4 is available! And these are the new features…",
        "date": "Tue, 15 Dec 2015 13:28:23 +0000",
        "author": "Romy",
        "category": [
          "News", "PHP", "Sector news", "Webdesign & development", "CMS",
          "content management system", "wordpress", "WordPress 4.4"
        ],
        "guid": "6160"
      }

  65. Retrieve document by id: document & meta data
      GET /blog/post/6160
      {
        "_index": "blog",
        "_type": "post",
        "_id": "6160",
        "_version": 1,
        "found": true,
        "_source": {
          "language": "en-US",
          "title": "WordPress 4.4 is available! And these are the new features…",
          "date": "Tue, 15 Dec 2015 13:28:23 +0000",
          "author": "Romy",
          "category": [
            "News", "PHP", "Sector news", "Webdesign & development", "CMS",
            "content management system", "wordpress", "WordPress 4.4"
          ],
          "guid": "6160"
        }
      }

  66. POST /blog/post/_search
      {
        "fields": ["title"],
        "query": {
          "match": { "title": "working" }
        }
      }
  67. What can we cache (reminder) Byte code Database output External

    services Files from disk Pages
  68. Byte code Byte code Database output External services Files from

    disk Pages
  69. OPCache

  70. Byte code caching, without a cache: 1. Read file from disk 2. Tokenize 3. Compile into bytecode 4. Execute

  71. Byte code caching, with a cache: 1. Read bytecode from shared memory 2. Execute. Reading the file from disk, tokenizing and compiling into bytecode are skipped.
  72. Files from disk Byte code Database output External services Files

    from disk Pages
  73. A RAMDisk could solve that problem

  74. Or put (some of) that data in Redis

  75. Or maybe even ElasticSearch

  76. External services Byte code Database output External services Files from

    disk Pages
  77. Potential slow down

  78. https://openexchangerates.org/api/latest.json?app_id=123
      {
        "disclaimer": "Exchange rates provided for informational purposes only and do not constitute financial advice of any kind. Although every attempt is made to ensure quality, no guarantees are made of accuracy, validity, availability, or fitness for any purpose. All usage subject to acceptance of Terms: https://openexchangerates.org/terms/",
        "license": "Data sourced from various providers; resale prohibited; no warranties given of any kind. All usage subject to License Agreement: https://openexchangerates.org/license/",
        "timestamp": 1463137208,
        "base": "USD",
        "rates": {
          "AED": 3.67297,
          "AFN": 68.589998,
          "ALL": 121.4755,
          "AMD": 479.452503,
          "ANG": 1.78875,
          "AOA": 165.784832,
          "ARS": 14.19349,
          "AUD": 1.372985,
          "AWG": 1.793333,
  79. Caching external services
      <?php
      require 'vendor/autoload.php';

      $predis = new Predis\Client();
      // Try the cached hash first
      $rates = $predis->hgetall('rates');
      if (count($rates) == 0) {
          // Cache miss: call the external service
          $client = new GuzzleHttp\Client();
          $response = $client->get('https://openexchangerates.org/api/latest.json?app_id=123');
          $data = json_decode($response->getBody()->getContents());
          // Turn the public properties of the rates object into an array
          $reflect = new ReflectionObject($data->rates);
          foreach ($reflect->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
              $rates[$property->getName()] = $property->getValue($data->rates);
          }
          // Store as a Redis hash and let it expire after 15 seconds
          $predis->hmset('rates', $rates);
          $predis->expire('rates', 15);
      }
      echo $rates['EUR'].PHP_EOL;
  81. Flexibility $rates = $predis->hgetall('rates'); $rates = $predis->hget('rates', 'EUR');
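The pattern behind the exchange-rate example (look in the cache, fall back to the slow source on a miss, store the result with an expiry) can be sketched without Redis. This is a minimal Python illustration under stated assumptions: `TTLCache` is a hypothetical in-memory stand-in for Redis, `fetch_rates` fakes the HTTP call, and the rates are made-up numbers.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for a key-value store with expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, dict]] = {}

    def get_or_load(self, key: str, ttl: float, loader: Callable[[], dict]) -> dict:
        now = self._clock()
        entry = self._items.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                    # hit: still fresh
        value = loader()                       # miss or expired: ask the source
        self._items[key] = (now + ttl, value)  # remember it with an expiry time
        return value


fetches = 0


def fetch_rates() -> dict:
    """Pretend to call the external service (made-up numbers)."""
    global fetches
    fetches += 1
    return {"EUR": 0.88, "USD": 1.0}


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(clock=lambda: now[0])
cache.get_or_load("rates", 15, fetch_rates)  # miss, calls the service
cache.get_or_load("rates", 15, fetch_rates)  # hit
now[0] = 16.0
cache.get_or_load("rates", 15, fetch_rates)  # expired after 15s, calls again
print(fetches)
```

Injecting the clock keeps the sketch deterministic; a real setup would simply rely on the store's own expiry, as the slides do with `expire`.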

  82. Database output Byte code Database output External services Files from

    disk Pages
  83. Potential slow down

  84. Caching database output
      <?php
      require 'vendor/autoload.php';

      $predis = new Predis\Client();
      // The set "products" holds the SKUs; each SKU is the key of a hash
      $productSkus = $predis->smembers('products');
      $products = $predis->pipeline(function ($pipe) use ($productSkus) {
          foreach ($productSkus as $sku) {
              $pipe->hgetall($sku);
          }
      });
      if (count($productSkus) == 0) {
          // Cache miss: read from MySQL and fill the cache
          $db = new PDO('mysql:host=localhost;dbname=sample', 'root', '');
          $statement = $db->query('SELECT sku,name,short_description,price FROM catalog_product_flat_1');
          $products = $statement->fetchAll(PDO::FETCH_ASSOC);
          $productSkus = [];
          foreach ($products as $row) {
              $productSkus[] = $row['sku'];
              $predis->hmset($row['sku'], $row);
              $predis->sadd('products', $row['sku']);
              $predis->expire('products', 15);
              $predis->expire($row['sku'], 15);
          }
      }
      foreach ($products as $product) {
          echo $product['sku'] . ' '.$product['name'].PHP_EOL;
      }
  86. Let’s try this with ElasticSearch

  87. <?php
      require 'vendor/autoload.php';

      $client = Elasticsearch\ClientBuilder::create()->build();
      $params = [
          'index' => 'products',
          'type'  => 'product',
          'body'  => [
              'size'  => 10000,
              // An empty PHP array would be serialized as [] instead of {}
              'query' => ['match_all' => new \stdClass()],
          ],
      ];
      $response = $client->search($params);
      if (!isset($response['hits']['hits']) || count($response['hits']['hits']) == 0) {
          // Nothing indexed yet: read from MySQL and index every row
          $db = new PDO('mysql:host=localhost;dbname=sample', 'root', '');
          $statement = $db->query('SELECT sku,name,short_description,price FROM products');
          $products = $statement->fetchAll(PDO::FETCH_ASSOC);
          foreach ($products as $row) {
              $client->index([
                  'index' => 'products',
                  'type'  => 'product',
                  'id'    => $row['sku'],
                  'body'  => $row,
              ]);
          }
      } else {
          // Documents found: keep only the stored source of each hit
          $products = array_map(function ($doc) {
              return $doc['_source'];
          }, $response['hits']['hits']);
      }
      foreach ($products as $product) {
          echo $product['sku'] . ' '.$product['name'].PHP_EOL;
      }
  89. Pages Byte code Database output External services Files from disk

    Pages
  90. all the way

  91. Easy peasy right?

  92. Not really

  93. There are rules

  94. When does Varnish cache? Some rules … ✓ Only GET & HEAD ✓ No Authorization headers ✓ No cookies ✓ No Set-Cookie headers ✓ Valid Cache-Control/Expires headers
  95. It’s all about state

  96. Lots of developers don’t follow those rules

  97. Cookies everywhere

  98. Cache-Control what?

  99. These are the main reasons why Varnish will not work out of the box

  100. Write VCL

  101. Minimal VCL
       vcl 4.0;

       backend default {
           .host = "127.0.0.1";
           .port = "8080";
       }
  102. Normalize

  103. Normalize
       vcl 4.0;
       import std;

       sub vcl_recv {
           set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");
           set req.url = std.querysort(req.url);
           if (req.url ~ "\#") {
               set req.url = regsub(req.url, "\#.*$", "");
           }
           if (req.url ~ "\?$") {
               set req.url = regsub(req.url, "\?$", "");
           }
           if (req.restarts == 0) {
               if (req.http.Accept-Encoding) {
                   if (req.http.User-Agent ~ "MSIE 6") {
                       unset req.http.Accept-Encoding;
                   } elsif (req.http.Accept-Encoding ~ "gzip") {
                       set req.http.Accept-Encoding = "gzip";
                   } elsif (req.http.Accept-Encoding ~ "deflate") {
                       set req.http.Accept-Encoding = "deflate";
                   } else {
                       unset req.http.Accept-Encoding;
                   }
               }
           }
       }
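Why normalizing matters: two requests that differ only in port, parameter order, fragment or a trailing "?" should end up under one cache key. The Python sketch below mimics the host and URL steps of the normalization VCL; the function name `normalize` is hypothetical and this is an approximation of the idea, not of Varnish's exact behaviour.

```python
import re
from typing import Tuple


def normalize(host: str, url: str) -> Tuple[str, str]:
    """Reduce equivalent requests to one (host, url) cache key."""
    host = re.sub(r":[0-9]+$", "", host)      # drop the port from the Host header
    url = url.split("#", 1)[0]                # drop the fragment
    path, _, query = url.partition("?")
    params = sorted(p for p in query.split("&") if p)  # sort query parameters
    return host, path + ("?" + "&".join(params) if params else "")


print(normalize("example.com:8080", "/products?b=2&a=1#top"))
print(normalize("example.com", "/products?"))
```

Both `/products?b=2&a=1` and `/products?a=1&b=2` map to the same key, so the second request can be a hit.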
  104. Static assets

  105. Cache static assets
       vcl 4.0;

       sub vcl_recv {
           if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|otf|ogg|ogm|opus|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
               unset req.http.Cookie;
               return (hash);
           }
       }

       sub vcl_backend_response {
           if (bereq.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|otf|ogg|ogm|opus|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
               unset beresp.http.set-cookie;
           }
           if (bereq.url ~ "^[^?]*\.(7z|avi|bz2|flac|flv|gz|mka|mkv|mov|mp3|mp4|mpeg|mpg|ogg|ogm|opus|rar|tar|tgz|tbz|txz|wav|webm|xz|zip)(\?.*)?$") {
               unset beresp.http.set-cookie;
               set beresp.do_stream = true;
               set beresp.do_gzip = false;
           }
       }
  106. Do you really want to cache static assets?

  107. Nginx or Apache can be fast enough for that

  108. Memory consumption vs Speed improvement

  109. Don’t cache static assets
       vcl 4.0;
       import std;

       sub vcl_recv {
           if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|otf|ogg|ogm|opus|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
               unset req.http.Cookie;
               return (pass);
           }
       }
  110. URL whitelist/blacklist

  111. URL blacklist
       sub vcl_recv {
           if (req.url ~ "^/status\.php$" ||
               req.url ~ "^/update\.php$" ||
               req.url ~ "^/admin$" ||
               req.url ~ "^/admin/.*$" ||
               req.url ~ "^/user$" ||
               req.url ~ "^/user/.*$" ||
               req.url ~ "^/flag/.*$" ||
               req.url ~ "^.*/ajax/.*$" ||
               req.url ~ "^.*/ahah/.*$") {
               return (pass);
           }
       }
  112. URL whitelist
       sub vcl_recv {
           if (req.url ~ "^/products/?") {
               return (hash);
           }
       }
  113. Those damn cookies again!

  114. Remove tracking cookies
       vcl 4.0;

       sub vcl_recv {
           set req.http.Cookie = regsuball(req.http.Cookie, "has_js=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "__utm.=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "_ga=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "_gat=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "utmctr=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "utmcmd.=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "utmccn.=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "__gads=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "__qc.=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "__atuv.=[^;]+(; )?", "");
           set req.http.Cookie = regsuball(req.http.Cookie, "^;\s*", "");
           if (req.http.cookie ~ "^\s*$") {
               unset req.http.cookie;
           }
       }
  115. Only keep session cookie
       vcl 4.0;

       sub vcl_recv {
           if (req.http.Cookie) {
               set req.http.Cookie = ";" + req.http.Cookie;
               set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
               set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
               set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
               set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
               if (req.http.cookie ~ "^\s*$") {
                   unset req.http.cookie;
               }
           }
       }

  116. Language cookie cache variation
       sub vcl_recv {
           if (req.http.Cookie) {
               set req.http.Cookie = ";" + req.http.Cookie;
               set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
               set req.http.Cookie = regsuball(req.http.Cookie, ";(language)=", "; \1=");
               set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
               set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
               if (req.http.cookie ~ "^\s*$") {
                   unset req.http.cookie;
                   return(pass);
               }
               return(hash);
           }
       }

       sub vcl_hash {
           hash_data(regsub(req.http.Cookie, "^.*language=([^;]*);*.*$", "\1"));
       }
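The regex chains in the cookie slides all implement one idea: keep the cookies on a whitelist and drop the rest. A plain Python sketch of that idea follows; `keep_cookies` is a hypothetical helper written for this illustration and it parses the header instead of using the regsuball approach of the VCL.

```python
from typing import Set


def keep_cookies(header: str, allowed: Set[str]) -> str:
    """Return a Cookie header that contains only the whitelisted cookies."""
    kept = []
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name in allowed:
            kept.append(f"{name}={value}")
    return "; ".join(kept)


print(keep_cookies("_ga=GA1.2.1; PHPSESSID=abc123; has_js=1", {"PHPSESSID"}))
print(repr(keep_cookies("_ga=GA1.2.1; has_js=1", {"PHPSESSID"})))
```

An empty result corresponds to the `unset req.http.cookie` branch: with no cookies left, the request becomes cacheable again.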
  117. Alternative language cache variation

  118. Language cookie cache variation
       sub vcl_hash {
           hash_data(req.http.Accept-Language);
       }
       Or just send a “Vary: Accept-Language” header
  119. sub vcl_hash { hash_data(req.http.Cookie); } Hash all cookies

  120. Edge Side Includes

  121. [Diagram: a page composed of header.php, menu.php, main.php and footer.php, each block with its own caching policy. Labels: TTL 5s, No caching, TTL 10s, TTL 2s]
  122. Edge Side Includes
       sub vcl_recv {
           set req.http.Surrogate-Capability = "key=ESI/1.0";
       }

       sub vcl_backend_response {
           if (beresp.http.Surrogate-Control ~ "ESI/1.0") {
               unset beresp.http.Surrogate-Control;
               set beresp.do_esi = true;
           }
       }
  123. ESI frame: esi.php (cached for 10 seconds)
       <?php
       header("Cache-Control: public,must-revalidate,s-maxage=10");
       echo "Date in the ESI tag: ".date('Y-m-d H:i:s').'<br />';

       Main page (not cached)
       <?php
       header("Cache-Control: no-store");
       header("Surrogate-Control: content='ESI/1.0'");
       echo '<esi:include src="/esi.php" />'.PHP_EOL;
       echo "Date in the main page: ".date('Y-m-d H:i:s').'<br />';
  124. ESI vs AJAX

  125. Control Time To Live

  126. Control Time To Live
       sub vcl_backend_response {
           set beresp.ttl = 3h;
       }

  127. Control Time To Live
       sub vcl_backend_response {
           if (beresp.ttl <= 0s || beresp.http.Set-Cookie || beresp.http.Vary == "*") {
               set beresp.ttl = 120s;
               set beresp.uncacheable = true;
               return (deliver);
           }
       }
  128. Debugging

  129. Debugging
       sub vcl_deliver {
           if (obj.hits > 0) {
               set resp.http.X-Cache = "HIT";
           } else {
               set resp.http.X-Cache = "MISS";
           }
       }
  130. Purging

  131. Purging
       acl purge {
           "localhost";
           "127.0.0.1";
           "::1";
       }

       sub vcl_recv {
           if (req.method == "PURGE") {
               if (!client.ip ~ purge) {
                   return (synth(405, "Not allowed."));
               }
               return (purge);
           }
       }
  132. Banning
       acl purge {
           "localhost";
           "127.0.0.1";
           "::1";
       }

       sub vcl_backend_response {
           set beresp.http.x-url = bereq.url;
           set beresp.http.x-host = bereq.http.host;
       }

       sub vcl_deliver {
           unset resp.http.x-url;
           unset resp.http.x-host;
       }

       sub vcl_recv {
           if (req.method == "PURGE") {
               if (!client.ip ~ purge) {
                   return (synth(405, "Not allowed"));
               }
               if (req.http.x-purge-regex) {
                   ban("obj.http.x-host == " + req.http.host + " && obj.http.x-url ~ " + req.http.x-purge-regex);
               } else {
                   ban("obj.http.x-host == " + req.http.host + " && obj.http.x-url == " + req.url);
               }
               return (synth(200, "Purged"));
           }
       }
  133. Banning
       curl -XPURGE -H "x-purge-regex:/products" "http://example.com"
       curl -XPURGE "http://example.com/products"
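The difference between the two curl calls is pattern versus exact match. The Python sketch below shows the effect of a regex ban on a toy cache keyed by (host, url). It is a simplification under stated assumptions: the `ban` function and the cache contents are invented for the example, and matching objects are deleted immediately, whereas Varnish records the ban and evaluates it against objects later.

```python
import re
from typing import Dict, Tuple


def ban(cache: Dict[Tuple[str, str], str], host: str, url_regex: str) -> int:
    """Remove every cached object for `host` whose URL matches `url_regex`."""
    pattern = re.compile(url_regex)
    doomed = [key for key in cache if key[0] == host and pattern.search(key[1])]
    for key in doomed:
        del cache[key]
    return len(doomed)


cache = {
    ("example.com", "/products"): "list page",
    ("example.com", "/products/42"): "detail page",
    ("example.com", "/about"): "about page",
}
removed = ban(cache, "example.com", "/products")
print(removed, sorted(cache))
```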

  134. Grace mode

  135. sub vcl_backend_response { set beresp.grace = 6h; } Grace mode
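What grace adds to the hit/miss decision can be written as a small function. This Python sketch mirrors the built-in `vcl_hit` shown on slide 43, where the remaining TTL is the configured TTL minus the object's age; the function name `lookup` and the result labels are hypothetical.

```python
def lookup(age: float, ttl: float, grace: float) -> str:
    """Classify a cached object by its age, all values in seconds."""
    remaining = ttl - age
    if remaining >= 0:
        return "hit"
    if remaining + grace > 0:
        # Stale but within grace: deliver it, refresh in the background
        return "grace"
    return "miss"


SIX_HOURS = 6 * 60 * 60
print(lookup(5, 10, SIX_HOURS))      # fresh object
print(lookup(3600, 10, SIX_HOURS))   # expired an hour ago, still in grace
print(lookup(30000, 10, SIX_HOURS))  # older than ttl + grace
```

With a grace of 6h, visitors keep getting the stale copy instead of waiting for a slow or broken backend.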

  136. Let’s talk more about

  137. GET /blog/post/6160 { "_index": "blog", "_type": "post", "_id": "6160", "_version":

    1, "found": true, "_source": { "language": "en-US", "title": "WordPress 4.4 is available! And these are the new features…", "date": "Tue, 15 Dec 2015 13:28:23 +0000", "author": "Romy", "category": [ "News", "PHP", "Sector news", "Webdesign & development", "CMS", "content management system", "wordpress", "WordPress 4.4" ], "guid": "6160" } } Remember this one?
  138. Schemaless? Not really … “Guesses” mapping on insert
       GET /blog/_mapping
       {
         "blog": {
           "mappings": {
             "post": {
               "properties": {
                 "author":   { "type": "string" },
                 "category": { "type": "string" },
                 "date":     { "type": "string" },
                 "guid":     { "type": "string" },
                 "language": { "type": "string" },
                 "title":    { "type": "string" }
               }
             }
           }
         }
       }
  139. Explicit mapping

  140. Explicit mapping at index creation time
       POST /blog
       {
         "mappings": {
           "post": {
             "properties": {
               "title":    { "type": "string" },
               "date":     { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" },
               "author":   { "type": "string" },
               "category": { "type": "string" },
               "guid":     { "type": "integer" }
             }
           }
         }
       }

  141. Alternative mapping
       POST /blog
       {
         "mappings": {
           "post": {
             "properties": {
               "author":   { "type": "string", "index": "not_analyzed" },
               "category": { "type": "string", "index": "not_analyzed" },
               "date":     { "type": "date", "format": "E, dd MMM YYYY HH:mm:ss Z" },
               "guid":     { "type": "integer" },
               "language": { "type": "string", "index": "not_analyzed" },
               "title": {
                 "type": "string",
                 "fields": {
                   "en":  { "type": "string", "analyzer": "english" },
                   "nl":  { "type": "string", "analyzer": "dutch" },
                   "raw": { "type": "string", "index": "not_analyzed" }
                 }
               }
             }
           }
         }
       }
  142. What’s with the analyzers? (same mapping as slide 141)
  143. Analyzed vs non-analyzed

  144. Full-text vs exact value

  145. By default strings are analyzed … unless you mention it

    in the mapping
  146. Analyzer • Character filters: replace characters in the text to be analyzed • Tokenizers: break text down into terms • Token filters: add, modify or delete tokens
  147. Built-in analyzers •Standard •Simple •Whitespace •Stop •Keyword •Pattern •Language •Snowball

    •Custom Standard tokenizer Lowercase token filter English stop word token filter
  148. Input: Hey man, how are you doing? Standard analyzer: hey man how are you doing. Whitespace analyzer: Hey man, how are you doing? English analyzer: hei man how you do
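The analysis step is easiest to see on the slide's own sentence. The Python sketch below imitates a whitespace analyzer and a standard-like analyzer (tokenize on word characters, then lowercase), plus a stop-word token filter. It is a rough approximation written for illustration: the function names and the tiny stop-word list are assumptions, and the stemming that the English analyzer performs is left out.

```python
import re
from typing import List

STOP_WORDS = {"are", "is", "the", "a"}  # tiny made-up list for the example


def whitespace_analyzer(text: str) -> List[str]:
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def standard_like_analyzer(text: str) -> List[str]:
    """Tokenize on word characters, then apply a lowercase token filter."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Token filter that deletes stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


sentence = "Hey man, how are you doing?"
print(whitespace_analyzer(sentence))
print(standard_like_analyzer(sentence))
print(remove_stop_words(standard_like_analyzer(sentence)))
```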
  149. POST /blog/post/_search
       {
         "fields": ["title"],
         "query": {
           "match": { "title": "working" }
         }
       }

  150. {
         "took": 1,
         "timed_out": false,
         "_shards": { "total": 5, "successful": 5, "failed": 0 },
         "hits": {
           "total": 1,
           "max_score": 1.7562683,
           "hits": [
             {
               "_index": "blog", "_type": "post", "_id": "2742", "_score": 1.7562683,
               "fields": { "title": [ "Hosted SharePoint 2010: working efficiently as a team" ] }
             }
           ]
         }
       }

  151. POST /blog/post/_search
       {
         "fields": ["title"],
         "query": {
           "match": { "title.en": "working" }
         }
       }

  152. {
         "took": 1,
         "timed_out": false,
         "_shards": { "total": 5, "successful": 5, "failed": 0 },
         "hits": {
           "total": 6,
           "max_score": 2.4509864,
           "hits": [
             {
               "_index": "blog", "_type": "post", "_id": "828", "_score": 2.4509864,
               "fields": { "title": [ "Still a lot of work in store" ] }
             },
             {
               "_index": "blog", "_type": "post", "_id": "3873", "_score": 2.144613,
               "fields": { "title": [ "SSL: what is it and how does it work?" ] }
             },
             {
               "_index": "blog", "_type": "post", "_id": "5586", "_score": 2.1184452,
               "fields": { "title": [ "WebAssembly: several world players work on a faster Internet" ]
  153. Search

  154. POST /blog/post/_count
       { "query": { "match": { "title": "PROXY protocol support in Varnish" } } }
       Result: 162 posts

       POST /blog/post/_count
       { "query": { "filtered": { "filter": { "term": { "title.raw": "PROXY protocol support in Varnish" } } } } }
       Result: 1 post
  155. Filter vs Query

  156. Match Query, Multi Match Query, Bool Query, Boosting Query, Common Terms Query, Constant Score Query, Dis Max Query, Filtered Query, Fuzzy Like This Query, Fuzzy Like This Field Query, Function Score Query, Fuzzy Query, GeoShape Query, Has Child Query, Has Parent Query, Ids Query, Indices Query, Match All Query, More Like This Query, Nested Query, Prefix Query, Query String Query, Simple Query String Query, Range Query, Regexp Query, Span First Query, Span Multi Term Query, Span Near Query, Span Not Query, Span Or Query, Span Term Query, Term Query, Terms Query, Top Children Query, Wildcard Query, Minimum Should Match, Multi Term Query Rewrite, Template Query

  157. And Filter, Bool Filter, Exists Filter, Geo Bounding Box Filter, Geo Distance Filter, Geo Distance Range Filter, Geo Polygon Filter, GeoShape Filter, Geohash Cell Filter, Has Child Filter, Has Parent Filter, Ids Filter, Indices Filter, Limit Filter, Match All Filter, Missing Filter, Nested Filter, Not Filter, Or Filter, Prefix Filter, Query Filter, Range Filter, Regexp Filter, Script Filter, Term Filter, Terms Filter, Type Filter
  158. Aggregations

  159. Group by on steroids

  160. SELECT author, COUNT(guid) FROM blog.post GROUP BY author (metric: COUNT(guid), bucket: author)

  161. Only aggs, no docs
       POST /blog/post/_search?pretty&search_type=count
       {
         "aggs": {
           "popular_bloggers": {
             "terms": { "field": "author" }
           }
         }
       }

  162. Aggregation output
       "aggregations": {
         "popular_bloggers": {
           "doc_count_error_upper_bound": 0,
           "sum_other_doc_count": 0,
           "buckets": [
             { "key": "Romy", "doc_count": 415 },
             { "key": "Combell", "doc_count": 184 },
             { "key": "Tom", "doc_count": 184 },
             { "key": "Jimmy Cappaert", "doc_count": 157 },
             { "key": "Christophe", "doc_count": 23 }
           ]
         }
       }
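A terms aggregation is the "GROUP BY ... COUNT" of the SQL slide: one bucket per distinct value, one metric per bucket. The Python sketch below reproduces the shape of the aggregation output on a handful of made-up posts; the `posts` list and its counts are invented for the example and are not the figures from the slide.

```python
from collections import Counter

# Made-up documents; only the "author" field matters here
posts = [
    {"author": "Romy"}, {"author": "Tom"}, {"author": "Romy"},
    {"author": "Christophe"}, {"author": "Romy"}, {"author": "Tom"},
]

# Bucket = distinct author, metric = number of documents in the bucket
counts = Counter(post["author"] for post in posts)
buckets = [{"key": key, "doc_count": n} for key, n in counts.most_common()]
print(buckets)
```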
  163. Nested multi-group by alongside query
       POST /blog/_search
       {
         "query": {
           "match": { "title": "varnish" }
         },
         "aggs": {
           "popular_bloggers": {
             "terms": { "field": "author", "size": 10 },
             "aggs": {
               "used_languages": {
                 "terms": { "field": "language", "size": 10 }
               }
             }
           }
         }
       }

  164. Aggregation output
       "aggregations": {
         "popular_bloggers": {
           "doc_count_error_upper_bound": 0,
           "sum_other_doc_count": 0,
           "buckets": [
             {
               "key": "Romy",
               "doc_count": 4,
               "used_languages": {
                 "doc_count_error_upper_bound": 0,
                 "sum_other_doc_count": 0,
                 "buckets": [
                   { "key": "en-US", "doc_count": 3 },
                   { "key": "nl-NL", "doc_count": 1 }
                 ]
               }
             },
             {
               "key": "Combell",
               "doc_count": 3,
               "used_languages": {
                 "doc_count_error_upper_bound": 0,
                 "sum_other_doc_count": 0,
                 "buckets": [
                   { "key": "nl-NL", "doc_count": 3 }
                 ]
               }
             },
  165. Min Aggregation, Max Aggregation, Sum Aggregation, Avg Aggregation, Stats Aggregation, Extended Stats Aggregation, Value Count Aggregation, Percentiles Aggregation, Percentile Ranks Aggregation, Cardinality Aggregation, Geo Bounds Aggregation, Top hits Aggregation, Scripted Metric Aggregation, Global Aggregation, Filter Aggregation, Filters Aggregation, Missing Aggregation, Nested Aggregation, Reverse nested Aggregation, Children Aggregation, Terms Aggregation, Significant Terms Aggregation, Range Aggregation, Date Range Aggregation, IPv4 Range Aggregation, Histogram Aggregation, Date Histogram Aggregation, Geo Distance Aggregation, GeoHash grid Aggregation
  166. Where does all of this fit in?

  167. Where does Varnish fit in? ✓ Cache all images, js, css, woff, … ✓ Cache dynamic pages ✓ ESI or AJAX for user-specific content ✓ Sanitize HTTP input/output ✓ Gateway to your application

  168. Where does ElasticSearch fit in? ✓ Secondary database (NoSQL) ✓ RDBMS can remain the source of truth ✓ Store in fixed format (no joins) ✓ Full-text search ✓ Fast retrieval of data projections ✓ Aggregations

  169. Where does Redis fit in? ✓ Real-time information ✓ Key-value gets, not searches ✓ Volatile data ✓ When data changes a lot ✓ RDBMS is still the source of truth
  171. https://blog.feryn.eu https://talks.feryn.eu https://youtube.com/thijsferyn https://soundcloud.com/thijsferyn https://twitter.com/thijsferyn http://itunes.feryn.eu