International PHP Conference - Fall 2012 : Varnish Cache

An introduction to Varnish Cache for PHP developers.

Mike Willbanks

October 15, 2012

Transcript

  1. Housekeeping…
    •  Talk
      –  Slides will be posted after the talk.
    •  Me
      –  Sr. Web Architect Manager at NOOK Developer
      –  Prior MNPHP Organizer
      –  Open Source Contributor
      –  Where you can find me:
        •  Twitter: mwillbanks
        •  G+: Mike Willbanks
        •  IRC (freenode): mwillbanks
        •  Blog: http://blog.digitalstruct.com
        •  GitHub: https://github.com/mwillbanks
  2. Agenda
    •  Varnish?
    •  The Good : Getting Started
    •  The Awesome : General Usage
    •  The Crazy : Advanced Usage
    •  Gotchas
  3. Official Statement
    “Varnish is a web application accelerator. You install it in front of your web application and it will speed it up significantly.”
  4. A Scenario
    •  System Status Server
      –  Mobile apps check current status.
      –  If the system is down do we communicate?
      –  If there are problems do we communicate?
      –  The apps and mobile site rely on an API
    •  Trouble in paradise? Few and far between.
  5. The Graph - AWS
    [Four bar charts comparing Small, X-Large and Small Varnish instances: Requests, Time, Peak Load and Req/s.]
  6. The Raw Data
                   Small     X-Large                  Small Varnish
    Concurrency    10        150                      150
    Requests       5000      55558                    75000
    Time           438       347                      36
    Req/s          11.42     58                       585
    Peak Load      11.91     8.44                     0.35
    Comments                 19,442 failed requests
  7. LAMP + Varnish
    [Diagram labels: Load Balancer, Varnish Cache, Cache Hit (Yes / No), HTTP Server Cluster, Database]
    * Varnish can act as a load balancer.
  8. Installation
    # RHEL / CentOS (yum)
    rpm --nosignature -i http://repo.varnish-cache.org/redhat/varnish-3.0/el5/noarch/varnish-release-3.0-1.noarch.rpm
    yum install varnish

    # Ubuntu (apt)
    curl http://repo.varnish-cache.org/debian/GPG-key.txt | sudo apt-key add -
    echo "deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-3.0" | sudo tee -a /etc/apt/sources.list
    sudo apt-get update
    sudo apt-get install varnish

    # From source
    git clone git://git.varnish-cache.org/varnish-cache
    cd varnish-cache
    sh autogen.sh
    ./configure
    make && make install
  9. Varnish Daemon
    •  varnishd
      –  -a address[:port]    listen address for client requests
      –  -b address[:port]    backend to send requests to
      –  -T address[:port]    management (administration) interface
      –  -s type[,options]    storage type (malloc, file, persistent)
      –  -P /path/to/file     PID file
      –  Many others; these are generally the most important. Generally the defaults will do, with just modification of the default VCL (more on it later).
  10. General Configuration
    •  varnishd -a :80 \
           -T localhost:6082 \
           -f /path/to/default.vcl \
           -s malloc,512mb
    •  Configure the web server to listen on port 8080
  11. So what’s actually cached?
    •  Any requests containing
      –  GET / HEAD
      –  TTL > 0
    •  What causes it to miss?
      –  Cookies
      –  Authentication Headers
      –  Vary “*”
      –  Cache-Control: private
  12. [Diagram: request/response flow through the VCL subroutines vcl_recv, vcl_hash, vcl_hit, vcl_miss, vcl_fetch, vcl_deliver, vcl_pipe and vcl_pass, annotated with the objects in scope: req, bereq, beresp, obj, resp]
  13. HTTP Caching
    •  RFC 2616 HTTP/1.1 Headers
      –  Expiration
        •  Cache-Control
        •  Expires
      –  Validation
        •  Last-Modified
        •  If-Modified-Since
        •  ETag
        •  If-None-Match
  14. Use WordPress?
    backend default {
      .host = "127.0.0.1";
      .port = "8080";
    }

    sub vcl_recv {
      if (!(req.url ~ "wp-(login|admin)")) {
        unset req.http.cookie;
      }
    }

    sub vcl_fetch {
      if (!(req.url ~ "wp-(login|admin)")) {
        unset beresp.http.set-cookie;
      }
    }
  15. Varnish Configuration Language
    •  VCL State Engine
      –  Each Request is Processed Separately & Independently
      –  States are Isolated but are Related
      –  Return statements exit one state and start another
      –  VCL defaults are ALWAYS appended below your own VCL
    •  VCL can be complex, but…
      –  Two main subroutines: vcl_recv and vcl_fetch
      –  Common actions: pass, hit_for_pass, lookup, pipe, deliver
      –  Common variables: req, beresp and obj
      –  More subroutines, functions and complexity can arise depending on conditions.
  16. [Diagram: request/response flow through the VCL subroutines vcl_recv, vcl_hash, vcl_hit, vcl_miss, vcl_fetch, vcl_deliver, vcl_pipe and vcl_pass, annotated with the objects in scope: req, bereq, beresp, obj, resp]
  17. VCL - Process
    VCL Process    Description
    vcl_init       Startup routine (VCL loaded, VMOD init)
    vcl_recv       Beginning of request, req is in scope
    vcl_pipe       Client & backend data passed unaltered
    vcl_pass       Request goes to backend and not cached
    vcl_hash       Creates cache hash, call hash_data for custom hashes
    vcl_hit        Called when hash found in cache
    vcl_miss       Called when hash not found in cache
    vcl_fetch      Called to fetch data from backend
    vcl_deliver    Called prior to delivery of response (excluding pipe)
    vcl_error      Called when an error occurs
    vcl_fini       Shutdown routine (VCL unload, VMOD cleanup)
  18. VCL – Variables
    •  Always Available
      –  now – epoch time
    •  Backend Declarations
      –  .host – hostname / IP
      –  .port – port number
    •  Request Processing
      –  client – ip & identity
      –  server – ip & port
      –  req – request information
    •  Backend
      –  bereq – backend request
      –  beresp – backend response
    •  Cached Object
      –  obj – Cached object, can only change .ttl
    •  Response
      –  resp – response information
  19. VCL - Functions
    VCL Function                     Description
    hash_data(string)                Adds a string to the hash input
    regsub(string, regex, sub)       Substitution on first occurrence
    regsuball(string, regex, sub)    Substitution on all occurrences
    ban(expression)                  Ban all items that match expression
    ban_url(regex)                   Ban all items whose URL matches the regular expression
  20. DEFAULT VCL
    Walking through the noteworthy items.
    [Diagram: request/response flow through the VCL subroutines vcl_recv, vcl_hash, vcl_hit, vcl_miss, vcl_fetch, vcl_deliver, vcl_pipe and vcl_pass, annotated with the objects in scope: req, bereq, beresp, obj, resp]
  21. vcl_recv
    •  Received Request
    •  Only GET & HEAD by default
      –  Safest way to cache!
    •  Will use HTTP cache headers.
    •  Cookies or Authentication Headers will bust out of the cache.
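
The default behaviour described on the slide above can be sketched in VCL roughly as follows. This is a simplified paraphrase of the built-in Varnish 3.0 vcl_recv, not a verbatim copy; the shipped default does a little more (for example, it pipes unknown request methods).

```vcl
sub vcl_recv {
    # Only GET and HEAD are looked up in the cache; anything else
    # goes straight to the backend.
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }

    # Requests carrying credentials or cookies are treated as
    # user-specific, so they bypass the cache.
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }

    return (lookup);
}
```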
  22. vcl_hash
    •  Hash is what we look for in the cache.
    •  Default is URL + Host
      –  Server IP used if host header was not set; in a load balanced environment ensure you set this header!
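
For reference, the URL + Host default can be written out as below. This follows the built-in Varnish 3.0 vcl_hash as best I recall it; treat it as a sketch and compare it with the default.vcl shipped with your version.

```vcl
sub vcl_hash {
    # The URL is always part of the cache key.
    hash_data(req.url);

    # Prefer the Host header; fall back to the server IP when the
    # client did not send one.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

    return (hash);
}
```

Anything else that should split the cache (a device class header, for example) would be added with another hash_data() call.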
  23. vcl_fetch
    •  Fetch retrieves the response from the backend.
    •  No Cache if…
      –  TTL is not set or not greater than 0.
      –  A Vary: * header exists.
      –  Hit-For-Pass means we will cache the decision to pass through.
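
The conditions on the slide above correspond roughly to the following logic. It is modelled on the built-in Varnish 3.0 vcl_fetch, which, as far as I know, also treats a Set-Cookie response header as uncacheable; verify against your installed default.vcl.

```vcl
sub vcl_fetch {
    if (beresp.ttl <= 0s ||
        beresp.http.Set-Cookie ||
        beresp.http.Vary == "*") {
        # Remember for two minutes that this object should be passed,
        # so concurrent requests are not queued behind one fetch.
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
    return (deliver);
}
```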
  24. Remove GA Cookies
    GA cookies will cause a miss; remove them prior to going to the backend.
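
The transcript does not preserve the code from this slide, so the following is only a sketch of one common approach. It assumes the classic Google Analytics cookie names (__utma, __utmb, __utmc, __utmz and similar); adjust the pattern to the cookies your site actually sets.

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        # Strip every __utm* cookie from the Cookie header.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)__utm[a-z]+=[^;]*", "");

        # Remove a leading separator left behind by the substitution.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

        # If no cookies remain, drop the header so the request can hit the cache.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```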
  25. Directors – The Types
    Director Type    Description
    Random           Picks based on random and weight.
    Client           Picks based on client identity.
    Hash             Picks based on hash value.
    Round Robin      Goes in order and starts over.
    DNS              Picks based on incoming DNS host, random OR round robin.
    Fallback         Picks the first “healthy” server.
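
As an illustration of the table above, a round-robin director in Varnish 3.0 syntax might look like this. The backend names (web1, web2), the director name (www) and the IP addresses are hypothetical placeholders.

```vcl
# Hypothetical backends; replace hosts and ports with your own.
backend web1 { .host = "192.168.0.10"; .port = "8080"; }
backend web2 { .host = "192.168.0.11"; .port = "8080"; }

# Requests are handed to web1, web2, web1, ... in turn.
director www round-robin {
    { .backend = web1; }
    { .backend = web2; }
}

sub vcl_recv {
    # Route every request through the director instead of a single backend.
    set req.backend = www;
}
```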
  26. Director - Probing
    •  Backend Probing
    •  Variables
      –  .url
      –  .request
      –  .window
      –  .threshold
      –  .initial
      –  .expected_response
      –  .interval
      –  .timeout
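
A minimal sketch of a probed backend using several of the variables listed above. The host, the /status URL and all the numbers are assumptions chosen for illustration, not recommended values.

```vcl
backend web1 {
    .host = "192.168.0.10";        # hypothetical backend address
    .port = "8080";
    .probe = {
        .url = "/status";          # hypothetical health-check URL
        .interval = 5s;            # probe every 5 seconds
        .timeout = 1s;             # a probe slower than this counts as failed
        .window = 5;               # look at the last 5 probes...
        .threshold = 3;            # ...and require 3 good ones to be healthy
        .expected_response = 200;  # HTTP status that counts as good
    }
}
```

Use either .url or .request in a probe, not both; .request lets you spell out the full HTTP request when a plain GET is not enough.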
  27. Grace Mode
    •  Request already pending for update; serve grace content.
    •  Backend is unhealthy.
      –  Probes, as seen earlier, must be implemented.
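
The two grace cases above could be configured along these lines in Varnish 3.0. The durations are arbitrary examples, and req.backend.healthy is only meaningful when the backend has a probe defined.

```vcl
sub vcl_recv {
    if (req.backend.healthy) {
        # Healthy backend: tolerate slightly stale content while
        # another request is already refreshing the object.
        set req.grace = 30s;
    } else {
        # Sick backend: accept much older content rather than fail.
        set req.grace = 1h;
    }
}

sub vcl_fetch {
    # Keep objects around past their TTL so grace has something to serve.
    set beresp.grace = 1h;
}
```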
  28. Saint Mode
    •  Backend may be sick for a particular piece of content.
    •  Saint mode makes sure that Varnish will not request the object from that backend again for a specific period of time.
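
A sketch of saint mode in Varnish 3.0 syntax, assuming a 500 status is the signal that the backend is sick for this object. The 10 second and 5 minute values are illustrative only.

```vcl
sub vcl_fetch {
    if (beresp.status == 500) {
        # Do not ask this backend for this object again for 10 seconds,
        # then retry the request (another backend or a graced copy may serve it).
        set beresp.saintmode = 10s;
        return (restart);
    }

    # Grace must be set for a stale copy to be available as a fallback.
    set beresp.grace = 5m;
}
```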
  29. Purging
    •  The various ways of purging
      –  varnishadm – command line utility
      –  Sockets (port 6082)
      –  HTTP – now that is the sexiness
  30. Purging Examples
    varnishadm -T 127.0.0.1:6082 ban req.url == "/foo/bar"

    telnet localhost 6082
    ban req.url == "/foo/bar"

    telnet localhost 80
    Response:
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    PURGE /foo/bar HTTP/1.0
    Host: bacon.org

    curl -X PURGE http://bacon.org/foo/bar
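
The HTTP PURGE examples above only work if the VCL handles that method. A sketch following the pattern in the Varnish 3.0 documentation is shown below; the ACL name (purgers) and its contents are assumptions, so restrict it to the hosts that should be allowed to purge.

```vcl
# Hosts allowed to send PURGE requests (hypothetical list).
acl purgers {
    "localhost";
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed.";
        }
        # Look the object up so vcl_hit / vcl_miss can purge it.
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        # Also clears variants of the object that did not match this request.
        purge;
        error 200 "Purged.";
    }
}
```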
  31. Distributed Purging
    •  curl multi-request (in PHP)
    •  Use a message queue
      –  Use workers to do the leg work for you
    •  You will need to store a list of servers “somewhere”
  32. Logging
    •  Many times people want to log the requests to a file
      –  By default Varnish only stores these in shared memory.
      –  Apache Style Logs
        •  varnishncsa -D -a -w log.txt
      –  This will run as a daemon, in a separate process, to log all of your requests.
  33. VERIFY YOUR VCL
    You likely want to ensure that your cache is:
    1.  Working Properly
    2.  Caching Effectively
  34. What is Varnish doing…
    varnishtop will show you real-time information on your system.
    •  Use -i to filter on specific tags.
    •  Use -x to exclude specific tags.
  35. ESI – Edge Side Includes
    •  ESI is a small markup language, much like SSI (server side includes), for including fragments (or dynamic content for that matter).
    •  Think of it as replacing regions inside of a page as if you were using XHR (AJAX) but single threaded.
    •  Three statements can be utilized.
      –  esi:include – Include a page
      –  esi:remove – Remove content
      –  <!--esi ... --> – Processed only when ESI is enabled; otherwise it is an ordinary HTML comment
  36. ESI Diagram
    [Diagram: Page Content containing <esi:include src="header.php" /> shown between the Backend and Varnish]
    Varnish detects ESI, requests from backend OR checks cached state.
  37. Using ESI
    •  In vcl_fetch, you must set ESI to be on
      –  set beresp.do_esi = true;
      –  Varnish refuses to parse content for ESI if it does not look like XML
        •  This is by default; so check varnishstat and varnishlog to ensure that it is functioning like normal.
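
In context, the setting from the slide above might sit in vcl_fetch like this. Limiting ESI parsing to HTML responses is my own assumption, added so that images and other binary content are not scanned; the slide itself only shows the set statement.

```vcl
sub vcl_fetch {
    # Only parse responses that can plausibly contain ESI tags.
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.do_esi = true;
    }
}
```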
  38. ESI Usage
    <html>
      <head><title>Rock it with ESI</title></head>
      <body>
        <header>
          <esi:include src="header.php" />
        </header>
        <section id="main">...</section>
        <footer></footer>
      </body>
    </html>
  39. Embedding C in VCL
    •  Before getting into VMOD; did you know you can embed C into the VCL for Varnish?
    •  Want to do something crazy fast or leverage a C library for pre or post processing?
    •  I know… you’re thinking that’s useless…
      –  On to the example; and a good one from the Varnish wiki!
  40. Embedded C for syslog
    C{
      #include <syslog.h>
    }C

    sub vcl_something {
      C{
        syslog(LOG_INFO, "Something happened at VCL line XX.");
      }C
    }

    # Example with using varnish variables
    C{
      syslog(LOG_ERR, "Spurious response from backend: xid %s request %s %s \"%s\" %d \"%s\" \"%s\"",
        VRT_r_req_xid(sp),
        VRT_r_req_request(sp),
        VRT_GetHdr(sp, HDR_REQ, "\005host:"),
        VRT_r_req_url(sp),
        VRT_r_obj_status(sp),
        VRT_r_obj_response(sp),
        VRT_GetHdr(sp, HDR_OBJ, "\011Location:"));
    }C
  41. Varnish Modules / Extensions
    •  Taking VCL embedded C to the next level
    •  Allows you to extend Varnish and create new functions
    •  You could link to libraries to provide additional functionality
  42. VMOD - std
    •  toupper
    •  tolower
    •  set_ip_tos
    •  random
    •  log
    •  syslog
    •  fileread
    •  duration
    •  integer
    •  collect
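
A small sketch of using two of the std functions listed above. Normalising the Host header and the log message text are illustrative choices of mine, not something taken from the talk.

```vcl
import std;

sub vcl_recv {
    if (req.http.Host) {
        # Lower-case the Host header so differently cased hostnames
        # share one cache entry.
        set req.http.Host = std.tolower(req.http.Host);
    }

    # Write a line to the shared memory log (visible in varnishlog).
    std.log("received request for " + req.url);
}
```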
  43. Management Console
    •  varnishadm -T localhost:6082
      –  vcl.list – see all loaded configuration
      –  vcl.load – load new configuration
      –  vcl.use – select configuration to use
      –  vcl.discard – remove configuration
  44. Cache Warmup
    •  Need to warm up your cache before putting a server in the queue, or load test an environment?
      –  varnishreplay -r log.txt
  45. QUESTIONS?
    These slides will be posted to SlideShare & SpeakerDeck.
    •  SpeakerDeck: http://speakerdeck.com/u/mwillbanks
    •  Slideshare: http://www.slideshare.net/mwillbanks
    •  Twitter: mwillbanks
    •  G+: Mike Willbanks
    •  IRC (freenode): mwillbanks
    •  Blog: http://blog.digitalstruct.com
    •  GitHub: https://github.com/mwillbanks