Accelerating Web Applications with Varnish

The Varnish web accelerator has been the center of attention recently as more and more content providers realize its power and flexibility. More than just a simple caching proxy, Varnish lets us crack open the request-response cycle and take control of precisely how each and every request is served. Through practical examples, I'll demonstrate how websites of every size are using this powerful tool to maximize performance, improving the user's experience while eliminating back-end complexity. We'll explore the unique Varnish Control Language and discover how it can make your site delivery lighter, leaner, and lightning fast!

Samantha Quiñones

March 19, 2014

Transcript

  1. About me • @ieatkillerbees • http://tembies.com • Decade+ as a

    SW engineer at Visa International • Lead PHP developer at POLITICO since March, 2013 • Software engineer since 1996 working primarily with Python and PHP since 2005
  2. In the beginning... • The web was born as a

    platform for mostly static content • Most content, once published, changes rarely • Static content continues to dominate the web
  3. What is static? • Even though most content is static,

    changes must be available very rapidly • Static content is cheap to deliver but expensive to maintain • Static content is... static. Stale. Boring.
  4. Dynamic content • Content is curated for each consumer –

    Context – Security – User Agent – Geographical Region – Demographics – Business Priorities
  5. Dynamics of dynamic • Dynamic content is expensive to deliver

    but cheap to maintain. • Dynamic content makes the web engaging.
  6. Ye olde rdbms • High degree of data organization and

    integrity • Common, well-supported interfaces • SQL • Designed to scale on mainframes and “enterprise” servers
  7. Ye olde rdbms

     SELECT DISTINCT t1.CU_ship_name1, t1.CU_ship_name2, t1.CU_email
     FROM customers AS t1
     WHERE t1.CU_solicit = 1
       AND t1.CU_cdate >= 20100725000000
       AND t1.CU_cdate <= 20100801000000
       AND EXISTS (
         SELECT NULL
         FROM orders AS t2
         INNER JOIN item AS t3 ON t2.O_ref = t3.I_oref
         INNER JOIN product AS t4 ON t3.I_pid = t4.P_id
         INNER JOIN (
           SELECT C_id FROM category WHERE C_store_type = 2
         ) AS t5 ON t4.P_cat = t5.C_id
         WHERE t1.CU_id = t2.O_cid);
  8. Ye olde rdbms • How do we leverage the power

    of a DBMS to manage content while also building applications that are scalable, performant, and durable?
  9. Caches • Amortize the cost of expensive operations across many

    requests • Vastly increase the performance of an application at the cost of responsiveness to change • Caches can scale horizontally in front of a smaller DBMS implementation
  10. Caches of all flavors • In-memory object caches • Content-delivery

    networks (CDNs) • Static pre-rendering (yeah, this happens) • Key-value stores & Document Databases • Caching proxies
  11. Caches of all flavors • Memcached is among the most

    popular caching tools • In-memory key-value store • Operates on the predominant caching pattern
  12. In-memory object cache

     $cache = new CacheClient(['127.0.0.1:4242']);

     $widget = $cache->get('widget-1');
     if (!$widget) {
         $widget = $database->find('widget-1');
         $cache->set('widget-1', $widget, CACHE_EXPIRY);
     }

     $gizmo = new Gizmo();
     $id = $database->save($gizmo);
     $cache->set($id, $gizmo, CACHE_EXPIRY);
  13. In-memory object cache • Tools like memcached are developer-focused •

    The onus is on developers to make sure that they are caching data efficiently. • Usually this means that absolutely everything gets cached • Or, the caching platform is underutilized to the point of near-irrelevance.
  14. CDNs • Akamai, Cloudflare, etc • Fantastic for caching truly

    static content • Extremely robust • Largely invisible to developers • Not at all flexible
  15. Static pre-rendering • Very annoying to maintain • When combined

    with other methods, it can greatly reduce the “freshness” of content • Hides extremely poor performance • Extremely robust
  16. Caching reverse proxies • Almost like a “local” CDN •

    Maintain a cache of content from the origin server(s) • Invisible to developers • Squid, Varnish, and other “web accelerators” fall in this category
  17. Get to the Point • We're trying to make our

    sites “faster”... • But what does that mean? • Most dynamic content is pretty static
  18. The reality of media publishing • Content is overwhelmingly static

    • Yet extremely time-sensitive • Slow delivery of content is a huge revenue risk • Load is inconsistent and unpredictable • Editorial and engineering requirements are rarely in sync • Failure can be devastating
  19. Case study: A Norwegian Tabloid • Verdens Gang is one

    of Norway's most popular newspapers • Suffered the same problems of all media platforms • Poul-Henning Kamp, a BSD core developer, was the lead developer and application architect for Verdens Gang • As a kernel developer, Kamp has a particular set of skills that allowed him to approach this problem from a new angle
  20. Memory & storage • In the olden days, there was

    a line of demarcation between primary and secondary storage • In short, primary storage (RAM in modern computers) can be directly addressed by the CPU • Secondary storage is accessed via an I/O channel or controller
  21. Memory & storage • As early as the 1950s, computer

    scientists were experimenting with virtual memory. • By the 1970s, virtual memory was common in commercial computers • Virtual memory is an abstraction that allows secondary storage to extend primary storage • The operating system (like, hmm, BSD) cooperates with specialized hardware to manage the paging of data in and out of virtual memory.
  22. Virtual memory is a cache • In essence, virtual memory

    is a cache • The operating system swaps data between high-speed primary storage and slower secondary storage based on factors like age and access frequency • Commonly accessed data is kept warm and ready while rarely-needed data can be quickly retrieved when called for
  23. Traditional crp • Traditional caching reverse proxies allocate memory and

    fill it with objects • Less-used objects are written to disk. • Objects on disk are written to memory when requested. • Sounds familiar, right?
  24. The varnish proposition • The operating system is already paging

    data between primary and secondary storage • So why reinvent the wheel? • Varnish gets out of the operating system's way and lets the kernel do what it's best at!
  25. Varnish in a nutshell • At start-up, Varnish allocates a

    big, empty chunk of memory • Within that space, Varnish maintains a workspace with pointers to cached objects, headers, etc • Varnish prioritizes worker threads by most recently used • These factors combine to reduce overall memory ops • Fewer memory ops = fewer I/O ops = MAXIMUM FASTNESS
  26. Getting started • Open-source: http://www.varnish-cache.org • Commercial: https://www.varnish-software.com/ • DEB:

    apt-get install varnish • RPM: Available from EPEL • OpenBSD: pkg_add varnish • Source: git://git.varnish-cache.org/varnish-cache
  27. Varnish configuration language • Varnish has its own DSL, VCL

    • Modeled after C (and allows for inline C!) • VCL allows us to modify requests and responses “in-flight” • Configure “backends” and how to interact with them • Compiled into binary objects and loaded into memory
  28. Start up options • -f : Specify a VCL file

    to load • -s : Configure the storage (malloc, file, etc) • -T : Specify where Varnish should listen for admin connections • -a : Specify where Varnish should listen for HTTP requests
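Put together, the options above make up a typical start-up line. This is a sketch; the file path, cache size, and port numbers are illustrative values, not from the talk:

```shell
varnishd -f /etc/varnish/default.vcl \
         -s malloc,1G \
         -T 127.0.0.1:6082 \
         -a 0.0.0.0:80
```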
  29. backends • Backends are origin servers (Apache, NGINX, etc) that

    will be serving your content. • Varnish can proxy a single backend, a cluster of them, or multiple clusters of them • The configuration of backends and request routing could fill an hour on its own.
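As a minimal sketch, a single backend definition in Varnish 3 VCL might look like this (the host name and timeout values are assumptions for illustration):

```vcl
backend origin1 {
    .host = "origin1.example.com";  # hypothetical origin host
    .port = "80";
    .connect_timeout = 1s;          # give up connecting after 1 second
    .first_byte_timeout = 5s;       # give up waiting for a response after 5 seconds
}
```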
  30. directors • Logical clusters of backends that allow for load

    balancing and redundancy • Monitor the health of individual backends • Random directors route requests to origin servers randomly • Client directors route based on the identity of the client • Hash directors route based on the URL of the request • Round-robin directors rotate through backends in order
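A minimal round-robin director, sketched in Varnish 3 syntax (the origin1 and origin2 backends are assumed to be defined elsewhere in the VCL):

```vcl
director origincluster round-robin {
    { .backend = origin1; }
    { .backend = origin2; }
}

sub vcl_recv {
    # Route every request through the director rather than a single backend
    set req.backend = origincluster;
}
```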
  31. Directors

     director originpool dns {
         .list = {
             .host_header = "origin.foo.com";
             .port = "80";
             .connect_timeout = 0.5s;
             "10.42.42.0"/24;
         }
         .ttl = 5m;
     }

     • DNS directors operate like round-robin or random directors, but with a pool of hosts
  32. Probes • How Varnish determines if a backend is healthy

    or not • Specifies a check interval, timeout, expected response, etc. • Can be set as part of a backend definition, or standalone.
  33. probes

     probe healthcheck {
         .url = "/health.php";
         .interval = 60s;
         .timeout = 0.3s;
         .window = 8;
         .threshold = 3;
         .initial = 3;
         .expected_response = 200;
     }

     backend origin {
         .host = "origin.foo.com";
         .port = "http";
         .probe = healthcheck;
     }
  34. ACLs • Used to identify client addresses • Can allow

    bypassing the proxy for local clients • Restrict URLs to certain clients
  35. ACLs

     acl admin {
         "localhost";
         "10.42.42.42";
     }

     sub vcl_recv {
         if (req.url ~ "^/admin") {
             if (client.ip ~ admin) {
                 return (pass);
             } else {
                 error 405 "Not allowed in admin area.";
             }
         }
     }
  36. hooks • Hooks allow the execution of VCL at a

    number of pre-defined points in the request-response cycle. • vcl_recv – Called after a request has been received • vcl_pipe – Called when entering pipe mode • vcl_pass – Called when entering pass mode • vcl_hit – Called when an object is found in the cache • vcl_miss – Called when an object is not found in the cache
  37. Hooks • vcl_fetch – Called when a response has been

    received from an origin server • vcl_deliver – Called before a cached object is returned to a client • vcl_error – Called when an error happens
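For instance, the vcl_fetch hook can override what the origin sends. This sketch forces a floor TTL when the backend response would otherwise not be cached (the 120-second value is an arbitrary example):

```vcl
sub vcl_fetch {
    # If the response would be uncacheable by default, cache it for 2 minutes anyway
    if (beresp.ttl <= 0s) {
        set beresp.ttl = 120s;
    }
    return (deliver);
}
```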
  38. Grace Mode • Allows varnish to serve an expired object

    while a fresh object is being generated by the backend. • Doesn't require the user to pay a “first-hit” tax • Avoids threads piling up • Protects against stampeding popular resources • Some users may get expired data even though fresh data is “technically” available.
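A grace-mode sketch in Varnish 3 VCL (the 30-minute window is an illustrative choice):

```vcl
sub vcl_recv {
    # Be willing to serve objects up to 30 minutes past their TTL
    set req.grace = 30m;
}

sub vcl_fetch {
    # Keep expired objects in the cache for 30 minutes after expiry
    set beresp.grace = 30m;
}
```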
  39. Saint Mode • Like grace mode, but more awesome. •

    Allows Varnish to serve expired objects when we don't like what the backends are returning. • 200 OK with no response body? Serve from cache • 500 errors? Serve from cache
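Saint mode can be sketched in Varnish 3 VCL like this (the 20-second blacklist window is an assumption for illustration):

```vcl
sub vcl_fetch {
    if (beresp.status >= 500) {
        # Don't use this object from this backend for the next 20 seconds,
        # then restart the request so another backend (or a stale copy) can serve it
        set beresp.saintmode = 20s;
        return (restart);
    }
}
```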
  40. Saint mode

     1. Receive request for expired resource
     2. Request resource from origin-1
     3. Receive 503
     4. Request resource from origin-2
     5. Receive 503
     6. Increase TTL for resource by 30 seconds and restart the request
  41. Managing varnish • Varnish has a simple command prompt accessible

    by telnetting to the configured admin port • List loaded VCL files, load new VCL files, switch between active VCL files • Get the status of backends • “ban” URLs (invalidate matching cached objects, forcing fresh fetches from a backend)
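The same commands can be issued through the varnishadm client rather than raw telnet. A few common ones, assuming the admin port given with -T is 127.0.0.1:6082:

```shell
varnishadm -T 127.0.0.1:6082 vcl.list                         # list loaded VCLs
varnishadm -T 127.0.0.1:6082 vcl.load newcfg /etc/varnish/new.vcl
varnishadm -T 127.0.0.1:6082 vcl.use newcfg                   # switch the active VCL
varnishadm -T 127.0.0.1:6082 backend.list                     # backend health status
```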
  42. PURGING RESOURCES • Varnish understands a nonstandard HTTP method “PURGE”

    • PURGE /obsolete/resource • Configurable via ACL
  43. PURGING RESOURCES

     sub vcl_recv {
         if (req.request == "PURGE") {
             if (!client.ip ~ purge) {
                 error 405 "Not allowed.";
             }
             return (lookup);
         }
     }

     sub vcl_hit {
         if (req.request == "PURGE") {
             purge;
             error 200 "Purged.";
         }
     }

     sub vcl_miss {
         if (req.request == "PURGE") {
             purge;
             error 200 "Purged.";
         }
     }
  44. VARNISH REPLAYS • The varnishreplay tool takes a Varnish log and replays all

    of the traffic • Powerful tool to quickly warm up an empty cache when starting new Varnish instances
  45. Varnish is complicated • Varnish is an incredibly complex and

    powerful tool • Varnish is a dynamic caching framework • This has been an overview of its features. • Install it, read the docs, play, experiment, and explore!
  46. Varnish killed the memcached star • Politico's PHP stack uses

    a distributed service-oriented architecture • Nearly all operations are abstracted behind RESTful APIs • REST services from MongoDB, MySQL, other internal and external APIs • We don't use memcached • We don't use anything LIKE memcached
  47. Varnish killed the memcached star • Our services network sits

    behind a cluster of Varnish instances. • All calls, even internally, go through those Varnish clusters. • Every service returns appropriate HTTP status codes, including 502s, 503s, and 504s. • Every service returns Cache-Control headers that help Varnish set the TTLs for each resource
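As an illustration, a service response carrying such headers might look like this (the values are invented; s-maxage is the directive a shared cache like Varnish honors in preference to max-age):

```http
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: public, max-age=60, s-maxage=300
```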
  48. We Consume Lots of Stuff • Stored content from our

    internal CMS • Static assets in S3 • Rich media from third parties • Data feeds from other reporting agencies • Every external dependency is abstracted behind an interface that we define and control
  49. But What if that Provider Goes Kaput? • Option 1:

    Fail gracefully • Option 2: Fail ungracefully • Option 3: Don't fail, because you're a BEAST!
  50. summary • Varnish is powerful. It's simple to get started,

    and difficult to master • Varnish has the potential to not only accelerate applications, but to simplify infrastructures