Slide 1

Slide 1 text

ACCELERATING WEB APPLICATIONS WITH VARNISH
SAMANTHA QUIÑONES, 20 MAY 2014, PHP TEK

Slide 2

Slide 2 text

About Me
- @ieatkillerbees
- http://tembies.com
- Decade+ as a software engineer at Visa International
- Lead PHP developer at POLITICO since March 2013
- Software engineer since 1996
- Working primarily with Python and PHP since 2005

Slide 3

Slide 3 text

In the beginning…
- The web was born as a platform for mostly static content
- Most content, once published, changes rarely
- Static content continues to dominate the web

Slide 4

Slide 4 text

What is static?
- Even though most content is static, changes must be available very rapidly
- Static content is cheap to deliver but expensive to maintain
- Static content is... static. Stale. Boring.

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

What is dynamic?
- Content is curated for each consumer
- Dynamic content is expensive to deliver but cheap to maintain
- Dynamic content makes the web engaging

Slide 7

Slide 7 text

An age-old question
- Computational vs. human resources
- Static vs. dynamic

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

“The computer is incredibly fast, accurate, and stupid. Man is unbelievably slow, inaccurate, and brilliant. The marriage of the two is a force beyond calculation.” –Leo Cherne

Slide 10

Slide 10 text

UNDERSTANDING CONTENT DELIVERY

Slide 11

Slide 11 text

Basic content delivery via HTTP
- Files on disk
- Passive benefit from OS memory mapping and caching
- Prone to I/O blocking
- Nearly impossible to scale out

Slide 12

Slide 12 text

The request-response cycle

Slide 13

Slide 13 text

Dynamic Backends

Slide 14

Slide 14 text

The Relational DBMS
- High degree of data organization and integrity
- Common, well-supported interfaces
- SQL
- Designed to scale on mainframes and “enterprise” servers (read: more CPUs/engines, more RAM/real storage)

Slide 15

Slide 15 text

The Relational DBMS
- Data normalization
- Scales up, not out

Slide 16

Slide 16 text

The Relational DBMS

SELECT DISTINCT t1.CU_ship_name1, t1.CU_ship_name2, t1.CU_email
FROM customers AS t1
WHERE t1.CU_solicit = 1
  AND t1.CU_cdate >= 20100725000000
  AND t1.CU_cdate <= 20100801000000
  AND EXISTS (
    SELECT NULL
    FROM orders AS t2
    INNER JOIN item AS t3 ON t2.O_ref = t3.I_oref
    INNER JOIN product AS t4 ON t3.I_pid = t4.P_id
    INNER JOIN (
      SELECT C_id FROM category WHERE C_store_type = 2
    ) AS t5 ON t4.P_cat = t5.C_id
    WHERE t1.CU_id = t2.O_cid
  );

Slide 17

Slide 17 text

How do we leverage the power of a DBMS to manage content while also building applications that are scalable, performant, and durable?

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

“The advantage of adding cache…is that they have the potential to… eliminate some interactions, improving efficiency, scalability, and user-perceived performance by reducing the average latency of a series of interactions.” –Roy Fielding

Slide 20

Slide 20 text

Caches
- Amortize the cost of expensive operations across many requests
- Vastly increase the performance of an application at the cost of responsiveness to change
- Can scale horizontally in front of a smaller DBMS implementation

Slide 21

Slide 21 text

Caches of All Flavors
- In-memory object caches
- Content delivery networks (CDNs)
- Static pre-rendering (yeah, this happens)
- Key-value stores & document databases
- Caching proxies

Slide 22

Slide 22 text

Caches of All Flavors
- Memcached is among the most popular caching tools
- In-memory key-value store
- Used with the predominant caching pattern, cache-aside: check the cache, fall back to the database on a miss, then populate the cache

Slide 23

Slide 23 text

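In-Memory Object Caches

// CacheClient, $database, Gizmo and CACHE_EXPIRY are illustrative placeholders
$cache = new CacheClient(['127.0.0.1:4242']);

// Read path: try the cache first, fall back to the database on a miss
$widget = $cache->get('widget-1');
if (!$widget) {
    $widget = $database->find('widget-1');
    // Populate the cache so the next reader skips the database
    $cache->set('widget-1', $widget, CACHE_EXPIRY);
}

// Write path: save to the database, then write the new object to the cache
$gizmo = new Gizmo();
$id = $database->save($gizmo);
$cache->set($id, $gizmo, CACHE_EXPIRY);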

Slide 24

Slide 24 text

In-Memory Object Caches
- Tools like memcached are developer-focused
- The onus is on developers to make sure that they are caching data efficiently
- Usually this means that absolutely everything gets cached
- Or, the caching platform is underutilized to the point of near-irrelevance

Slide 25

Slide 25 text

Content Delivery Networks
- Akamai, Cloudflare, etc.
- Fantastic for caching truly static content
- Extremely robust
- Largely invisible to developers
- Not at all flexible

Slide 26

Slide 26 text

Static Pre-Rendering
- Very annoying to maintain
- When combined with other methods, it can greatly reduce the “freshness” of content
- Hides extremely poor performance
- Extremely robust

Slide 27

Slide 27 text

Caching Reverse Proxies
- Almost like a “local” CDN
- Maintain a cache of content from the origin server(s)
- Invisible to developers
- Squid, Varnish, and other “web accelerators” fall into this category

Slide 28

Slide 28 text

All roads lead…
- We're trying to make our sites “faster”…
- But what does that mean?
- Most dynamic content is pretty static

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

NOTA BENE: THIS TALK MOSTLY COVERS VARNISH 3

Slide 31

Slide 31 text

The reality of media publishing
- Content is overwhelmingly static
- Yet extremely time-sensitive
- Slow delivery of content is a huge revenue risk
- Load is inconsistent and unpredictable
- Editorial and engineering requirements are rarely in sync
- Failure can be devastating

Slide 32

Slide 32 text

Case Study: A Norwegian Tabloid
- Verdens Gang is one of Norway's most popular newspapers
- It suffered the same problems as all media platforms
- Poul-Henning Kamp, a FreeBSD kernel developer, was the lead architect and developer of Varnish, which was built for Verdens Gang
- As a kernel developer, Kamp has a particular set of skills that allowed him to approach this problem from a new angle

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

Memory & Storage
- In the olden days, there was a line of demarcation between primary and secondary storage
- In short, primary storage (RAM in modern computers) can be accessed directly by the CPU
- Secondary storage is accessed via an I/O channel or controller

Slide 36

Slide 36 text

Memory and Storage
- As early as the 1950s, computer scientists were experimenting with virtual memory
- By the 1970s, virtual memory was common in commercial computers
- Virtual memory is an abstraction that allows secondary storage to extend primary storage
- The operating system cooperates with specialized hardware (the memory management unit) to page data in and out of physical memory

Slide 37

Slide 37 text

Virtual Memory is a Cache
- In essence, virtual memory is a cache
- The operating system swaps data between high-speed primary storage and slower secondary storage based on factors like age and access frequency
- Commonly accessed data is kept warm and ready, while rarely-needed data can be quickly retrieved when called for

Slide 38

Slide 38 text

Caching Reverse Proxies
- Traditional caching reverse proxies allocate memory and fill it with objects
- Less-used objects are written to disk
- Objects on disk are read back into memory when requested
- Sounds familiar, right?

Slide 39

Slide 39 text

Varnish is different
- The operating system is already paging data between primary and secondary storage
- So why reinvent the wheel?
- Varnish gets out of the operating system's way and lets the kernel do what it's best at!

Slide 40

Slide 40 text

Varnish in a nutshell
- At start-up, Varnish allocates a big, empty chunk of memory
- Within that space, Varnish maintains a workspace with pointers to cached objects, headers, etc.
- Varnish reuses idle worker threads in most-recently-used order
- These factors combine to reduce overall memory operations

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

Getting Started
- Open source: http://www.varnish-cache.org
- Commercial: https://www.varnish-software.com/
- DEB: apt-get install varnish
- RPM: available from EPEL
- OpenBSD: pkg_add varnish
- Source: git://git.varnish-cache.org/varnish-cache

Slide 43

Slide 43 text

Varnish Config Language
- Varnish has its own DSL, VCL
- Modeled after C (and allows for inline C)
- VCL allows us to modify requests and responses “in-flight”
- Configure “backends” and how to interact with them
- Translated to C, compiled into binary objects, and loaded into memory
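A minimal sketch of a complete VCL file, assuming Varnish 3 syntax and an origin on 127.0.0.1:8080 (the address, the asset extensions, and the `X-Cache` header name are illustrative, not from the talk). It rewrites a request in flight in `vcl_recv` and a response in flight in `vcl_deliver`:

```vcl
# Minimal Varnish 3 VCL sketch; the backend address is an assumption
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Rewrite the request in flight: static assets don't depend on cookies,
    # and a GET without a Cookie header is eligible for caching
    if (req.url ~ "\.(css|js|png|jpg|gif)$") {
        unset req.http.Cookie;
    }
}

sub vcl_deliver {
    # Rewrite the response in flight: report whether it came from cache
    # (X-Cache is an illustrative header name)
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Because VCL is translated to C before it is loaded, `varnishd -C -f default.vcl` can be used to check that a file compiles; it prints the generated C and exits.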

Slide 44

Slide 44 text

Starting varnishd
-f : Specify a VCL file to load
-s : Configure the storage (malloc, file, etc.)
-T : Specify where Varnish should listen for admin connections
-a : Specify where Varnish should listen for HTTP requests

Slide 45

Slide 45 text

Backends
- Backends are origin servers (Apache, NGINX, etc.) that will be serving your content
- Varnish can proxy a single backend, a cluster of them, or multiple clusters of them
- The configuration of backends and request routing could fill an hour on its own

Slide 46

Slide 46 text

Backends

backend default {
    .host = "127.0.0.1";
    .port = "80";
}
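A backend definition can carry more than a host and port. A sketch assuming Varnish 3 syntax; the timeout and connection values are placeholders to tune, not recommendations from the talk:

```vcl
# Varnish 3 sketch: host, port, and every limit below are illustrative
backend default {
    .host = "127.0.0.1";
    .port = "80";
    # How long to wait for a TCP connection to the origin
    .connect_timeout = 1s;
    # How long to wait for the first byte of the response
    .first_byte_timeout = 30s;
    # Maximum gap allowed between bytes of the response body
    .between_bytes_timeout = 5s;
    # Cap on concurrent connections Varnish opens to this backend
    .max_connections = 200;
}
```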

Slide 47

Slide 47 text

Directors
- Logical clusters of backends that allow for load balancing and redundancy
- Monitor the health of individual backends
- Random directors route requests to origin servers randomly
- Client directors route based on the identity of the client
- Hash directors route based on the URL of the request
- Round-robin directors send requests to each backend in turn
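A sketch of a round-robin director in Varnish 3 syntax; the backend names and addresses are hypothetical. Varnish 4 moved directors into the `directors` VMOD, so this syntax does not carry over unchanged:

```vcl
# Varnish 3 sketch; backend names and addresses are hypothetical
backend web1 {
    .host = "10.42.42.11";
    .port = "80";
}

backend web2 {
    .host = "10.42.42.12";
    .port = "80";
}

# Hand requests to web1 and web2 in turn
director webpool round-robin {
    { .backend = web1; }
    { .backend = web2; }
}

sub vcl_recv {
    # Route every request through the pool
    set req.backend = webpool;
}
```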

Slide 48

Slide 48 text

DNS Directors

director originpool dns {
    .list = {
        .host_header = "origin.foo.com";
        .port = "80";
        .connect_timeout = 0.5s;
        "10.42.42.0"/24;
    }
    .ttl = 5m;
}

Slide 49

Slide 49 text

Probes
- Used by Varnish to determine whether a backend is healthy
- Specify a check interval, timeout, expected response, etc.
- Can be set as part of a backend definition, or standalone

Slide 50

Slide 50 text

Probes

probe healthcheck {
    .url = "/health.php";
    .interval = 60s;
    .timeout = 0.3s;
    .window = 8;
    .threshold = 3;
    .initial = 3;
    .expected_response = 200;
}

backend origin {
    .host = "origin.foo.com";
    .port = "http";
    .probe = healthcheck;
}
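A probe can also be declared inline, as part of the backend definition. A sketch in Varnish 3 syntax; the address and timings are illustrative. With `.window = 5` and `.threshold = 3`, the backend counts as healthy when at least 3 of the last 5 checks succeeded:

```vcl
# Varnish 3 sketch: an anonymous probe declared inside the backend itself
# (address and timing values are illustrative)
backend origin2 {
    .host = "10.42.42.20";
    .port = "http";
    .probe = {
        .url = "/health.php";
        .interval = 5s;
        .timeout = 1s;
        # Healthy when at least 3 of the last 5 checks succeeded
        .window = 5;
        .threshold = 3;
    }
}
```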

Slide 51

Slide 51 text

ACLs
- Used to identify client addresses
- Can allow local clients to bypass the cache
- Restrict URLs to certain clients

Slide 52

Slide 52 text

ACLs

acl admin {
    "localhost";
    "10.42.42.42";
}

sub vcl_recv {
    if (req.url ~ "^/admin") {
        if (client.ip ~ admin) {
            # Trusted clients go straight to the backend, bypassing the cache
            return (pass);
        } else {
            error 403 "Not allowed in admin area.";
        }
    }
}

Slide 53

Slide 53 text

Hooks
Hooks allow the execution of VCL at a number of pre-defined points in the request-response cycle.
- vcl_recv – Called after a request has been received
- vcl_pipe – Called when entering pipe mode
- vcl_pass – Called when entering pass mode
- vcl_hit – Called when an object is found in the cache
- vcl_miss – Called when an object is not found in the cache

Slide 54

Slide 54 text

Hooks
- vcl_fetch* – Called when a response has been received from an origin server
- vcl_deliver – Called before an object is returned to a client
- vcl_error – Called when an error happens
(* replaced in Varnish 4)
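A sketch of three of these hooks working together, assuming Varnish 3 syntax. The `/admin` and `/login` paths and the five-minute cap are hypothetical policy; anything a subroutine does not return from falls through to Varnish's built-in logic for the same hook:

```vcl
# Varnish 3 sketch; paths and the 5-minute cap are hypothetical policy
sub vcl_recv {
    # Never cache account pages: hand them straight to the origin
    if (req.url ~ "^/(admin|login)") {
        return (pass);
    }
}

sub vcl_fetch {
    # Cap how long HTML is cached, whatever the origin asked for
    if (beresp.http.Content-Type ~ "text/html" && beresp.ttl > 5m) {
        set beresp.ttl = 5m;
    }
}

sub vcl_deliver {
    # Hide an origin header from clients just before delivery
    unset resp.http.X-Powered-By;
}
```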

Slide 55

Slide 55 text

Hooks in Varnish 4
vcl_fetch has been replaced by:
- vcl_backend_fetch – Called before sending the backend request
- vcl_backend_response – Called after a response has been successfully retrieved from the backend
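A sketch of the two new hooks in Varnish 4 syntax. A Varnish 4 file must start with `vcl 4.0;`. The backend address, the `X-Cache-Proxy` header, and the five-minute HTML cap are hypothetical:

```vcl
vcl 4.0;

# Varnish 4 sketch; backend address and header name are hypothetical
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_fetch {
    # Runs before the request goes to the origin: bereq.* can be modified here
    set bereq.http.X-Cache-Proxy = "varnish";
}

sub vcl_backend_response {
    # Runs once the origin has answered: beresp.* (TTL, headers) can be
    # modified here, as vcl_fetch allowed in Varnish 3
    if (beresp.http.Content-Type ~ "text/html" && beresp.ttl > 5m) {
        set beresp.ttl = 5m;
    }
}
```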

Slide 56

Slide 56 text

Grace Mode
- Allows Varnish to serve an expired object while a fresh object is being generated by the backend
- Users don't pay a “first-hit” tax
- Avoids threads piling up
- Protects popular resources against stampedes
- Some users may get expired data even though fresh data is “technically” available
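A sketch of grace mode in Varnish 3 syntax, following a common pattern of tolerating more staleness when the backend is unhealthy. `beresp.grace` controls how long objects are kept past their TTL; `req.grace` controls how stale an object a request will accept. The 30-second and one-hour values are illustrative:

```vcl
# Varnish 3 sketch; grace durations are illustrative
sub vcl_recv {
    if (req.backend.healthy) {
        # Backend is up: accept content up to 30s stale while a refresh happens
        set req.grace = 30s;
    } else {
        # Backend is down: serve content up to an hour stale rather than an error
        set req.grace = 1h;
    }
}

sub vcl_fetch {
    # Keep objects for an hour past their TTL so there is something to serve
    set beresp.grace = 1h;
}
```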

Slide 57

Slide 57 text

Saint Mode
- Like grace mode, but more awesome
- Allows Varnish to serve expired objects when we don't like what the backends are returning
- 200 OK with no response body? Serve from cache
- 500 errors? Serve from cache
- Merged into grace mode in Varnish 4

Slide 58

Slide 58 text

Saint Mode
1. Receive request for expired resource
2. Request resource from origin-1
3. Receive 503
4. Request resource from origin-2
5. Receive 503
6. Increase TTL for resource by 30 seconds and restart request
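A sketch of that sequence in Varnish 3 syntax, assuming `req.backend` is a director spanning origin-1 and origin-2 (the director itself is not shown). Strictly, `beresp.saintmode` blacklists the object on the backend that returned the error for the given period; the restarted request then tries another backend, or falls back to the graced copy if none is usable. The 30-second and one-hour values are illustrative:

```vcl
# Varnish 3 sketch of saint mode; durations are illustrative
sub vcl_recv {
    # Accept objects up to an hour past their TTL if nothing fresher can be had
    set req.grace = 1h;
}

sub vcl_fetch {
    if (beresp.status >= 500) {
        # Don't ask this backend for this object again for 30 seconds,
        # then restart the request (restarts are capped by max_restarts)
        set beresp.saintmode = 30s;
        return (restart);
    }
    # Keep expired objects around so there is something to fall back on
    set beresp.grace = 1h;
}
```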

Slide 59

Slide 59 text

Managing Varnish
- Varnish has a simple command prompt, accessible by telnetting to the port configured with -T (or via varnishadm)
- List loaded VCL files, load new VCL files, switch between active VCL files
- Get the status of backends
- “Ban” URLs (invalidate matching cached objects so the next request is fetched from a backend)

Slide 60

Slide 60 text

Purging Objects
- Varnish can be configured in VCL to handle a nonstandard HTTP method, “PURGE”
- PURGE /obsolete/resource
- Access can be restricted with an ACL
- Frees memory
- The next client request will refresh the content

Slide 61

Slide 61 text

Purging Objects

# Clients allowed to purge (the ACL referenced in vcl_recv must be defined)
acl purge {
    "localhost";
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}

Slide 62

Slide 62 text

Banning Objects
- Banning applies a filter to cached objects
- Another way to invalidate content
- Applies instantly
- Does NOT free up memory
- Large ban lists can degrade performance
- Understands regular expressions
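A sketch of exposing bans over HTTP in Varnish 3 syntax. The `BAN` method name and the `banners` ACL are hypothetical; the request path is used as a regular expression, so `BAN /news/.*` would invalidate every cached object under `/news/` for that host:

```vcl
# Varnish 3 sketch; the BAN method and the ACL contents are assumptions
acl banners {
    "localhost";
}

sub vcl_recv {
    if (req.request == "BAN") {
        if (!client.ip ~ banners) {
            error 405 "Not allowed.";
        }
        # Add a ban: cached objects for this host whose URL matches the
        # regular expression in the request path are treated as invalid
        ban("req.http.host == " + req.http.host + " && req.url ~ " + req.url);
        error 200 "Ban added.";
    }
}
```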

Slide 63

Slide 63 text

Vary
- Response header used by the origin server to indicate which request headers a resource varies on, so the cache stores a separate variation for each value
- Variations are common for different encodings (gzip, deflate)
- Or based on user agent (but it's important to normalize!)

Slide 64

Slide 64 text

Vary

sub vcl_recv {
    # Normalize Accept-Encoding so Vary yields at most three variants
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # Unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }
}
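User agents need the same treatment: with `Vary: User-Agent`, every distinct browser string becomes its own cached variant. A sketch in Varnish 3 syntax that collapses them into three classes. The `X-Device` header and the matching patterns are hypothetical, and the origin is assumed to read `X-Device` and answer with `Vary: X-Device`:

```vcl
# Varnish 3 sketch; X-Device and these patterns are hypothetical
sub vcl_recv {
    # Collapse thousands of distinct User-Agent strings into three classes
    if (req.http.User-Agent ~ "(?i)(iphone|ipod|android.*mobile|windows phone)") {
        set req.http.X-Device = "mobile";
    } elsif (req.http.User-Agent ~ "(?i)(ipad|android)") {
        set req.http.X-Device = "tablet";
    } else {
        set req.http.X-Device = "desktop";
    }
}
```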

Slide 65

Slide 65 text

Edge-Side Includes
- Varnish supports a subset of ESI: esi:include, esi:remove, and <!--esi ... --> comments
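A sketch of enabling ESI processing in Varnish 3 syntax. Restricting it to HTML responses is an illustrative choice, so that images and other binary assets are not scanned:

```vcl
# Varnish 3 sketch; limiting ESI to HTML is an illustrative choice
sub vcl_fetch {
    # Parse the body for ESI tags such as <esi:include src="..."/> before caching
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.do_esi = true;
    }
}
```

A cached page can then pull in a fragment with `<esi:include src="/fragments/headlines"/>` (a hypothetical URL). Varnish fetches and caches that fragment as a separate object, so it can carry its own TTL.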

Slide 66

Slide 66 text

Replaying Traffic
- varnishreplay takes a Varnish log and replays all of the traffic
- A powerful tool for quickly warming up an empty cache when starting new Varnish instances

Slide 67

Slide 67 text

Varnish is complex
- Varnish is an incredibly complex and powerful tool
- Varnish is a dynamic caching framework
- This has been an overview of its features. Install it, read the docs, play, experiment, and explore!

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

Varnish killed the memcached star
- Politico's PHP stack uses a distributed, service-oriented architecture
- Nearly all operations are abstracted behind RESTful APIs
- REST services front MongoDB, MySQL, and other internal and external APIs
- We don't use memcached
- We don't use anything LIKE memcached

Slide 70

Slide 70 text

Varnish killed the memcached star
- Our services network sits behind a cluster of Varnish instances
- All calls, even internal ones, go through those Varnish clusters
- Every service returns appropriate HTTP status codes, including 502s, 503s, and 504s
- Every service returns Cache-Control headers, which Varnish uses to set the TTL of each resource

Slide 71

Slide 71 text

We consume a lot of stuff
- Stored content from our internal CMS
- Static assets in S3
- Data feeds from other reporting agencies
- Every external dependency is abstracted behind an interface that we define and control

Slide 72

Slide 72 text

Our Request-Response Cycle

Slide 73

Slide 73 text

What to do when an external dependency fails?
- Option 1: Fail gracefully
- Option 2: Fail ungracefully
- Option 3: Don't fail, because you're a BEAST!

Slide 74

Slide 74 text

Invisible fault tolerance

Slide 75

Slide 75 text

Summary
- Varnish is powerful
- It's simple to get started with, and difficult to master
- Varnish has the potential not only to accelerate applications, but to simplify infrastructures

Slide 76

Slide 76 text

Varnish 4 Addendum
- VCL “objects”
- Modularized backends & directors

Slide 77

Slide 77 text

Contact Info
- Twitter: @ieatkillerbees
- Blog: http://tembies.com
- Email: [email protected]
- Joind.in: https://joind.in/10648

Slide 78

Slide 78 text

Contact Info & Feedback
- Twitter: @ieatkillerbees
- Blog: http://tembies.com
- Email: [email protected]
- Joind.in: https://joind.in/10648