Getting Started with Varnish

Slide 1

Slide 1 text

Getting Started with Varnish php[world] 2014 - Tutorials Day Samantha Quiñones

Slide 2

Slide 2 text

@ieatkillerbees http://tembies.com

Slide 5

Slide 5 text

—Roy Fielding “The advantage of adding cache…is that they have the potential to… eliminate some interactions, improving efficiency, scalability, and user-perceived performance by reducing the average latency of a series of interactions.”

Slide 6

Slide 6 text

VM Setup (open port 6081)

Slide 7

Slide 7 text

What is Varnish • Web Application Accelerator • Caching Reverse Proxy • Written in C, initially built by Poul-Henning Kamp • Open-source at http://www.varnish-cache.org • Supported at http://www.varnish-cache.com

Slide 8

Slide 8 text

Digital Media Publishing • Very low write volume • Cannot tolerate high write latency • Most resources are never updated after creation

Slide 9

Slide 9 text

Caching & Digital Media • Caching is great for digital media if… • Cache-busting is inexpensive • Management of “hot” and “cold” objects is efficient

Slide 10

Slide 10 text

Case Study: Verdens Gang • Verdens Gang is one of Norway's most popular newspapers • Suffered the same problems as all digital media platforms • Poul-Henning Kamp, a FreeBSD core developer, was the lead developer and application architect for Verdens Gang • As a kernel developer, Kamp has a particular set of skills that allowed him to approach this problem from a new angle

Slide 13

Slide 13 text

Computer Storage • In the olden days, there was a line of demarcation between primary and secondary storage • In short, primary storage (RAM in modern computers) can be accessed directly by the CPU • Secondary storage is accessed via an I/O channel or controller

Slide 14

Slide 14 text

Virtual Memory Management • As early as the 1950s, computer scientists were experimenting with virtual memory. • By the 1970s, virtual memory was common in commercial computers • Virtual memory is an abstraction that allows secondary storage to extend primary storage • The operating system cooperates with specialized hardware to manage the paging of data in and out of virtual memory.

Slide 15

Slide 15 text

Operation     Actual Time   Equivalent Distance       Equivalent Time
1 CPU Cycle   0.3 ns        1 m (1 step)              1 second
L1 Cache      0.9 ns        3 m (3 steps)             3 seconds
Main Memory   120 ns        360 m (to the highway)    6 minutes
SSD           50 µs         170 km (Richmond, VA)     2 days
HDD           5 ms          13,000 km (Hong Kong)     5 months

Slide 17

Slide 17 text

Virtual Memory is a Cache • In essence, virtual memory is a cache • The operating system swaps data between high-speed primary storage and slower secondary storage based on factors like age and access frequency • Commonly accessed data is kept “hot” and ready while rarely-needed data can be quickly retrieved when called for

Slide 18

Slide 18 text

Caching Reverse Proxies • CRPs work by retaining a copy of the data they proxy • Copies can be retained in memory or on disk • Copies have an expiration time (TTL) after which they are abandoned

Slide 19

Slide 19 text

Memory-Backed CRP • Traditional caching reverse proxies allocate memory and fill it with objects • Less-used objects are written to disk • Objects on disk are written to memory when requested • Sounds familiar, right?

Slide 20

Slide 20 text

Varnish’s Difference • Varnish allocates a heap of memory up front • Objects stored in that heap are managed by the OS • OS Virtual Memory Managers are very sophisticated • Why reinvent the wheel?

Slide 21

Slide 21 text

How Varnish Works • Varnish creates a “workspace” in its memory space • Workspace contains pointers to cached objects, headers, etc • Varnish prioritizes worker threads by most recently used • These factors combine to reduce overall disk & memory ops

Slide 23

Slide 23 text

varnishd • varnishd has two processes • manager runs as root and starts the child (which does all the work) • manager monitors child and restarts it if it fails • manager interacts with the varnish cli interface (varnishadm) • child runs with more limited permissions and handles traffic

Slide 24

Slide 24 text

varnishadm (varnish CLI) • Allows administrators to interact with a running varnish • Secured by PSK • Designed to be “scriptable”

Slide 25

Slide 25 text

Varnish Config Language • Configuration DSL that is translated to C and compiled • We do not “configure” Varnish so much as write policies for handling types of traffic

Slide 26

Slide 26 text

varnishlog • Provides access to logs stored in memory

Slide 27

Slide 27 text

varnishstat • Provides access to in-memory statistics (cache hit/miss rate, resource usage, etc)

Slide 29

Slide 29 text

Storage Backends • Malloc (memory-based) • File (disk-based)

Slide 30

Slide 30 text

Malloc Storage • Memory is allocated at startup in KB, MB, GB, or TB • Defaults to unlimited • Overflows to swap • Extremely fast performance

Slide 31

Slide 31 text

File Storage • Space allocated in KB, MB, GB, TB, or as a percentage of available space on the device • Defaults to 50% of available space

Slide 32

Slide 32 text

Transient Storage • Special storage space for short-lived objects • Defaults to an unlimited malloc • Threshold TTL is configurable (default: 10s)

Slide 33

Slide 33 text

Sizing • Understand the size of your “hot” dataset • Size of homepage (including images) + size of linked pages/objects • Cost to produce objects

Slide 34

Slide 34 text

First Steps

Slide 35

Slide 35 text

Installing Varnish • https://www.varnish-cache.org/installation/ubuntu

Slide 36

Slide 36 text

Installing Varnish

    # apt-get install apt-transport-https
    # curl https://repo.varnish-cache.org/ubuntu/GPG-key.txt | apt-key add -
    # echo "deb https://repo.varnish-cache.org/ubuntu/ precise varnish-4.0" >> /etc/apt/sources.list.d/varnish-cache.list
    # apt-get update
    # apt-get install varnish

Slide 37

Slide 37 text

Important Commands • service varnish restart — Stops and restarts Varnish. Clears the entire cache • service varnish reload — Reloads the currently active VCL • varnishadm vcl.load — Loads a VCL • varnishadm vcl.use — Makes the named VCL active • varnishadm param.set — Sets parameters

Slide 38

Slide 38 text

Default Config

Edit /etc/default/varnish

    # -a: listen address, -T: management address, -f: config file,
    # -S: PSK (shared secret) file, -s: storage config
    DAEMON_OPTS="-a :6081 \
                 -T localhost:6082 \
                 -f /etc/varnish/default.vcl \
                 -S /etc/varnish/secret \
                 -s malloc,256m"

Slide 39

Slide 39 text

Default VCL

Edit /etc/varnish/default.vcl

    # Default backend definition. Set this to point to your content server.
    backend default {
        .host = "127.0.0.1";
        .port = "8080";
    }

Slide 40

Slide 40 text

Default VCL

Change to…

    backend server1 {
        .host = "varnish.tembies.com:8180";
    }

Slide 41

Slide 41 text

Exercise: Install Varnish

Slide 42

Slide 42 text

Serving Requests

Slide 43

Slide 43 text

Varnish & Cookies • By default, varnish will not cache if the request has a Cookie header or if the response has a Set-Cookie header • NB: It is better to not cache content, or to cache multiple copies, than to deliver content to the wrong person.

Slide 44

Slide 44 text

Dealing with Cookies • If possible, strip any cookies you do not need. If there are none left, cache • Create url schemes based on whether cookies are needed or not • Never cache Set-Cookie
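
The "strip what you do not need, then cache if nothing is left" advice can be sketched in VCL as follows. This is only an illustration: the backend address and the choice of which cookies are safe to drop (Google Analytics cookies here) are assumptions, not part of the original deck.

```vcl
vcl 4.0;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    if (req.http.Cookie) {
        # Drop analytics cookies (names starting with "_ga" or "__utm").
        # Which cookies are safe to remove is an assumption for this example.
        set req.http.Cookie = regsuball(req.http.Cookie,
            "(^|;\s*)(_ga[^=]*|__utm[a-z]+)=[^;]*", "");

        # Remove a leading separator left behind by the substitution.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

        # Nothing left: remove the header so the request can be cached.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

Requests that still carry other cookies after this step fall through to the built-in vcl_recv, which passes them to the backend uncached.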

Slide 45

Slide 45 text

Stripping Cookies

    sub strip_req_cookies {
        if (req.url !~ "^/admin") {
            set req.http.X-Orig-Cookie = req.http.Cookie;
            unset req.http.Cookie;
        }
    }

    sub strip_client_cc_headers {
        if (req.http.cache-control) {
            unset req.http.cache-control;
        }
    }

    sub vcl_recv {
        call strip_req_cookies;
        call strip_client_cc_headers;
    }

Slide 46

Slide 46 text

Exercise: Making Requests

Slide 47

Slide 47 text

Varnish Log & Stats

Slide 48

Slide 48 text

Varnish’s Log • Log is stored in memory and streamed to connected log clients • No I/O overhead for logging • Logs can be viewed/filtered in real-time

Slide 49

Slide 49 text

    # varnishlog
    …
    - VCL_call HASH
    - VCL_return lookup
    - Hit 2147483657
    - VCL_call HIT
    - VCL_return deliver
    - RespProtocol HTTP/1.1
    - RespStatus 200
    - RespReason OK
    …

Slide 50

Slide 50 text

varnishlog • -b — Only show log lines from traffic going between Varnish and the backend servers. This will be useful when we want to optimize cache hit rates. • -c — Same as '-b' but for client side traffic. • -m tag:regex — Only list transactions where the tag matches a regular expression. If it matches you will get the whole transaction. (This is the Varnish 3 form; Varnish 4 filters with -q and a query expression instead.)

Slide 51

Slide 51 text

varnishstat • Window into the health and performance of Varnish • Hundreds of counters with current and running-average values

Slide 52

Slide 52 text

# varnishstat

Slide 53

Slide 53 text

Exercise: Inspecting Requests

Slide 54

Slide 54 text

HTTP Refresh

Slide 55

Slide 55 text

Hypertext Transfer Protocol • Current Version 1.1 (RFC 2616, 7230, 7231, 7232, 7233, 7234, & 7235) • Requests consist of a method, headers, and sometimes a body • Methods — GET, POST, HEAD, OPTIONS, PUT, DELETE, TRACE, or CONNECT • Responses consist of a status, headers, and sometimes a body • Many requests can be sent over a single connection

Slide 56

Slide 56 text

Request Format

    [method] [uri] HTTP/1.1
    [header]: [value]
    [header]: [value]

    [body]

Slide 57

Slide 57 text

    GET /cached HTTP/1.1
    Host: localhost:6081
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-US,en;q=0.8
    Cookie: _ga=GA1.1.344169523.1415468951
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.52 Safari/537.36
    X-DevTools-Emulate-Network-Conditions-Client-Id: 49CA66DB-5415-4490-B960-962423964A4F

Slide 58

Slide 58 text

Response Format

    HTTP/1.1 [status code] [status message]
    [header]: [value]
    [header]: [value]

    [body]

Slide 59

Slide 59 text

    HTTP/1.1 200 OK
    Accept-Ranges: bytes
    Age: 0
    Cache-Control: max-age=30, public, s-maxage=30
    Connection: keep-alive
    Content-Encoding: gzip
    Content-Type: text/html; charset=UTF-8
    Date: Mon, 10 Nov 2014 02:24:58 GMT
    Server: Apache/2.4.10 (Ubuntu)
    Transfer-Encoding: chunked
    Vary: Accept-Encoding
    Via: 1.1 varnish-v4
    X-Backend-Port: 6081
    X-Powered-By: PHP/5.5.12-2ubuntu4
    X-Varnish: 32772

Slide 60

Slide 60 text

Response Codes • 1xx — Informational • 2xx — Success • 3xx — Redirection • 4xx — Client Error • 5xx — Server Error

Slide 61

Slide 61 text

Cache Response Headers • Expires — Date & time after which a resource is considered stale • Last-Modified — Date & time when the resource was last updated • ETag — “Entity tag,” a unique value for a resource’s contents. Usually a hash • Age — Number of seconds a resource has been in cache

Slide 62

Slide 62 text

Cache Request Headers • If-Modified-Since — Requests a fresh resource if the resource’s last modified date/time is more recent than the date/time specified • If-None-Match — If the resource’s ETag differs from the ETag specified, the server should send a fresh resource

Slide 63

Slide 63 text

Vary • Response header that indicates the response may differ based on the request header specified (e.g. Vary: User-Agent)

Slide 64

Slide 64 text

Cache-Control • Specifies directives that control cache behavior • public — Any cache may cache the content • no-store — No cache should store the content • no-cache — Store the content, but don’t serve it without validating it • max-age — Number of seconds a cache may store content for • s-maxage — Same as max-age, but applies to shared caches (proxies, CDNs) only • must-revalidate — Stale content cannot be served without validating it

Slide 65

Slide 65 text

Break: Questions?

Slide 66

Slide 66 text

VCL Building Blocks

Slide 67

Slide 67 text

Varnish Config Language • C-Derived Domain-Specific State Engine • Processes requests in isolation • return(action) exits one state and moves to the next • Default VCL is present beneath your code and is appended during compilation

Slide 68

Slide 68 text

VCL Syntax • Comments — //, #, /* */ • Subroutines — sub [name] { … } • Loops — Nope! • Termination — return() • Objects — Struct-like objects, such as an ACL that maps a name to a group of client addresses

    acl local {
        "localhost";      // myself
        "192.0.2.0"/24;   // and everyone on the local network
        ! "192.0.2.23";   // except for the dialin router
    }

Slide 70

Slide 70 text

VCL Functions
• regsub(str, regex, sub) — Replace the first match of regex in str with sub
• regsuball(str, regex, sub) — Replace all matches of regex in str with sub
• ban(expression) — Invalidate all cached objects that match expression
• call(subroutine) — Call a subroutine
• hash_data(input) — Adds data to the hash input. By default, Host and URL of the request are used
• new() — Creates a new object
• rollback() — Restore request headers
• synthetic(string) — Prepares a synthetic response
• return() — Terminate a subroutine

Slide 71

Slide 71 text

vcl_recv • Called at the start of a request after the request has been parsed. • Access to request object • Normalize input • Make backend routing decisions • Re-write client data • Manage caching policy • Access controls & security

Slide 72

Slide 72 text

vcl_recv - State Transitions • pass (→vcl_pass) — Bypass the cache, send request to the backend and return the response • pipe (→vcl_pipe) — Switch to a proxy-like mode • hash (→vcl_hash) — Attempt a cache lookup, possibly entering new data in the cache • synth — Generate a synthetic error response and abandon the request • purge (→vcl_hash→vcl_purge) — Purge the object and any variants

Slide 73

Slide 73 text

    sub vcl_recv {
        if (req.method != "GET" &&
            req.method != "HEAD" &&
            req.method != "PUT" &&
            req.method != "POST" &&
            req.method != "TRACE" &&
            req.method != "OPTIONS" &&
            req.method != "DELETE") {
            /* Non-RFC2616 or CONNECT which is weird. */
            return (pipe);
        }
        if (req.method != "GET" && req.method != "HEAD") {
            /* We only deal with GET and HEAD by default */
            return (pass);
        }
        if (req.http.Authorization || req.http.Cookie) {
            /* Not cacheable by default */
            return (pass);
        }
        return (hash);
    }

Slide 74

Slide 74 text

Request Object • req.backend_hint — Set backend to this if we attempt to fetch • req.hash_always_miss — (bool) Force a cache miss for this request. If set to true Varnish will disregard any existing objects and always (re)fetch from the backend • req.http.[header] — The corresponding HTTP header • req.method — The request type (e.g. "GET", "HEAD") • req.restarts — Count of how many times this request has been restarted • req.url — The requested URL • req.xid — Unique ID of this request

Slide 75

Slide 75 text

Exercise: Vary on OS
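
One possible approach to this exercise, offered as a sketch rather than the official solution: reduce the User-Agent to a coarse operating-system label in vcl_recv, then add that label to the response's Vary header so Varnish keeps one variant per label. The X-OS header name, the matching patterns, and the backend address are all assumptions made for illustration.

```vcl
vcl 4.0;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Collapse the many possible User-Agent strings into a few buckets.
    # These patterns are deliberately crude (for example, Android
    # user agents also contain "Linux").
    if (req.http.User-Agent ~ "Windows") {
        set req.http.X-OS = "windows";
    } elsif (req.http.User-Agent ~ "Mac OS X") {
        set req.http.X-OS = "mac";
    } elsif (req.http.User-Agent ~ "Linux") {
        set req.http.X-OS = "linux";
    } else {
        set req.http.X-OS = "other";
    }
}

sub vcl_backend_response {
    # Store one variant of each object per X-OS value.
    if (beresp.http.Vary) {
        set beresp.http.Vary = beresp.http.Vary + ", X-OS";
    } else {
        set beresp.http.Vary = "X-OS";
    }
}
```

Varying on the normalized header instead of the raw User-Agent keeps the number of cached variants small, which is the main design point of the sketch.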

Slide 76

Slide 76 text

vcl_backend_response • Called after the response headers have been received from a backend • deliver (→vcl_deliver) — Deliver the response, possibly caching it • abandon — Abandons the request and returns an error • retry — Retries the backend request. When the number of retries exceeds max_retries, Varnish will return an error.

Slide 77

Slide 77 text

    sub vcl_backend_response {
        if (beresp.ttl <= 0s ||
            beresp.http.Set-Cookie ||
            beresp.http.Surrogate-control ~ "no-store" ||
            (!beresp.http.Surrogate-Control &&
             beresp.http.Cache-Control ~ "no-cache|no-store|private") ||
            beresp.http.Vary == "*") {
            /*
             * Mark as "Hit-For-Pass" for the next 2 minutes
             */
            set beresp.ttl = 120s;
            set beresp.uncacheable = true;
        }
        return (deliver);
    }

Slide 78

Slide 78 text

Backend Response Object • beresp.backend.ip — IP of the backend this response was fetched from • beresp.backend.name — Name of the backend this response was fetched from • beresp.grace — Set to a period to enable grace • beresp.http.[HEADER] — The corresponding HTTP header • beresp.proto — The HTTP protocol version the backend replied with • beresp.reason — The HTTP status message returned by the server • beresp.status — The HTTP status code returned by the server • beresp.storage_hint — Hint to Varnish that you want to save this object to a particular storage backend • beresp.ttl — The object's remaining time to live, in seconds. beresp.ttl is writable • beresp.uncacheable — (bool) Marks the response as uncacheable

Slide 79

Slide 79 text

Calculating TTL • Varnish takes the TTL from the first of these that it finds: • The s-maxage directive in the Cache-Control response header • The max-age directive in the Cache-Control response header • The Expires response header • The default_ttl parameter • Cached Statuses: 200, 203, 300, 301, 302, 307, 404, 410

Slide 80

Slide 80 text

Exercise: Force TTL for PNGs
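
A minimal sketch of one way to approach this exercise: override beresp.ttl in vcl_backend_response for URLs ending in .png. The one-hour value and the backend address are assumptions chosen for illustration.

```vcl
vcl 4.0;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Match ".png" at the end of the path, allowing an optional query string.
    if (bereq.url ~ "\.png(\?.*)?$") {
        # Overrides whatever TTL was derived from the backend's headers.
        # 1h is an arbitrary example value.
        set beresp.ttl = 1h;
    }
}
```

Because the subroutine does not return, the built-in vcl_backend_response still runs afterwards and can mark the response uncacheable, for example when it carries a Set-Cookie header.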

Slide 81

Slide 81 text

vcl_hit • Called when a cache lookup is successful • deliver(→vcl_deliver) — Deliver the object. Control passes to vcl_deliver • synth(status code, reason) — Return the specified status code to the client and abandon the request. • restart — Restart the transaction

Slide 82

Slide 82 text

    sub vcl_hit {
        if (obj.ttl >= 0s) {
            // A pure unadulterated hit, deliver it
            return (deliver);
        }
        if (obj.ttl + obj.grace > 0s) {
            // Object is in grace, deliver it
            // Automatically triggers a background fetch
            return (deliver);
        }
        // fetch & deliver once we get the result
        return (fetch);
    }

Slide 83

Slide 83 text

vcl_miss • Called after a cache lookup if the requested document was not found in the cache • synth(status code, reason) — Return the specified status code to the client and abandon the request • pass (→vcl_pass) — Switch to pass mode • fetch (→vcl_backend_fetch) — Retrieve the requested object from the backend • restart — Restart the transaction

    sub vcl_miss {
        return (fetch);
    }

Slide 84

Slide 84 text

vcl_hash • Defines the unique characteristics of a request

    sub vcl_hash {
        hash_data(req.url);
        if (req.http.host) {
            hash_data(req.http.host);
        } else {
            hash_data(server.ip);
        }
        return (lookup);
    }

Slide 85

Slide 85 text

vcl_pass • Called upon entering pass mode. In this mode, the request is passed on to the backend, and the backend's response is passed on to the client, but is not entered into the cache • synth(status code, reason) — Return the specified status code to the client and abandon the request • pass — Proceed with pass mode • restart — Restart the transaction

Slide 86

Slide 86 text

vcl_deliver • Called before a cached object is delivered to the client • deliver — Deliver the object to the client • restart — Restart the transaction

Slide 87

Slide 87 text

vcl_backend_fetch • Called before sending the backend request • fetch — Fetch the object from the backend • abandon — Abandon the backend request and generate an error

Slide 88

Slide 88 text

vcl_backend_error • This subroutine is called if we fail the backend fetch • deliver — Deliver the error • retry — Retry the backend transaction

Slide 89

Slide 89 text

Exercise: Add Hit/Miss Header
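
One common way to approach this exercise, sketched under assumptions: in vcl_deliver, check obj.hits and set a response header. The X-Cache header name and the backend address are illustrative choices, not taken from the deck.

```vcl
vcl 4.0;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_deliver {
    # obj.hits counts how many times this object has been served from cache;
    # zero means the response was just fetched from the backend.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Requesting the same URL twice and comparing the X-Cache header on each response is a quick way to check that caching behaves as expected.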

Slide 90

Slide 90 text

Expiring Cache

Slide 91

Slide 91 text

Cache Invalidation • Purging — Removing an object (and its variants) from the cache • Banning — Filter cached objects, preventing them from being served

Slide 92

Slide 92 text

Purging • Implement a special PURGE HTTP method

    acl purge {
        "localhost";
    }

    sub vcl_recv {
        if (req.method == "PURGE") {
            if (!client.ip ~ purge) {
                return (synth(405, "Not allowed")); // Bail with an error
            }
            return (purge);
        }
    }

Slide 93

Slide 93 text

Banning • Bans act as filters on objects which tell Varnish not to return cached objects that meet certain criteria • Bans are checked when a cache hit is made • Bans can be set from CLI or with custom VCL • varnishadm ban req.url ~ "\.png$" bans all *.png files • Banned content remains in cache; memory is not freed
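
Setting a ban from custom VCL might look like the sketch below, modeled on the PURGE pattern. The BAN method name, the ACL name, the status codes, and the backend address are assumptions for illustration.

```vcl
vcl 4.0;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Hypothetical ACL: only these clients may add bans.
acl banners {
    "localhost";
}

sub vcl_recv {
    if (req.method == "BAN") {
        if (!client.ip ~ banners) {
            return (synth(403, "Not allowed"));
        }
        # Ban every cached object on this host whose URL matches the
        # request URL, treated as a regular expression.
        ban("req.http.host == " + req.http.host +
            " && req.url ~ " + req.url);
        return (synth(200, "Ban added"));
    }
}
```

With this in place, a request with the BAN method and a URL such as /images/ would stop Varnish from serving any cached object for that host whose URL contains /images/.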

Slide 94

Slide 94 text

Exercise: Implement Purging

Slide 95

Slide 95 text

Directors & Backends

Slide 96

Slide 96 text

Backends

    backend default {
        .host = "varnish.tembies.com";
        .port = "8180";
    }

Slide 97

Slide 97 text

Multiple Backends

    backend default {
        .host = "varnish.tembies.com";
        .port = "8180";
    }

    backend varnishorg {
        .host = "www.varnish-cache.org";
        .port = "80";
    }

Slide 98

Slide 98 text

Hinting & Routing

    backend foo {
        .host = "foo.com";
        .port = "80";
    }

    backend bar {
        .host = "bar.com";
        .port = "80";
    }

    sub vcl_recv {
        if (req.http.host ~ "foo.com") {
            set req.backend_hint = foo;
        } elsif (req.http.host ~ "bar.com") {
            set req.backend_hint = bar;
        }
    }

Slide 99

Slide 99 text

Directors • Logical groupings of backends • Random or round-robin routing of requests • Set periodic health checks and manage health status of backends

Slide 100

Slide 100 text

    probe health_check {
        .url = "/health";
        .timeout = 1s;
        .interval = 5s;
    }

    backend server1 {
        .host = "varnish.tembies.com:8180";
        .probe = health_check;
    }

    backend server2 {
        .host = "varnish.tembies.com:8181";
        .probe = health_check;
    }

    backend server3 {
        .host = "varnish.tembies.com:8182";
        .probe = health_check;
    }

Slide 101

Slide 101 text

    import directors;

    sub vcl_init {
        new vdir = directors.round_robin();
        vdir.add_backend(server1);
        vdir.add_backend(server2);
        vdir.add_backend(server3);
    }

    sub vcl_recv {
        #…
        call strip_req_headers;
        set req.backend_hint = vdir.backend();
    }
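
The round-robin director above has a random counterpart in the same VMOD, which takes a weight per backend. The sketch below is an illustration only: the backend names, addresses, and weights are hypothetical.

```vcl
vcl 4.0;

import directors;

# Hypothetical backends, present only so the sketch compiles on its own.
backend app1 {
    .host = "127.0.0.1";
    .port = "8080";
}

backend app2 {
    .host = "127.0.0.1";
    .port = "8081";
}

sub vcl_init {
    # Random director: each backend is chosen in proportion to its weight.
    new pool = directors.random();
    pool.add_backend(app1, 3.0);   # receives roughly three quarters of requests
    pool.add_backend(app2, 1.0);
}

sub vcl_recv {
    set req.backend_hint = pool.backend();
}
```

Weights are a simple way to send less traffic to a smaller machine without removing it from the pool.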

Slide 102

Slide 102 text

Exercise: Create & Monitor Backends

Slide 103

Slide 103 text

Grace Mode

Slide 104

Slide 104 text

Grace Mode • “Backup” TTL set on objects that lets Varnish serve them even when they are stale, under certain circumstances • When “graced” content is served, Varnish automatically attempts to refresh it

Slide 106

Slide 106 text

Exercise: Grace all objects for 2 minutes
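
A minimal sketch of one way to approach this exercise: set beresp.grace on every backend response. The backend address is an assumption for illustration.

```vcl
vcl 4.0;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Keep every object for two minutes past its TTL so that it can be
    # served stale while a fresh copy is fetched.
    set beresp.grace = 2m;
}
```

The vcl_hit logic shown earlier is what decides whether an object whose TTL has expired is still within grace and can be delivered.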

Slide 107

Slide 107 text

Rescuing Requests

Slide 108

Slide 108 text

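Retry • Returning “restart” from (nearly) any client-side subroutine starts the VCL state engine from the top, with any changes to the request saved • Returning “retry” from vcl_backend_response or vcl_backend_error re-runs the backend fetch • Varnish can make intelligent decisions about whether or not to serve questionable content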

Slide 109

Slide 109 text

    sub vcl_backend_response {
        // A 200 with an empty body is suspicious: try the backend again
        if (beresp.status == 200 && beresp.http.content-length == "0") {
            return (retry);
        }
    }

Slide 110

Slide 110 text

Edge Side Includes

Slide 111

Slide 111 text

What is ESI? • Simple markup language that enables content composition • Allows the combination of cached and uncached resources into a single whole • Varnish implements a small subset, esi:include and esi:remove

Slide 112

Slide 112 text

Exercise: Turn on ESI
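
One possible approach to this exercise, as a sketch: enable ESI processing on HTML responses by setting beresp.do_esi. Restricting it to text/html and the backend address are assumptions for illustration.

```vcl
vcl 4.0;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Only parse HTML for ESI tags; other content types are left alone.
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.do_esi = true;
    }
}
```

With this enabled, an esi:include tag in a page is replaced by the content of the URL in its src attribute, and that fragment is fetched and cached under its own TTL.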

Slide 113

Slide 113 text

VMODs & Embedded C

Slide 114

Slide 114 text

VMODs • Extensions that provide additional functionality, loaded in VCL with the “import” statement • Varnish comes with the VMOD “std” which provides useful helper functions • https://www.varnish-cache.org/vmods

Slide 115

Slide 115 text

vmod_std • std.querysort(req.url) — Sorts the query string • std.healthy(backend) — Returns TRUE if a backend is healthy • std.strstr(stringA, stringB) — Returns the substring if the second string is a substring of the first string • man vmod_std
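
A short sketch of how two of these helpers might be used together. The backend address and the log message are assumptions for illustration.

```vcl
vcl 4.0;

import std;

# Hypothetical backend, present only so the sketch compiles on its own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Sort query parameters so that /page?a=1&b=2 and /page?b=2&a=1
    # are hashed to the same cache object.
    set req.url = std.querysort(req.url);

    # Write a line to the shared-memory log when the chosen backend is sick.
    if (!std.healthy(req.backend_hint)) {
        std.log("backend is unhealthy for " + req.url);
    }
}
```

Sorting the query string is a cheap way to raise the hit rate when clients build the same URL with parameters in different orders.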

Slide 116

Slide 116 text

Embedded C • C{ #include }C • C code included in VCL is compiled with the VCL and dynamically linked to Varnish in memory. • Embedded C essentially becomes part of the Varnish process • If your code produces a segfault, Varnish will crash • Holy crap, don’t do this, why are you still reading this?

Slide 117

Slide 117 text

Further Reading • https://www.varnish-cache.org/docs/4.0/index.html • http://www.mobify.com/blog/beginners-guide-to-http-cache-headers/ • https://www.varnish-cache.org/trac/wiki/VCLExamples

Slide 118

Slide 118 text

Samantha Quiñones • @ieatkillerbees • http://tembies.com • Feedback: https://joind.in/11907

Slide 119

Slide 119 text

Image Credits • Antonis Arestis photo - © 2012 Selene Alexia Christodoulou - CC • Futurama - © 1999 Twentieth-Century Fox • Modern Warehouse - © Axisdaman - CC • Begging Maltese - © Ed Yourdon - CC • Mud Bricks - © Whiteghost.ink - CC • Tetly’s Beer - © Reedy - CC • Film Director & Crew - © AiClassEland - CC • Ballerina - © David R. Tribble - CC • Stampede - © Andy Docker - CC • Beachy Head - © Papa Lima Whiskey - CC • Nuclear Test - © US NNSA