NGINX for High Availability

Slide 1

Slide 1 text

NGINX High Availability and Monitoring
Introduced by Andrew Alexeev
Presented by Owen Garrett
Nginx, Inc.

Slide 2

Slide 2 text

About this webinar

No one likes a broken website. Learn about some of the techniques that NGINX users employ to ensure that server failures are detected and worked around, so that you too can build large-scale, highly available web services.

Slide 3

Slide 3 text

The cost of downtime

Slide 4

Slide 4 text

The causes of downtime

“Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues.”
Ronni J. Colville and George Spafford, Configuration Management for Virtual and Cloud Infrastructures, Gartner

(Chart: hardware failures and disasters vs. people and process)

Slide 5

Slide 5 text

INTRODUCING NGINX…

Slide 6

Slide 6 text

What is NGINX?

(Diagram: HTTP traffic from the Internet passes through NGINX, which acts as:)
• Web Server: serve content from disk
• Application Server: FastCGI, uWSGI, Passenger…
• Proxy: Caching, Load Balancing…

• Application Acceleration
• SSL and SPDY Termination
• Performance Monitoring
• High Availability

Advanced Features:
• Bandwidth Management
• Content-based Routing
• Request Manipulation
• Response Rewriting
• Authentication
• Video Delivery
• Mail Proxy
• GeoLocation

Slide 7

Slide 7 text

NGINX accelerates 143,000,000 websites

Slide 8

Slide 8 text

22% of the top 1 million websites
37% of the top 1,000 websites

Slide 9

Slide 9 text

NGINX and NGINX Plus
• NGINX F/OSS (nginx.org)
• 3rd-party modules: a large community of >100 modules

Slide 10

Slide 10 text

NGINX and NGINX Plus
• NGINX F/OSS (nginx.org)
• 3rd-party modules: a large community of >100 modules
• NGINX Plus: advanced load balancing features, ease of management, commercial support

Slide 11

Slide 11 text

IMPROVING AVAILABILITY WITH NGINX

Slide 12

Slide 12 text

Quick review of load balancing

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
        }
    }

    upstream backend {
        server webserver1:80;
        server webserver2:80;
        server webserver3:80;
        server webserver4:80;
    }

(Diagram: NGINX sits between the Internet and the four web servers.)
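The fragment above leaves out its surrounding context: both the upstream and server blocks must sit inside an http block, and nginx.conf also needs an events block. A minimal sketch of a complete file, assuming the hostnames webserver1 to webserver4 resolve from the NGINX host (the worker settings are illustrative):

```nginx
# Minimal complete nginx.conf (sketch; hostnames and worker settings are assumptions)
worker_processes auto;

events {
    worker_connections 1024;
}

http {
    # Pool of servers that requests are distributed across
    # (weighted round-robin by default)
    upstream backend {
        server webserver1:80;
        server webserver2:80;
        server webserver3:80;
        server webserver4:80;
    }

    server {
        listen 80;

        location / {
            # Forward each request to one member of the "backend" group
            proxy_pass http://backend;
        }
    }
}
```

Running `nginx -t` checks the syntax before the file is loaded.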

Slide 13

Slide 13 text

Three NGINX Techniques for High Availability
1. NGINX: Basic Error Checks
2. NGINX Plus: Advanced Health Checks
3. Live software upgrades

Slide 14

Slide 14 text

1. Basic Error Checks
• Monitor transactions as they happen
  – Retry transactions that ‘fail’ where possible
  – Mark failed servers as dead

Slide 15

Slide 15 text

Basic Error Checks

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            proxy_next_upstream error timeout; # other values: http_503, ..., or off
        }
    }

    upstream backend {
        server webserver1:80 max_fails=1 fail_timeout=10s;
        server webserver2:80 max_fails=1 fail_timeout=10s;
        server webserver3:80 max_fails=1 fail_timeout=10s;
        server webserver4:80 max_fails=1 fail_timeout=10s;
    }
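With `max_fails=1 fail_timeout=10s`, one failed attempt marks a server unavailable for 10 seconds, after which NGINX tries it again. Errors and timeouts always count as failed attempts; status codes such as 502 count only if they are listed in `proxy_next_upstream`. A sketch of a more tolerant variant, assuming NGINX 1.7.5 or later for `proxy_next_upstream_tries`; the thresholds and the `standby1` host are illustrative:

```nginx
upstream backend {
    # Mark a server unavailable for 30s after 3 failed attempts
    # within a 30s window (illustrative values)
    server webserver1:80 max_fails=3 fail_timeout=30s;
    server webserver2:80 max_fails=3 fail_timeout=30s;
    server webserver3:80 max_fails=3 fail_timeout=30s;
    server webserver4:80 max_fails=3 fail_timeout=30s;

    # Hypothetical standby host: used only when all
    # primary servers are unavailable
    server standby1:80 backup;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Retry on connection errors, timeouts and these gateway errors
        proxy_next_upstream error timeout http_502 http_503 http_504;

        # At most 2 attempts per request: the original plus one retry
        proxy_next_upstream_tries 2;

        # Fail fast when a server does not accept the connection
        proxy_connect_timeout 2s;
    }
}
```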

Slide 16

Slide 16 text

More sophisticated retries

    server {
        listen 80;

        location / {
            # On error/timeout, try the upstream group one more time
            error_page 502 504 = @fallback;
            proxy_pass http://backend;
            proxy_next_upstream off;
        }

        location @fallback {
            proxy_pass http://backend;
            proxy_next_upstream off;
        }
    }
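The same `error_page` pattern can send the retry to a different upstream group instead of back to the same one. A sketch, assuming a hypothetical group named `standby` containing a hypothetical host `standby1`. Adding `proxy_intercept_errors on` lets `error_page` also act on 502/503/504 responses returned by the upstream servers themselves, not only on errors NGINX generates when it cannot reach them:

```nginx
upstream backend {
    server webserver1:80;
    server webserver2:80;
}

# Hypothetical secondary pool
upstream standby {
    server standby1:80;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_next_upstream off;

        # Pass upstream responses with these codes to error_page too
        proxy_intercept_errors on;
        error_page 502 503 504 = @standby;
    }

    location @standby {
        # One attempt against the secondary pool; its response
        # and status code are returned to the client
        proxy_pass http://standby;
        proxy_next_upstream off;
    }
}
```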

Slide 17

Slide 17 text

2. Advanced Health Checks
• “Synthetic Transactions”
  – Probes server health
  – Complex, custom tests are possible
  – Available in NGINX Plus

Slide 18

Slide 18 text

Advanced Health Checks

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            health_check;
        }
    }

    upstream backend {
        zone backend 64k;
        server webserver1:80;
        server webserver2:80;
        server webserver3:80;
        server webserver4:80;
    }

health_check parameters:
• interval = period between checks
• fails = failure count before a server is marked dead
• passes = pass count before a server is marked alive
• uri = custom URI
Defaults: interval 5 seconds, 1 fail, 1 pass, uri = /
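The defaults can be overridden on the `health_check` line itself. A sketch with illustrative values, assuming each web server exposes a hypothetical `/healthz` page: a server is marked dead after 3 consecutive failed checks and returns to service after 2 consecutive passes.

```nginx
upstream backend {
    # Shared memory zone so all worker processes see the same
    # health state (required for health_check)
    zone backend 64k;
    server webserver1:80;
    server webserver2:80;
    server webserver3:80;
    server webserver4:80;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Probe the hypothetical /healthz page every 10s;
        # 3 failures => dead, 2 passes => alive again
        health_check interval=10 fails=3 passes=2 uri=/healthz;
    }
}
```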

Slide 19

Slide 19 text

Advanced usage

    server {
        listen 80;

        location / {
            proxy_pass http://backend;

            health_check uri=/test.php match=statusok;
            proxy_set_header Host www.foo.com;
        }
    }

    match statusok {
        # Used for /test.php health check
        status 200;
        # Regex match, so a "; charset=..." suffix still passes
        header Content-Type ~ text/html;
        body ~ "Server[0-9]+ is alive";
    }

Health checks inherit all parameters from the location block. match blocks define the success criteria for a health check.

Slide 20

Slide 20 text

Edge cases – variables in configuration

    server {
        location / {
            proxy_pass http://backend;
            health_check;
            proxy_set_header Host $host;
        }
    }

This may not work as expected. Remember – the health_check tests run in the context of the enclosing location. Health-check requests are generated by NGINX itself rather than by a client, so variables such as $host may be empty.

Slide 21

Slide 21 text

Edge cases – variables in configuration

    server {
        location / {
            proxy_pass http://backend;
            health_check;
            proxy_set_header Host $host;
        }
    }

This may not work as expected. Remember – the health_check tests run in the context of the enclosing location.

    server {
        location /internal-check {
            internal;
            proxy_pass http://backend;
            health_check;
            proxy_set_header Host www.foo.com;
        }
    }

This is the common alternative:
• Use a custom URI for the location.
• Tag the location as internal.
• Set headers manually (useful for authentication).

Slide 22

Slide 22 text

Examples of using health checks
• Verify that pages don’t contain errors
• Run internal tests (e.g. test.php => DB connect)
• Managed removal of servers:
    $ touch $DOCROOT/isactive.txt
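The first and third uses can be sketched together. This assumes each web server serves `$DOCROOT/isactive.txt` as `/isactive.txt`, and that a broken home page would contain the text "Internal Server Error"; that pattern and the match name `page_ok` are hypothetical. With the default success criteria (2xx or 3xx), deleting the marker file makes the probe return 404, so the server is taken out of rotation; touching the file again brings it back. The content check reuses the internal-location pattern from the edge-case slide.

```nginx
upstream backend {
    zone backend 64k;
    server webserver1:80;
    server webserver2:80;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
        # Managed removal: a 404 once isactive.txt is deleted
        # counts as a failed check and marks the server down
        health_check uri=/isactive.txt;
    }

    location /internal-check {
        internal;
        proxy_pass http://backend;
        # Content check: fetch the home page and apply "page_ok"
        health_check uri=/ match=page_ok;
    }
}

# Pass only on a 200 response whose body does not contain
# the (assumed) error text
match page_ok {
    status 200;
    body !~ "Internal Server Error";
}
```

To drain webserver1 for maintenance, delete `isactive.txt` on that host; after the next failed probe, NGINX Plus stops sending it new requests.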

Slide 23

Slide 23 text

Advantages of ‘Health Checks’
• Run tests asynchronously (find errors faster)
• Custom tests (not related to ‘real’ traffic)
• More flexibility to specify success/error

Slide 24

Slide 24 text

MORE NGINX PLUS FEATURES…

Slide 25

Slide 25 text

Slow start
• When a server recovers after failing basic error checks or advanced health checks, ramp its share of traffic back up gradually:

    upstream backends {
        zone backends 64k;

        server webserver1 slow_start=30s;
    }
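`slow_start` sets the time over which a recovering server's weight ramps up from zero to its nominal value, so a freshly restarted server is not flooded with requests. Like `max_fails` and `fail_timeout`, it is ignored when a group contains only one server, and it cannot be used together with the `hash` or `ip_hash` methods. A sketch pairing it with an active health check (NGINX Plus; the extra servers and the 30s value are illustrative):

```nginx
upstream backends {
    # Shared state so every worker sees the same weights
    zone backends 64k;

    # After recovering, each server's weight ramps from 0 up to
    # its nominal weight (1 by default) over 30s
    server webserver1 slow_start=30s;
    server webserver2 slow_start=30s;
    server webserver3 slow_start=30s;
}

server {
    listen 80;

    location / {
        proxy_pass http://backends;
        health_check;
    }
}
```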

Slide 26

Slide 26 text

NGINX Plus status monitoring
http://demo.nginx.com/ (web) and http://demo.nginx.com/status (JSON)
• Total data and connections
• Current data and connections
• Split per ‘server zone’
• Cache statistics
• Upstream statistics: traffic, health and error status
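The statistics are enabled in the NGINX Plus configuration. Later NGINX Plus releases replaced the `status` directive used at the time of this webinar with the `api` directive. A sketch assuming such a release, a `backend` group with a `zone` as on the earlier slides, the dashboard file at its default package location, and access to the API limited to localhost (the zone name `www` and port 8080 are hypothetical):

```nginx
server {
    listen 80;
    # Collect per-server-zone statistics under the name "www"
    status_zone www;

    location / {
        proxy_pass http://backend;
    }
}

server {
    listen 8080;

    # Read-only JSON statistics API, reachable from localhost only
    location /api {
        api write=off;
        allow 127.0.0.1;
        deny all;
    }

    # Built-in live activity monitoring dashboard
    location = /dashboard.html {
        root /usr/share/nginx/html;
    }
}
```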

Slide 27

Slide 27 text

3. Live software upgrades
• Upgrade your NGINX binary on-the-fly
  – No downtime
  – No dropped connections

Slide 28

Slide 28 text

No downtime – ever!
• Reload configuration with SIGHUP:
    # nginx -s reload
• Re-exec the binary with copy-and-signal: http://nginx.org/en/docs/control.html#upgrade

(Diagram: one NGINX parent process managing several NGINX worker processes)

Slide 29

Slide 29 text

In summary...

NGINX F/OSS:
• Basic error checks and retry logic
• On-the-fly upgrades

NGINX Plus:
• Advanced health checks + slow start
• Extended status monitoring

Compared to other load balancers and ADCs, NGINX Plus is uniquely well-suited to a devops-driven environment.

Slide 30

Slide 30 text

Closing thoughts
• 37% of the 1,000 busiest websites use NGINX
  – In most situations, it’s a drop-in extension
• Check out the blogs on nginx.com
• Future webinars: nginx.com/webinars

Try NGINX F/OSS (nginx.org) or NGINX Plus (nginx.com)