Slide 1

Slide 1 text

NGINX High Availability and Monitoring
Introduced by Andrew Alexeev
Presented by Owen Garrett
Nginx, Inc.

Slide 2

Slide 2 text

About this webinar
No one likes a broken website. Learn about some of the techniques that NGINX users employ to ensure that server failures are detected and worked around, so that you too can build large-scale, highly available web services.

Slide 3

Slide 3 text

The cost of downtime

Slide 4

Slide 4 text

The causes of downtime
“Through 2015, 80% of outages impacting mission-critical services will be caused by people and process issues, and more than 50% of those outages will be caused by change/configuration/release integration and hand-off issues.”
Configuration Management for Virtual and Cloud Infrastructures, Ronni J. Colville and George Spafford, Gartner
Hardware failures, disasters
People and Process

Slide 5

Slide 5 text

INTRODUCING NGINX…

Slide 6

Slide 6 text

What is NGINX?
[Diagram: HTTP traffic between the Internet and NGINX]
Web Server: Serve content from disk
Application Server: FastCGI, uWSGI, Passenger…
Proxy: Caching, Load Balancing…
• Application Acceleration
• SSL and SPDY termination
• Performance Monitoring
• High Availability
Advanced Features:
• Bandwidth Management
• Content-based Routing
• Request Manipulation
• Response Rewriting
• Authentication
• Video Delivery
• Mail Proxy
• GeoLocation

Slide 7

Slide 7 text

143,000,000 Websites NGINX Accelerates

Slide 8

Slide 8 text

22% Top 1 million websites
37% Top 1,000 websites

Slide 9

Slide 9 text

NGINX and NGINX Plus
NGINX F/OSS (nginx.org)
3rd party modules: large community of >100 modules

Slide 10

Slide 10 text

NGINX and NGINX Plus
NGINX F/OSS (nginx.org)
3rd party modules: large community of >100 modules
NGINX Plus
Advanced load balancing features
Ease-of-management
Commercial support

Slide 11

Slide 11 text

IMPROVING AVAILABILITY WITH NGINX

Slide 12

Slide 12 text

Quick review of load balancing
server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}

upstream backend {
    server webserver1:80;
    server webserver2:80;
    server webserver3:80;
    server webserver4:80;
}
[Diagram: Internet and NGINX]

Slide 13

Slide 13 text

Three NGINX Techniques for High Availability
1. NGINX: Basic Error Checks
2. NGINX Plus: Advanced Health Checks
3. Live software upgrades

Slide 14

Slide 14 text

1. Basic Error Checks
• Monitor transactions as they happen
  – Retry transactions that ‘fail’ where possible
  – Mark failed servers as dead

Slide 15

Slide 15 text

Basic Error Checks
server {
    listen 80;

    location / {
        proxy_pass http://backend;
        proxy_next_upstream error timeout; # http_503..., off
    }
}

upstream backend {
    server webserver1:80 max_fails=1 fail_timeout=10s;
    server webserver2:80 max_fails=1 fail_timeout=10s;
    server webserver3:80 max_fails=1 fail_timeout=10s;
    server webserver4:80 max_fails=1 fail_timeout=10s;
}
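As a rough sketch of how these passive checks might be tuned, the fragment below assumes the same `backend` group. The timeout values, the `max_fails=2` threshold and the retry limit are illustrative choices, not values from the webinar. A short connect timeout lets a dead server register as a failure sooner, and `proxy_next_upstream` can treat selected HTTP status codes as failures in addition to `error` and `timeout`.

```nginx
upstream backend {
    # Illustrative: mark a server unavailable for 10s after
    # 2 failed attempts within a 10s window.
    server webserver1:80 max_fails=2 fail_timeout=10s;
    server webserver2:80 max_fails=2 fail_timeout=10s;
    server webserver3:80 max_fails=2 fail_timeout=10s;
    server webserver4:80 max_fails=2 fail_timeout=10s;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Illustrative timeouts: give up quickly on servers that
        # do not accept connections or stop responding.
        proxy_connect_timeout 2s;
        proxy_read_timeout    10s;

        # Retry on another server for connection errors, timeouts,
        # and these upstream status codes.
        proxy_next_upstream error timeout http_502 http_503 http_504;

        # Limit the number of tries per request (needs nginx 1.7.5+).
        proxy_next_upstream_tries 2;
    }
}
```

Status codes such as 502 or 503 count toward `max_fails` only when they are listed in `proxy_next_upstream`; `error` and `timeout` always count.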

Slide 16

Slide 16 text

More sophisticated retries
server {
    listen 80;

    location / {
        # On error/timeout, try the upstream group one more time
        error_page 502 504 = @fallback;
        proxy_pass http://backend;
        proxy_next_upstream off;
    }

    location @fallback {
        proxy_pass http://backend;
        proxy_next_upstream off;
    }
}
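The same `error_page` pattern can send the retry somewhere other than the original group. The sketch below is a variation, not something shown in the slides: the `standby` upstream and the `standby1` host are hypothetical names used only for illustration.

```nginx
upstream backend {
    server webserver1:80;
    server webserver2:80;
}

# Hypothetical second group, used only when the first attempt fails.
upstream standby {
    server standby1:80;
}

server {
    listen 80;

    location / {
        # On a 502/504 generated by the failed attempt,
        # hand the request to the named fallback location.
        error_page 502 504 = @fallback;
        proxy_pass http://backend;
        proxy_next_upstream off;
    }

    location @fallback {
        # Retry once against the standby group.
        proxy_pass http://standby;
    }
}
```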

Slide 17

Slide 17 text

2. Advanced Health Checks
• “Synthetic Transactions”
  – Probes server health
  – Complex, custom tests are possible
  – Available in NGINX Plus

Slide 18

Slide 18 text

Advanced Health Checks
server {
    listen 80;

    location / {
        proxy_pass http://backend;
        health_check;
    }
}

upstream backend {
    zone backend 64k;
    server webserver1:80;
    server webserver2:80;
    server webserver3:80;
    server webserver4:80;
}

health_check parameters:
  interval = period between checks
  fails = failure count before dead
  passes = pass count before alive
  uri = custom URI
Default: 5 seconds, 1 fail, 1 pass, uri = /
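To make the parameters above concrete, here is a minimal sketch that overrides each default. The values and the `/healthz` URI are assumptions chosen for illustration. It requires NGINX Plus, since `health_check` is not part of open source NGINX.

```nginx
upstream backend {
    # Shared memory zone, as in the slide's upstream block.
    zone backend 64k;
    server webserver1:80;
    server webserver2:80;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Probe every 10 seconds; mark a server dead after 3
        # consecutive failures and alive again after 2 consecutive
        # passes. /healthz is a hypothetical URI.
        health_check interval=10 fails=3 passes=2 uri=/healthz;
    }
}
```

Raising `fails` and `passes` above 1 trades detection speed for stability: a single slow response no longer removes a server, and a single lucky response no longer restores it.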

Slide 19

Slide 19 text

Advanced usage
server {
    listen 80;

    location / {
        proxy_pass http://backend;

        health_check uri=/test.php match=statusok;
        proxy_set_header Host www.foo.com;
    }
}

match statusok {
    # Used for /test.php health check
    status 200;
    header Content-Type = text/html;
    body ~ "Server[0-9]+ is alive";
}

Health checks inherit all parameters from the location block.
match blocks define the success criteria for a health check.

Slide 20

Slide 20 text

Edge cases – variables in configuration
server {
    location / {
        proxy_pass http://backend;
        health_check;
        proxy_set_header Host $host;
    }
}

This may not work as expected. Remember – the health_check tests run in the context of the enclosing location.

Slide 21

Slide 21 text

Edge cases – variables in configuration
server {
    location / {
        proxy_pass http://backend;
        health_check;
        proxy_set_header Host $host;
    }
}

This may not work as expected. Remember – the health_check tests run in the context of the enclosing location.

server {
    location /internal-check {
        internal;
        proxy_pass http://backend;
        health_check;
        proxy_set_header Host www.foo.com;
    }
}

This is the common alternative. Use a custom URI for the location. Tag the location as internal. Set headers manually. Useful for authentication.

Slide 22

Slide 22 text

Examples of using health checks
• Verify that pages don’t contain errors
• Run internal tests (e.g. test.php => DB connect)
• Managed removal of servers
    $ touch $DOCROOT/isactive.txt
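The managed-removal idea can be sketched as follows, assuming each web server serves `isactive.txt` from its document root. The `is_active` match name and the 2-second interval are illustrative assumptions.

```nginx
upstream backend {
    zone backend 64k;
    server webserver1:80;
    server webserver2:80;
}

match is_active {
    # Pass only while the marker file is served successfully.
    status 200;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;

        # Request the marker file on each server every 2 seconds.
        health_check uri=/isactive.txt match=is_active interval=2;
    }
}
```

With a configuration like this, deleting `$DOCROOT/isactive.txt` on one server should make its next check fail, so it stops receiving new requests. Running `touch` on the file again should bring it back.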

Slide 23

Slide 23 text

Advantages of ‘Health Checks’
• Run tests asynchronously (find errors faster)
• Custom tests (not related to ‘real’ traffic)
• More flexibility to specify success/error

Slide 24

Slide 24 text

MORE NGINX PLUS FEATURES…

Slide 25

Slide 25 text

Slow start
• When a server recovers after failing basic error checks or advanced health checks, slow_start ramps its traffic up gradually:
upstream backends {
    zone backends 64k;

    server webserver1 slow_start=30s;
}
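A slightly fuller sketch of slow start, assuming a second server `webserver2` alongside the one on the slide. According to the NGINX upstream module documentation, `slow_start` is ignored when a group contains only a single server, so at least two servers are needed to see any effect.

```nginx
upstream backends {
    zone backends 64k;

    # After a server becomes healthy again, its weight ramps
    # from zero back to its nominal value over 30 seconds.
    server webserver1 slow_start=30s;
    server webserver2 slow_start=30s;   # assumed second server
}

server {
    listen 80;

    location / {
        proxy_pass http://backends;

        # Active checks (NGINX Plus) decide when a server
        # counts as recovered.
        health_check;
    }
}
```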

Slide 26

Slide 26 text

NGINX Plus status monitoring
http://demo.nginx.com/ (web) and http://demo.nginx.com/status (JSON)
Total data and connections
Current data and connections
Split per ‘server zone’
Cache statistics
Upstream statistics:
  Traffic
  Health and Error status
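A minimal sketch of how such a status endpoint might be enabled. It assumes the `status` and `status_zone` directives of the NGINX Plus status module as it existed around the time of this webinar. Later NGINX Plus releases replaced that module with a different API, so check the documentation for your release. The port, the zone name `www` and the access rules are illustrative assumptions.

```nginx
upstream backend {
    # Shared memory zone, as used in the upstream blocks above.
    zone backend 64k;
    server webserver1:80;
    server webserver2:80;
}

server {
    listen 80;

    # Collect per-'server zone' statistics under a hypothetical name.
    status_zone www;

    location / {
        proxy_pass http://backend;
    }
}

server {
    # Hypothetical management port, kept separate from user traffic.
    listen 8080;

    location /status {
        # Serve the JSON status data.
        status;

        # Illustrative access restriction: local requests only.
        allow 127.0.0.1;
        deny all;
    }
}
```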

Slide 27

Slide 27 text

3. Live software upgrades
• Upgrade your NGINX binary on-the-fly
  – No downtime
  – No dropped connections

Slide 28

Slide 28 text

No downtime – ever!
• Reload configuration with SIGHUP
    # nginx -s reload
• Re-exec binary with copy-and-signal
    http://nginx.org/en/docs/control.html#upgrade
[Diagram: NGINX parent process and NGINX workers]

Slide 29

Slide 29 text

In summary...
NGINX F/OSS:
  Basic Error checks and retry logic
  On-the-fly upgrades
NGINX Plus:
  Advanced health checks + slow start
  Extended status monitoring
Compared to other load balancers and ADCs, NGINX Plus is uniquely well-suited to a devops-driven environment.

Slide 30

Slide 30 text

Closing thoughts
• 37% of the busiest websites use NGINX
  – In most situations, it’s a drop-in extension
• Check out the blogs on nginx.com
• Future webinars: nginx.com/webinars
Try NGINX F/OSS (nginx.org) or NGINX Plus (nginx.com)