THE 9s DANCE

Uptime     Downtime per year   Nines
90.000 %   36.50 days          one nine
99.000 %   3.65 days           two nines
99.900 %   8.76 hrs            three nines
99.950 %   4 hrs 23 mins
99.990 %   52.56 mins          four nines
99.999 %   5.26 mins           five nines
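The downtime column is plain arithmetic: the unavailable fraction (100 % minus uptime) multiplied by the length of a year. The sketch below reproduces the table, assuming a 365-day year (8,760 hours), which is what the figures appear to use. The function names are illustrative, not taken from any library.

```python
# Sketch: yearly downtime implied by an uptime percentage.
# Assumption: a 365-day year (8,760 hours); a 365.25-day year gives slightly larger values.

HOURS_PER_YEAR = 365 * 24


def yearly_downtime_minutes(uptime_percent: float) -> float:
    """Return the allowed downtime per year, in minutes, for a given uptime %."""
    unavailable_fraction = 1 - uptime_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


def humanize(minutes: float) -> str:
    """Format minutes as days, hours plus minutes, or minutes."""
    if minutes >= 24 * 60:
        return f"{minutes / (24 * 60):.2f} days"
    if minutes >= 60:
        hours, mins = divmod(round(minutes), 60)
        return f"{hours} hrs {mins} mins"
    return f"{minutes:.2f} mins"


if __name__ == "__main__":
    for uptime in (90.0, 99.0, 99.9, 99.95, 99.99, 99.999):
        print(f"{uptime:>7.3f} %  {humanize(yearly_downtime_minutes(uptime))}")
```

The three-nines row prints as "8 hrs 46 mins", which is the same 8.76 hours written in a different unit.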
HA BEST PRACTICES
1. no single points of failure
2. stateless application design
3. automate infrastructure for consistency & reliability
4. clever monitoring and alerting
5. geographically distribute your machines
6. keep spare capacity to meet increasing demand
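Why removing single points of failure pays off can be sketched with the standard availability arithmetic, under the simplifying assumption that components fail independently (real outages are often correlated). Components that must all work (in series) multiply their availabilities; redundant replicas (in parallel) are down only when every replica is down. The 99.9 % per-server figure is illustrative, not a measurement.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components are required: the chain is up only if every one is up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant replicas: the group is down only if every replica is down."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    a = 0.999  # illustrative availability of a single server (three nines)
    # frontend -> backend -> database, one server each (three single points of failure)
    single = series(a, a, a)
    # the same three tiers, each backed by a redundant pair
    redundant = series(parallel(a, a), parallel(a, a), parallel(a, a))
    print(f"no redundancy:  {single:.6f}")
    print(f"pairs per tier: {redundant:.6f}")
```

Under these assumptions the unreplicated chain lands at about 99.70 %, below any single server, while pairing every tier lifts it to about 99.9997 %.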
WHAT IS A SILO?
✤ frontend (SPAs, PWAs, etc.)
✤ backend (e.g. PHP services)
✤ data (including cache)
1 silo = a full set of servers that delivers the end-to-end functionality
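Because each silo carries its own frontend, backend, and data tiers, traffic can be shifted as a whole to another silo when any tier inside one breaks. The sketch below models that idea; `Silo`, `pick_silo`, and the silo names are hypothetical, and real deployments would usually do this routing in DNS or a load balancer rather than in application code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Silo:
    """Hypothetical model of one complete stack: frontend, backend, and data tiers."""
    name: str
    frontend_up: bool = True
    backend_up: bool = True
    data_up: bool = True

    def healthy(self) -> bool:
        # A silo serves end-to-end traffic only if every tier inside it works.
        return self.frontend_up and self.backend_up and self.data_up


def pick_silo(silos: list[Silo], preferred: int = 0) -> Silo:
    """Return the first healthy silo, starting from the preferred one and wrapping around."""
    for offset in range(len(silos)):
        silo = silos[(preferred + offset) % len(silos)]
        if silo.healthy():
            return silo
    raise RuntimeError("no healthy silo available")


if __name__ == "__main__":
    silos = [Silo("eu-1"), Silo("us-1")]
    silos[0].data_up = False  # a failed cache or database takes the whole eu-1 silo out
    print(pick_silo(silos).name)
```

The design choice is that failure is tracked per silo rather than per tier: a broken data layer in one silo is never mixed with a healthy backend from another.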
DISADVANTAGES
✤ needs a razor-sharp DevOps team
✤ small increase in hardware costs at kick-off
✤ adds complexity to the monitoring layer
✤ traceability has to be reconsidered
✤ bug reproduction and hunting work differently
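One possible way to keep traceability manageable across silos, offered here as an assumption rather than the talk's prescription, is to stamp every log line with the silo that handled the request plus a request ID, so a bug report can be narrowed to one silo before reproducing it. The sketch uses Python's standard `logging.LoggerAdapter`; `SILO_ID`, `request_logger`, and the logger name are hypothetical.

```python
import logging
import uuid
from typing import Optional

# Hypothetical: each silo knows its own identifier (e.g. injected via configuration).
SILO_ID = "silo-eu-1"

# Dedicated logger whose format expects the extra fields added by the adapter.
logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(levelname)s silo=%(silo)s request=%(request_id)s %(message)s")
)
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate lines through the root logger


def request_logger(request_id: Optional[str] = None) -> logging.LoggerAdapter:
    """Return a logger that stamps every record with the silo and request IDs."""
    extra = {"silo": SILO_ID, "request_id": request_id or uuid.uuid4().hex}
    return logging.LoggerAdapter(logger, extra)


if __name__ == "__main__":
    log = request_logger()
    log.info("order accepted")
    log.info("payment forwarded to backend")
```

Passing an incoming request ID (for example one set by an edge proxy) instead of generating a new one keeps the same ID across every silo and tier the request touches.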
FURTHER READING
✤ Wikipedia HA page
✤ OpenStack’s HA concepts
✤ Merge Hemo report from the FDA
✤ USA Presidential Policy Directive 21
✤ “Beyond Legacy Code” book
✤ TechCrunch’s summary of sites affected by Michael Jackson’s death
✤ Netflix lessons learned after the AWS outage
✤ Netflix Chaos Monkey source code
✤ Brian Adler’s talk on “Architecting for High Availability and Multi-Cloud”