[SCaLE16x] Silo-Based Architectures for High Availability Applications

High availability is becoming a de facto requirement for today's applications. Customer-facing IT failures mean directly losing customer revenue and trust, as users have grown accustomed to easily switching to more reliable service providers. Unavailable internal systems block employee productivity and add to the financial burden. It is therefore critical to have a healthy, performant, resilient IT infrastructure serving as the backbone of your business. But there are no textbook solutions to achieving five 9s (99.999%) availability. Data redundancy, computing clusters, load balancing and fail-over mechanisms each address one potential issue, but none of them treats the systems in your organisation holistically to maximise business revenue.

Not everyone has the financial and technical means to use the latest and greatest CDN and offload their high-availability requirements to such third parties. This is where smart design comes into play: my goal is to show you a different way of architecting an application, one that is centered on solving your own business needs without a huge additional cost. We devised this solution while working for a very large US airline, using open-source technologies, to meet its Black Friday & Cyber Monday traffic requirements.

Silos are a clever method of grouping servers so that they can be scaled both horizontally and vertically, depending on the actual application needs. Most importantly, silos free you from over-optimizing the architecture upfront, by allowing fine-grained adjustments that are easy to integrate into your Agile workflow.

Georgiana Gligor

March 10, 2018

Transcript

  1. SILO-BASED ARCHITECTURES FOR HIGH AVAILABILITY APPLICATIONS

    Georgiana Gligor / Tekkie Consulting / @gbtekkie
  2. @gbtekkie SCaLE 16X 2 GEORGIANA GLIGOR

    ✤ Geek. Mother. Do-er. ✤ on LAMP/LEMP stack since 2003 ✤ Architecture / DevOps consultant ✤ RomaniaPHP Organizer ✤ PhD Student (@gbtekkie, gb@tekkie.ro)
  3. AGENDA

    what is high availability (HA)?
    the need for high availability
    silos: a possible approach
    advantages and disadvantages
  4. None
  5. https://youtu.be/MQm5BnhTBEQ

  6. The software industry is built around anticipating change.

  7. anticipate vs accommodate

  8. TYPICAL APPLICATION

  9. None

  10. None
  11. ADJUSTING

    (diagram: Browser, internet, Load balancer, three Frontend servers, Business Logic, master (writes) and slave (reads))
  12. ADJUSTING: redundancy

    (same diagram as slide 11)
  13. ADJUSTING: resilience

    (same diagram as slide 11)
  14. TYPICAL LAYERING

  15. APPLICATION ARCHITECTURE

  16. HIGH AVAILABILITY

  17. AVAILABILITY

    Ability to access the system: ✤ retrieve information ✤ alter information ✤ send new data
  18. https://flic.kr/p/dkasBz

  19. THE 9s DANCE

    Uptime     Downtime (per year)
    90.000 %   36.50 days       one nine
    99.000 %   3.65 days        two nines
    99.900 %   8.76 hrs         three nines
    99.950 %   4 hrs 23 mins
    99.990 %   52.56 mins       four nines
    99.999 %   5.26 mins        five nines
  20. THE 9s DANCE

    Uptime     Downtime (per year)
    90.000 %   36.50 days
    99.000 %   3.65 days
    99.900 %   8.76 hrs
    99.950 %   4 hrs 23 mins    Amazon SLA
    99.990 %   52.56 mins       four nines
    99.999 %   5.26 mins        five nines
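The downtime column follows directly from the uptime percentage. A minimal Python sketch that reproduces it, assuming a 365-day year (leap years ignored); the function name is hypothetical:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; assumes a 365-day year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes per year a system may be down and still meet uptime_percent."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    for uptime in (90.0, 99.0, 99.9, 99.95, 99.99, 99.999):
        minutes = downtime_minutes_per_year(uptime)
        print(
            f"{uptime:7.3f} %  {minutes / (24 * 60):7.2f} days  "
            f"{minutes / 60:7.2f} hrs  {minutes:9.2f} mins"
        )
```

For example, 99.95 % leaves 0.0005 × 525,600 = 262.8 minutes a year, which rounds to the 4 hrs 23 mins on the slide.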
  21. IMPACT

    $40 / sec × 3600 = $144,000 / hour
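Combining the $40 / sec figure with the 9s table gives a rough price tag for each availability level. The sketch below is illustrative only: it assumes every second of downtime costs the same flat amount and that the whole yearly downtime budget is used; the function names are hypothetical.

```python
LOSS_PER_SECOND_USD = 40  # the figure used on the IMPACT slide
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR  # assumes a 365-day year


def loss_per_hour(loss_per_second: float = LOSS_PER_SECOND_USD) -> float:
    """Revenue lost for one hour of downtime at a flat per-second loss."""
    return loss_per_second * SECONDS_PER_HOUR


def yearly_downtime_cost(uptime_percent: float,
                         loss_per_second: float = LOSS_PER_SECOND_USD) -> float:
    """Revenue lost if the full yearly downtime budget for uptime_percent is used."""
    downtime_seconds = (1 - uptime_percent / 100) * SECONDS_PER_YEAR
    return downtime_seconds * loss_per_second


if __name__ == "__main__":
    print(f"per hour: ${loss_per_hour():,.0f}")
    for uptime in (99.9, 99.95, 99.99, 99.999):
        print(f"{uptime:7.3f} % -> ${yearly_downtime_cost(uptime):,.0f} per year")
```

Under these assumptions, going from four to five nines shrinks the yearly exposure from about $126,000 to about $12,600.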
  22. USER BEHAVIOUR

                                amazon       facebook     youtube
    Alexa Rank                  6            3            2
    Daily time on site          12:07 mins   19:27 mins   23:44 mins
    Daily pageviews / visitor   11.83        9.38         12.84
    Bounce rate                 21 %         29 %         33 %
  23. HIGH AVAILABILITY TRIANGLE: cost, complexity, risk

  24. DOWNTIME

    scheduled ‣ you
    unscheduled ‣ you ‣ others
  25. HAPPENS TO THE BEST

  26. MICHAEL JACKSON

  27. H.A. SYSTEM CHARACTERISTICS

  28. https://flic.kr/p/quMmFw NO SINGLE POINT OF FAILURE

  29. https://flic.kr/p/RLKw8z RELIABLE CROSSOVER

  30. DETECT FAILURES AS THEY OCCUR
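The three characteristics above (no single point of failure, reliable crossover, detecting failures as they occur) can be illustrated with a small failover group. This is a minimal sketch, not the setup used in the talk: the class and method names are hypothetical, health-check results are fed in by the caller, and the threshold of 3 consecutive failures is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    consecutive_failures: int = 0
    healthy: bool = True


class FailoverGroup:
    """Ordered nodes (primary first); traffic goes to the first healthy one.

    A node is marked down only after `threshold` consecutive failed health
    checks, so a single lost probe does not trigger a crossover. One
    successful check brings it back into rotation.
    """

    def __init__(self, names: List[str], threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.nodes = [Node(name) for name in names]
        self.threshold = threshold

    def record_check(self, name: str, ok: bool) -> None:
        """Feed in the result of one health check for the named node."""
        node = next((n for n in self.nodes if n.name == name), None)
        if node is None:
            raise KeyError(name)
        if ok:
            node.consecutive_failures = 0
            node.healthy = True
        else:
            node.consecutive_failures += 1
            if node.consecutive_failures >= self.threshold:
                node.healthy = False

    def active(self) -> Optional[str]:
        """Name of the node that should receive traffic, or None if all are down."""
        for node in self.nodes:
            if node.healthy:
                return node.name
        return None


if __name__ == "__main__":
    group = FailoverGroup(["primary", "standby"], threshold=3)
    for _ in range(3):
        group.record_check("primary", ok=False)
    print(group.active())  # standby
```

Requiring several consecutive failures trades a slightly slower crossover for fewer false alarms; the right threshold depends on how often checks run.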

  31. HA BEST PRACTICES

    1. no single points of failure
    2. stateless application design
    3. automate infrastructure for consistency & reliability
    4. clever monitoring and alerting
    5. geographically distribute your machines
    6. keep spare capacity to meet increasing demand
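Practice 6 (spare capacity) can be made concrete with a back-of-the-envelope sizing rule. This is a hedged sketch: the 30% headroom, the N+1 spare and the request rates are hypothetical example values, not figures from the talk.

```python
import math


def servers_needed(peak_rps: float, rps_per_server: float,
                   headroom: float = 0.3, spares: int = 1) -> int:
    """Servers to provision for a given peak load.

    headroom: fraction of each server's capacity kept free for growth/spikes.
    spares:   extra servers on top (N+1 by default) so losing one server
              does not push the rest over capacity.
    """
    if rps_per_server <= 0:
        raise ValueError("rps_per_server must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = rps_per_server * (1 - headroom)
    return math.ceil(peak_rps / usable_rps) + spares


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 req/s peak, 100 req/s per server.
    print(servers_needed(1000, 100))  # 16
```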
  32. “A man’s got to know his limitations.” - Dirty Harry
  33. SILOS

  34. TRY TO UPGRADE TO PHP7

  35. WHAT IS A SILO?

    ✤ frontend (SPAs, PWAs, etc.) ✤ backend (e.g. PHP services) ✤ data (including cache)
    1 silo = full setup of servers that deliver the end-to-end functionality
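Because a silo is a full end-to-end setup, it can only take traffic if every layer in it is up. A minimal sketch of that rule, assuming the slide's "data (including cache)" layer is split into separate database and cache tiers; the `Silo` class and tier names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict

# Tiers every silo must have at least one healthy server in. Splitting the
# slide's "data (including cache)" layer into database and cache is an
# assumption made for this sketch.
REQUIRED_TIERS = ("frontend", "backend", "database", "cache")


@dataclass
class Silo:
    """A full stack of servers that can serve a request end to end on its own."""
    name: str
    # tier -> {server name: healthy?}
    tiers: Dict[str, Dict[str, bool]] = field(default_factory=dict)

    def can_serve(self) -> bool:
        """True only if every required tier has at least one healthy server."""
        return all(any(self.tiers.get(tier, {}).values()) for tier in REQUIRED_TIERS)


if __name__ == "__main__":
    silo = Silo("silo-a", {
        "frontend": {"fe-1": True, "fe-2": False},
        "backend": {"php-1": True},
        "database": {"db-1": True},
        "cache": {"cache-1": True},
    })
    print(silo.can_serve())  # True
    silo.tiers["backend"]["php-1"] = False
    print(silo.can_serve())  # False
```

Scaling a silo vertically or horizontally then just means changing the servers listed in its tiers; the rule for whether it can serve stays the same.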
  36. WHAT IS A SILO?

  37. SILO-BASED ARCHITECTURE
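The talk does not specify how the load balancer spreads users across silos. One possible scheme, shown here as a hypothetical sketch, pins each user to a silo by hashing their ID and spills over to the next silo when the chosen one is down:

```python
import hashlib
from typing import List, Optional, Set


def pick_silo(user_id: str, silos: List[str], down: Set[str]) -> Optional[str]:
    """Pin a user to one silo; if that silo is down, try the next ones in order.

    SHA-256 is used instead of Python's built-in hash(), which is randomised
    per process for strings and would reshuffle users after a restart.
    """
    if not silos:
        return None
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(silos)
    for offset in range(len(silos)):
        candidate = silos[(start + offset) % len(silos)]
        if candidate not in down:
            return candidate
    return None  # every silo is down


if __name__ == "__main__":
    silos = ["silo-a", "silo-b", "silo-c"]
    print(pick_silo("user-42", silos, down=set()))
    print(pick_silo("user-42", silos, down={"silo-a", "silo-b"}))  # silo-c
```

Spilling a user over to another silo is cheap only if the application is stateless (best practice 2); otherwise session data would have to follow the user.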

  38. MULTIPLE CACHES

  39. A/B TESTING
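With silos, A/B testing can mean running a whole candidate release in one silo and sending it a fixed share of users. A hypothetical sketch of a deterministic weighted split; the silo names and the 90/10 weights are example values:

```python
import hashlib
from typing import Dict


def ab_silo(user_id: str, weights: Dict[str, int]) -> str:
    """Deterministically assign a user to a silo in proportion to integer weights."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % total
    # Sort so the result does not depend on dict insertion order.
    for silo, weight in sorted(weights.items()):
        if point < weight:
            return silo
        point -= weight
    raise AssertionError("unreachable: point is always < total")


if __name__ == "__main__":
    weights = {"silo-current": 90, "silo-candidate": 10}
    counts = {name: 0 for name in weights}
    for i in range(10_000):
        counts[ab_silo(f"user-{i}", weights)] += 1
    print(counts)
```

The printed counts should come out close to a 90/10 split, and a given user always lands in the same silo, so they see one consistent version for the whole test.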

  40. GEOGRAPHICAL DISTRIBUTION

  41. LIVE UPGRADES
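Live upgrades follow naturally from silos: take one silo out of rotation, upgrade it, check it, and only then move on. A minimal sketch of that loop; the callables stand in for whatever tooling actually drains, deploys and health-checks a silo, and all names are hypothetical:

```python
from typing import Callable, List


def rolling_upgrade(silos: List[str],
                    drain: Callable[[str], None],
                    upgrade: Callable[[str], None],
                    healthy: Callable[[str], bool],
                    enable: Callable[[str], None]) -> List[str]:
    """Upgrade silos one at a time so the others keep serving traffic.

    Stops at the first silo that fails its post-upgrade health check and
    leaves it drained, so a bad release never reaches the remaining silos.
    Returns the silos that were upgraded and put back into rotation.
    """
    done: List[str] = []
    for silo in silos:
        drain(silo)      # stop routing new users to this silo
        upgrade(silo)    # deploy the new release (e.g. a PHP 7 runtime)
        if not healthy(silo):
            break        # keep it out of rotation; the rest stay on the old release
        enable(silo)     # put the upgraded silo back behind the load balancer
        done.append(silo)
    return done


if __name__ == "__main__":
    log: List[str] = []
    result = rolling_upgrade(
        ["silo-a", "silo-b", "silo-c"],
        drain=lambda s: log.append(f"drain {s}"),
        upgrade=lambda s: log.append(f"upgrade {s}"),
        healthy=lambda s: s != "silo-b",  # pretend silo-b fails its check
        enable=lambda s: log.append(f"enable {s}"),
    )
    print(result)  # ['silo-a']
```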

  42. ADVANTAGES

    ✤ reuse familiar technology
    ✤ real A/B testing
    ✤ no BHUF requirements
    ✤ no disruption => brand loyalty
    ✤ lower Total Cost of Ownership
    ✤ simplify scalability
  43. DISADVANTAGES

    ✤ needs razor-sharp DevOps team
    ✤ small increase in hardware costs on kick-off
    ✤ adds complexity to the monitoring layer
    ✤ reconsider traceability
    ✤ bug reproduction and hunting work differently
  44. TAKEAWAYS

  45. TAKEAWAYS

    ✤ build situational awareness with clever monitoring
    ✤ automate outage detection
    ✤ powerful A/B testing
  46. FURTHER READING

    ✤ Wikipedia HA page
    ✤ OpenStack’s HA concepts
    ✤ Merge Hemo report from FDA
    ✤ USA Presidential Policy Directive 21
    ✤ “Beyond Legacy Code” book
    ✤ TechCrunch’s summary of sites affected by Michael Jackson’s death
    ✤ Netflix lessons learned after AWS outage
    ✤ Netflix Chaos Monkey source code
    ✤ Brian Adler’s talk on “Architecting for High Availability and Multi-Cloud”
  47. Questions? Efficient architecture. Performance oriented. AI enhanced. dev@tekkie.ro