
Facing Vulnerabilities in Spring Boot Architectures

Yury Nino
November 20, 2020

Transcript

  1. Agenda
    • What could go wrong?
    • War Stories on PROD.
    • Stability: Antipatterns.
    • Resilience: Patterns.
    • Framework for Chaos GameDays.
    • Demo: Chaos Monkey for Spring Boot.
  2. What could go wrong?
    Our bodies face all kinds of adversities: genetic mutations, toxic substances, attacks by (corona)viruses and bacteria, and many other diseases. In such a dangerous world, how do they stay alive?
    https://www.yurynino.dev/
  3. What could go wrong?
    Our systems face all kinds of adversities: hard disks fail, networks go down, customer traffic overloads them, and cyberattacks happen. In such a chaotic world, how do they stay alive?
  4. Netflix, Twitter
    The infrastructure required by a software system can be as complex as the software itself. Every production failure is unique: no two incidents share precisely the same chain of failure!
  5. Integration Points
    Every integration point will eventually fail in some way, and you need to be prepared for that failure. Defend with:
    • Circuit Breakers.
    • Timeouts.
    • Decoupling Middleware.
    • Handshaking.
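
To make the Circuit Breaker idea concrete, here is a minimal sketch in plain Java 11 with no library, assuming a count-based policy: after `failureThreshold` consecutive failures the breaker opens, answers every call with a fallback for `openDuration`, and then lets a trial call through (half-open). The class and its methods are hypothetical names for illustration, not the API of Resilience4j or Spring Cloud Circuit Breaker; the clock is injected as a `LongSupplier` so the cool-down can be tested without sleeping.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical count-based circuit breaker; a sketch, not a library API. */
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openNanos;
    private final LongSupplier nanoClock; // System::nanoTime in production, a fake clock in tests

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtNanos;

    CircuitBreaker(int failureThreshold, Duration openDuration, LongSupplier nanoClock) {
        this.failureThreshold = failureThreshold;
        this.openNanos = openDuration.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Current state; an OPEN breaker becomes HALF_OPEN once its cool-down has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && nanoClock.getAsLong() - openedAtNanos >= openNanos) {
            state = State.HALF_OPEN; // let trial calls probe the integration point again
        }
        return state;
    }

    /** Calls the integration point unless the breaker is OPEN; rejections and failures get the fallback. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast without touching the failing dependency
        }
        try {
            T result = remoteCall.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        // A failed trial call, or too many failures in a row, (re)opens the breaker.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtNanos = nanoClock.getAsLong();
        }
    }
}
```

In production the clock would be `System::nanoTime`, and each outbound client call would be wrapped, e.g. `breaker.call(() -> inventoryClient.fetch(id), () -> cachedStock)`, where `inventoryClient` and `cachedStock` are hypothetical. A real Spring Boot service would more likely rely on an existing library than maintain this state machine by hand; the sketch only shows the CLOSED, OPEN, and HALF_OPEN transitions.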
  6. Chain Reactions
    A chain reaction happens when the death of one server forces the remaining servers to pick up its slack, which makes them more likely to fail as well. Defend with:
    • Defensive programming.
    • Circuit Breakers.
    • Bulkheads.
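
A bulkhead keeps one failing dependency from dragging down every request thread. The sketch below is plain Java and assumes a semaphore-based design (a dedicated thread pool per dependency is the common alternative): it caps how many calls may be in flight to one dependency and rejects extra calls after a short bounded wait. `Bulkhead` and its parameters are hypothetical names for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical semaphore bulkhead: caps concurrent calls into one dependency. */
final class Bulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    Bulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrentCalls);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs work inside this compartment, or returns the rejection value if it stays full. */
    <T> T call(Supplier<T> work, Supplier<T> rejected) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return rejected.get();
        }
        if (!acquired) {
            return rejected.get(); // shed load instead of piling more threads onto a sick dependency
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always hand the permit back, even if work throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Each downstream dependency would get its own instance (hypothetically `new Bulkhead(20, 50)` for a payments service), so a hung dependency can exhaust only the permits of its own compartment while calls to other services keep flowing.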
  7. Cascading Failures
    A cascading failure occurs when cracks jump from one system to another, until the calling systems' threads are blocked forever. Defend with:
    • Defensive programming.
    • Circuit Breakers.
    • Timeouts.
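
Timeouts stop a crack from jumping into the caller. A minimal sketch, assuming Java 9+ (for `CompletableFuture.orTimeout`) and a hypothetical helper named `TimedCall`: the remote call runs on a separate executor, and the caller waits at most `timeoutMillis` before falling back.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: bounds how long a caller waits on a remote dependency. */
final class TimedCall {
    private TimedCall() {}

    static <T> T callWithTimeout(Supplier<T> remoteCall, long timeoutMillis,
                                 Executor executor, Supplier<T> fallback) {
        try {
            return CompletableFuture.supplyAsync(remoteCall, executor)
                    .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS) // Java 9+
                    .join();
        } catch (CompletionException e) {
            // Timed out (the cause is a TimeoutException) or the call itself failed:
            // degrade here instead of letting the stall travel up to our own callers.
            return fallback.get();
        }
    }
}
```

Note that `orTimeout` only completes the future exceptionally; it does not interrupt the worker thread. The executor that runs remote calls should therefore be bounded, and the underlying client should still have its own connect and read timeouts.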
  8. Users
    Users are a terrible thing. Users consume memory. Systems would be much better off with no users :) Users do weird things!
    • It is hard to predict what users will do.
    • Use AI techniques.
    • Run special stress tests to hammer deep links.
  9. Blocked Threads
    Blocked threads are related to all types of failures, including gradual slowdowns and hung servers. Defend with:
    • Always use timeouts, even though they require more error-handling code.
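
The same rule applies inside a single JVM: every wait on a lock or a queue should have an upper bound. A minimal sketch using only `java.util.concurrent`, where `BoundedWaits`, `pollWithTimeout`, and `runLocked` are hypothetical names: it uses the timed `BlockingQueue.poll` and `ReentrantLock.tryLock` overloads instead of `take()` and `lock()`, which can block forever.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helpers that never wait forever on a queue or a lock. */
final class BoundedWaits {
    private BoundedWaits() {}

    /** Next item from the queue, or empty if nothing arrives within timeoutMillis. */
    static <T> Optional<T> pollWithTimeout(BlockingQueue<T> queue, long timeoutMillis) {
        try {
            // poll(timeout) returns null on timeout, unlike take(), which can wait forever
            return Optional.ofNullable(queue.poll(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    /** Runs criticalSection only if the lock is acquired within timeoutMillis; returns whether it ran. */
    static boolean runLocked(ReentrantLock lock, long timeoutMillis, Runnable criticalSection) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false; // lock still held elsewhere: give up instead of hanging
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            criticalSection.run();
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

The extra branches (an empty `Optional`, `false` from `runLocked`) are exactly the additional error-handling code the slide mentions; the caller still has to decide whether to retry, degrade, or return an error.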
  10. Self-Denial Attacks
    Self-denial attacks originate inside your own organization, when people cause self-inflicted wounds. Defend with:
    • Protect shared resources.
    • Avoid unexpected scaling effects.
    • Avoid front-end load that causes increasing back-end processing.
  11. Scaling Effects
    If you have a many-to-few relationship, you can be hit by scaling effects when one side increases. Defend with:
    • Compare PROD and QA environments to spot scaling effects.
    • Avoid point-to-point communication.
    • Avoid shared resources.
  12. Unbalanced Capacities
    Mismatched ratios between layers let one tier flood another with requests beyond its capacity. Defend with:
    • Examine server and thread counts.
    • Virtualize QA and scale it up.
    • Stress both sides of the interface.
    • Watch for nearby Scaling Effects and Users.
  13. Dogpile
    A dogpile concentrates demand into a short burst and forces you to spend too much capacity just to handle the peak. Defend with:
    • Use random clock slews.
    • Don’t set all cron jobs for midnight.
    • Use increasing backoff times to avoid pulsing.
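
Random clock slew and increasing backoff both spread synchronized demand out over time. A minimal sketch, assuming the widely used "full jitter" variant of exponential backoff (the slide does not name a specific formula); `Jitter`, `retryDelay`, and `clockSlew` are hypothetical names.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helpers that spread retries and scheduled jobs out over time. */
final class Jitter {
    private Jitter() {}

    /**
     * "Full jitter" backoff: the ceiling doubles per attempt up to cap, and the actual
     * delay is uniform in [0, ceiling], so clients that failed together retry apart.
     * Assumes base is small (seconds, not months) so the shift cannot overflow.
     */
    static Duration retryDelay(int attempt, Duration base, Duration cap) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long ceiling = Math.min(cap.toMillis(), base.toMillis() << Math.min(attempt, 30));
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(ceiling + 1));
    }

    /** Random clock slew: delay a scheduled job by up to maxSlew so instances don't all fire at once. */
    static Duration clockSlew(Duration maxSlew) {
        return Duration.ofMillis(ThreadLocalRandom.current().nextLong(maxSlew.toMillis() + 1));
    }
}
```

A nightly job could wait `clockSlew(Duration.ofMinutes(10))` before starting, and a client could wait `retryDelay(attempt, ...)` between retries; in both cases the randomness is what breaks up the pulse.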
  14. Slow Responses
    Slow responses cause trouble when response times exceed the configured timeouts. Defend with:
    • Fail fast.
    • Send an immediate error response.
    • Probe your timeouts in many scenarios.
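
Failing fast means checking, before doing any expensive work, whether the request can possibly succeed. A minimal sketch, assuming the handler is given cheap, non-blocking availability checks (for example, "is this circuit breaker closed?"); `FailFast` and its nested `Response` type are hypothetical stand-ins, not Spring MVC classes.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical fail-fast request handler. */
final class FailFast {
    /** Minimal stand-in for an HTTP response (status code and body). */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    private final List<BooleanSupplier> requiredResources;

    /** Each check should be cheap and non-blocking. */
    FailFast(List<BooleanSupplier> requiredResources) {
        this.requiredResources = List.copyOf(requiredResources);
    }

    Response handle(Supplier<String> work) {
        for (BooleanSupplier available : requiredResources) {
            if (!available.getAsBoolean()) {
                // We already know the request cannot succeed: say so now, not after a timeout.
                return new Response(503, "Service Unavailable");
            }
        }
        return new Response(200, work.get());
    }
}
```

The 503 goes back immediately, so the caller's thread is released at once instead of waiting for a timeout that was certain to expire.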
  15. Unbounded Results
    What happens when the database suddenly returns five million rows instead of the usual hundred or so? Defend with:
    • Use realistic data volumes.
    • Paginate at the front-end.
    • Don’t rely on the data producers.
    • Put limits in your application-level protocols.
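
Putting limits in the protocol means the server, not the data producer, decides the maximum size of a response. A minimal sketch, assuming an offset/limit data source (for instance a SQL query ending in `LIMIT ? OFFSET ?`) behind a hypothetical `RowSource` interface, and a hypothetical cap of 100 rows per page.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical server-side pagination that caps every response. */
final class BoundedQuery {
    static final int MAX_PAGE_SIZE = 100; // assumed protocol-level limit

    /** Any offset/limit data source, e.g. a SQL query ending in "LIMIT ? OFFSET ?". */
    interface RowSource<T> {
        List<T> fetch(long offset, int limit);
    }

    /** One page of results plus a flag telling the client whether to ask for more. */
    static final class Page<T> {
        final List<T> items;
        final boolean hasMore;

        Page(List<T> items, boolean hasMore) {
            this.items = items;
            this.hasMore = hasMore;
        }
    }

    /** Clamps whatever page size the client asked for into [1, MAX_PAGE_SIZE]. */
    static int effectivePageSize(int requested) {
        return Math.max(1, Math.min(requested, MAX_PAGE_SIZE));
    }

    static <T> Page<T> readPage(RowSource<T> source, long offset, int requestedSize) {
        int size = effectivePageSize(requestedSize);
        List<T> rows = source.fetch(offset, size + 1); // one extra row only signals "more available"
        boolean hasMore = rows.size() > size;
        // Truncate even if the producer ignored the limit: never trust it to behave.
        List<T> items = new ArrayList<>(rows.subList(0, Math.min(size, rows.size())));
        return new Page<>(items, hasMore);
    }
}
```

Fetching `size + 1` rows tells the client whether another page exists without a separate count query, and the truncation in `readPage` still protects the caller if a producer ignores the limit it was given.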
  16. Chaos Engineering
    Chaos Engineering is the discipline of experimenting with failures in production in order to reveal a system's weaknesses and build confidence in its resilience.
    https://principlesofchaos.org/
  17. Security Chaos Engineering
    Security Chaos Engineering is the identification of security control failures through proactive experimentation, in order to build confidence in the system’s ability to defend against malicious conditions in production.
    Source: Security Chaos Engineering (book).
  18. Chaos History
    • 2008: Chaos Engineering was born at Netflix.
    • 2010: Chaos Monkey & Simian Army were launched.
    • 2016: Gremlin was born.
    • 2017: SRE USENIX, ChaosIQ, ChaosConf.
    • 2018: Chaos Engineering book.
    • 2019: Chaos massification.
    • 2020: Chaos Engineering book.
  19. Practicing Chaos: GameDays (Our Journey)
    GameDays are interactive, real-world learning exercises, designed to give players a chance to put their skills in a technology to the test. GameDays were created by Jesse Robbins, inspired by his experience and training as a firefighter.
  20. GameDays Framework (Russ Miles)
    Before:
    • Pick a hypothesis.
    • Pick a style.
    • Decide who.
    • Decide where.
    • Decide when.
    • Document.
    • Get approval!
    During:
    • Detect the situation.
    • Take a deep breath.
    • Communicate.
    • Visit dashboards.
    • Analyze data.
    • Propose solutions.
    • Apply and solve!
    After:
    • Write a postmortem: What Happened, Impact, Duration, Resolution Time, Resolution, Timeline, Action Items.
  21. GameDays Framework: Evolve
    The same Before, During, and After steps as slide 20, plus an Evolve phase:
    • Improve your method.
    • Integrate in pipelines.
    • Adjust metrics.
    • Validate CMM position.
    • Adapt next GameDay.
    • Continuous Verification.
  22. GameDays Framework: Automate
    The same Before, During, and After steps as slide 20, automated with:
    • Spring Boot
    • Chaos Monkey
    • Azure
    • Pulumi