Security & Chaos Engineering: A Novel Approach to Crafting Secure and Resilient Distributed Systems

Modern systems pose a number of thorny challenges, and securing the transformation from legacy monolithic systems to distributed systems demands a change in mindset and engineering toolkit. The security engineering toolkit is unfortunately outdated and out of step with today's approach to building, securing, and operating distributed systems. The speed, scale, and complex operations within microservice architectures make it tremendously difficult for humans to mentally model their behavior. If that is even remotely true, how is it possible to adequately secure services that are not even fully comprehended by the engineering teams that built them? Security Chaos Engineering helps teams realign their understanding with the actual state of operational security and build confidence that their security actually works the way they think it does. Chaos Engineering allows security teams to proactively experiment on recurring incident patterns to derive new information about previously unknown underlying factors by reversing the postmortem and preparation phases. This is done by developing live fire exercises that can be measured, managed, and automated. It develops teams by building a learning culture around system failure, challenging engineering teams to proactively and safely discover system weaknesses before they disrupt business outcomes. In this session we will introduce a new concept known as Security Chaos Engineering and show how it can be applied to create highly secure, performant, and resilient distributed systems.

Aaron Rinehart

October 16, 2020

Transcript

  1. Areas Covered: Combating Complexity in Software • Chaos Engineering • Resilience Engineering & Security • Security Chaos Engineering (@aaronrinehart @verica_io #chaosengineering)
  2. Aaron Rinehart, CTO & Co-Founder • Former Chief Security Architect @UnitedHealth • Former DoD, NASA Safety & Reliability Engineering • Frequent speaker and author on Chaos Engineering & Security • O’Reilly Author: Chaos Engineering, Security Chaos Engineering Books • Pioneer behind Security Chaos Engineering • Led ChaoSlingr team at UnitedHealth
  3. “The growth of complexity in society has got ahead of our understanding of how complex systems work and fail” -Sidney Dekker
  4. Complex? Circuit Breaker Patterns • Continuous Delivery • Distributed Systems • Blue/Green Deployments • Cloud Computing • Service Mesh • Containers • Immutable Infrastructure • Infracode • Continuous Integration • Microservice Architectures • API • Auto Canaries • CI/CD • DevOps • Automation Pipelines
  5. Security? Mostly Monolithic • Requires Domain Knowledge • Prevention focused • Poorly Aligned • Defense in Depth • Stateful in nature • DevSecOps not widely adopted • Expert Systems • Adversary Focused
  6. Woods Theorem: “As the complexity of a system increases, the accuracy of any single agent’s own model of that system decreases” - Dr. David Woods
  7. After a few months…. Hard Coded Passwords • Identity Conflicts • Lead Software Engineering finds a new job at Google • New Security Tool • Refactor Pricing • 300 Microservices Δ-> 850 Microservices • Cloud Provider API Outage • WAF Outage -> Disabled • Scalability Issues • Network is Unreliable • Autoscaling Keeps Breaking • Large Customer • Delayed Features • DNS Resolution Errors • Expired Certificate • Regulatory Audit • Rolling Sev1 Outage on Portal • Code Freeze
  8. Years?…. Hard Coded Passwords • Identity Conflicts • Lead Software Engineering finds a new job at Google • New Security Tool • Refactor Pricing • 300 Microservices Δ-> 4000 Microservices • Cloud Provider API Outage • Firewall Outage -> Disabled • Scalability Issues • Network is Unreliable • Autoscaling Keeps Breaking • Large Customer Outage • Delayed Features • DNS Resolution Errors • Expired Certificate • Regulatory Audit • Rolling Sev1 Outages on Portal • Code Freeze • Hard Coded Passwords • Identity Conflicts • Lead Software Engineering finds a new job at Google • New Security Tool • Refactor Pricing • 300 Microservices Δ-> 850 Microservices • Cloud Provider API Outage • WAF Outage -> Disabled • Scalability Issues • Network is Unreliable • Autoscaling Keeps Breaking • Large Customer Outage • Delayed Features • DNS Resolution Errors • Expired Certificate • Regulatory Audit • Rolling Sev1 Outage on Portal • Merger with competitor • Misconfigured FW Rule Outage • Database Outage • Portal Retry Storm Outage • Orphaned Documentation • Corporate Reorg • Budget Freeze • Outsource overseas development • Exposed Secrets on Github • Code Freeze • Migration to New CSP • Upgrade to Java SE 12
  9. “things that have never happened before happen all the time”

    –Scott Sagan “The Limits of Safety”
  10. Security Incidents: Security incidents are not effective measures of detection because at that point it's already too late
  11. Chaos Engineering: “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s ability to withstand turbulent conditions”
  12. • Define steady state • Formulate hypothesis • Outline methodology • Identify blast radius • Observability is key • Readily abortable. Chaos Monkey Story: • During Business Hours • Born out of Netflix Cloud Transformation • Put well defined problems in front of engineers. • Terminate VMs on Random VPC Instances
  13. Chaos Pitfalls: Breaking things on Purpose • Formulate hypothesis • Outline methodology • Identify blast radius • Observability is key • Readily abortable. “I'm pretty sure I won’t have a job very long if I break things on purpose all day.” -Casey Rosenthal. The purpose of Chaos Engineering is NOT to “Break Things on Purpose”. If anything we are trying to “Fix them on Purpose”! Reference: Nora Jones, 8 Traps of Chaos Engineering
  14. “It worked in Star Wars but it won’t work here”

    Hope is Not an Effective Strategy
  15. WE OFTEN MISREMEMBER WHAT OUR SYSTEMS REALLY ARE, AND AS A RESULT THE OPPORTUNITY FOR ACCIDENTS & MISTAKES INCREASES
  16. We really don’t know very much. No matter how much we prepare... Security Incidents are Subjective. Where? Why? Who? What? How?
  17. ChaoSlingr Product Features: • ChatOps Integration • Configuration-as-Code • Example Code & Open Framework • Serverless App in AWS • 100% Native AWS • Configurable Operational Mode & Frequency • Opt-In | Opt-Out Model
  18. Hypothesis: If someone accidentally or maliciously introduced a misconfigured port then we would immediately detect, block, and alert on the event. Misconfigured Port Injection: Alert SOC? • Config Mgmt? • IR Triage • Log data? • Wait... Firewall?
  19. Experimentation Opportunities. Misconfigured Port Injection: Alert SOC? • Config Mgmt? • IR Triage • Log data? • Wait... Firewall?
  20. verica.io/book VERICA | CONTINUOUS VERIFICATION Get your copy of the

    O’Reilly Chaos Engineering Book Free Copy Compliments of Verica.io
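
The experiment steps listed on slide 12 (steady state, hypothesis, blast radius, readily abortable) and the misconfigured-port hypothesis on slide 18 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not ChaoSlingr's code: `Firewall`, `ExperimentResult`, `run_port_experiment`, and the `detect`, `block`, and `alert` callables are hypothetical names invented here, and the firewall is an in-memory stand-in rather than a real cloud API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Firewall:
    """Hypothetical in-memory stand-in for a firewall or security group."""
    allowed_ports: Set[int]
    open_ports: Set[int] = field(default_factory=set)

    def unauthorized_ports(self) -> Set[int]:
        return self.open_ports - self.allowed_ports


@dataclass
class ExperimentResult:
    detected: bool
    blocked: bool
    alerted: bool

    @property
    def hypothesis_held(self) -> bool:
        # Hypothesis from slide 18: we detect, block, AND alert.
        return self.detected and self.blocked and self.alerted


def run_port_experiment(
    firewall: Firewall,
    port: int,
    detector: Callable[[Firewall], Set[int]],
    blocker: Callable[[Firewall, int], None],
    alerter: Callable[[int], bool],
) -> ExperimentResult:
    # Steady state: nothing unauthorized is open before we inject.
    if firewall.unauthorized_ports():
        raise RuntimeError("steady state not met; aborting before injection")

    firewall.open_ports.add(port)  # inject the misconfigured port
    try:
        detected = port in detector(firewall)
        blocked = alerted = False
        if detected:
            blocker(firewall, port)
            blocked = port not in firewall.open_ports
            alerted = alerter(port)
        return ExperimentResult(detected, blocked, alerted)
    finally:
        # Roll back so the blast radius stays at this one port.
        firewall.open_ports.discard(port)


# Illustrative controls (assumptions, standing in for real tooling).
alerts: List[int] = []


def detect(fw: Firewall) -> Set[int]:
    return fw.unauthorized_ports()


def block(fw: Firewall, port: int) -> None:
    fw.open_ports.discard(port)


def alert(port: int) -> bool:
    alerts.append(port)
    return True


fw = Firewall(allowed_ports={443}, open_ports={443})
result = run_port_experiment(fw, 2222, detect, block, alert)
print(result.hypothesis_held)
```

Swapping `detect` for a control that reports nothing models the "Log data? Wait... Firewall?" questions on the slide: the hypothesis fails, the rollback still runs, and the failed step marks where to investigate.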