Tenets of Resilient Systems

Excerpt from workshop on resilient system design.

Michael Gasch

July 08, 2022

Transcript

  1. Confidential │ ©2020 VMware, Inc. 11
    “Resiliency is not about making Money. It’s about not losing Money.” -- Uwe Friedrichsen
  2. Availability = MTTF / (MTTF + MTTR)
    What we optimize for (“known unknowns”)
  3. “Distributed systems force developers to make a quantum leap from the relative certainty of a single machine or process to the byzantine interactions of interconnected subsystems, where we cannot stop the world to take a snapshot of it or to make it move one step at a time.” -- Sergey Bykov
    We need a Change in our Mindset
  4. Availability = MTTF / (MTTF + MTTR)
    What we must optimize for (“unknown unknowns”)
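A quick worked example of the availability formula may help here. The numbers below are hypothetical and not from the talk. The sketch compares two ways of improving the same baseline: failing half as often (doubling MTTF) and recovering twice as fast (halving MTTR). Reading the “unknown unknowns” slide as a call to focus on recovery time is an interpretation, not something the transcript states.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical baseline: one failure every 1000 hours, 1 hour to recover.
baseline = availability(1000.0, 1.0)
# Option A: fail half as often (double MTTF).
longer_mttf = availability(2000.0, 1.0)
# Option B: recover twice as fast (halve MTTR).
shorter_mttr = availability(1000.0, 0.5)

print(f"baseline:    {baseline:.6f}")
print(f"double MTTF: {longer_mttf:.6f}")
print(f"halve MTTR:  {shorter_mttr:.6f}")
```

Both options give the same availability, because only the ratio of MTTR to MTTF matters. Failures that nobody predicted cannot be prevented in advance, which is one reason to invest in fast recovery.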
  5. “Just make your system resilient!” they said… (© Uwe Friedrichsen)
  6. “A complex system that works is invariably found to have evolved from a simple system that worked.” -- John Gall
    Well, … :)
    “I want a simple Solution!”
  7. Then and Now Resiliency Engineering
    20th century: Why do accidents happen
    21st century: Why they don’t
    https://www.youtube.com/watch?v=PGLYEDpNu60
  8. cat file | grep string | sort -r | uniq
    The Unix Philosophy
  9. cat file | grep string | sort -r | uniq
    The Unix Philosophy
    STREAM <T> !
    Did you notice that I said nothing about Microservices?
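The pipeline on these slides composes small, single-purpose stages over a stream of lines. As a rough sketch of the same idea outside the shell, the stages can be written as Python functions that each take and return an iterable of strings. The function names and the sample input are illustrative assumptions, not part of the talk.

```python
from typing import Iterable, Iterator


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only lines containing needle (like `grep string`)."""
    return (line for line in lines if needle in line)


def sort_reverse(lines: Iterable[str]) -> Iterator[str]:
    """Emit lines in descending order (like `sort -r`); buffers all input."""
    return iter(sorted(lines, reverse=True))


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Drop adjacent duplicate lines (like `uniq`)."""
    previous = object()  # sentinel that never equals a real line
    for line in lines:
        if line != previous:
            previous = line
            yield line


# Hypothetical input standing in for `cat file`.
source = ["error: disk", "ok", "error: net", "error: disk", "ok"]
result = list(uniq(sort_reverse(grep(source, "error"))))
print(result)
```

Each stage knows nothing about its neighbours beyond the shared stream type, so stages can be added, removed, or reordered without changing the others.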
  10. Tenets of Resilient Systems
    - Domain Understanding
    - Ownership
    - Simplicity
    - Empathy
    - Anti-Fragility
    - Observability
    - Operational Friendliness
    - Preserve Knowledge
  11. Tenets of Resilient Systems: Domain Understanding
    - If you don’t understand your Business Domain, you don’t understand the Problem
    - Domain Driven Design (Bounded Contexts)
    - Set Expectations (Contracts)
    - This is NOT about Technology
    - Communicate (document) clearly
  12. Tenets of Resilient Systems: Ownership
    - “You build it, you run it”
    - Cross-functional Teams (“DevSecOps”)
    - Service Orientation
    - API-First Design (no Backdoors)
    - Risk and Incident Management
    - Communication Plans
  13. Tenets of Resilient Systems: Simplicity
    - Simple != easy
    - Self-sufficient (Autonomy)
    - Modular
    - Composable
    - Evolvable
    - Use proven Components
    - Small Gains don’t justify added Complexity (intrinsic vs accidental Complexity)
  14. Tenets of Resilient Systems: Empathy
    - Your System does not run in Isolation
    - Be a good Citizen
    - Anticipate Effects on (possibly unknown) upstream/downstream Services
    - Mechanical Sympathy
  15. Tenets of Resilient Systems: Anti-Fragility
    - Embrace Failure (Stress)
    - Decentralization and asynchronous Communication (Facts) through weak Links
    - Static Stability (graceful Degradation, Independence) and Zero Trust
    - Isolation (minimize Blast Radius)
    - Admission (Resource) and Flow Control
    - Redundancy (no SPOFs) with Failure Recovery
    - Supervision (“crash-only” Software)
    - Idempotency and Immutability
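Of the items on this slide, idempotency is perhaps the easiest to show in a few lines. The sketch below is one common way to get it, assuming each request carries a unique ID chosen by the caller. The `Account` class and the request IDs are hypothetical, and a real service would keep the set of applied IDs in durable storage.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Account:
    balance: int = 0
    # IDs of requests already applied (in memory only for this sketch).
    applied: Set[str] = field(default_factory=set)

    def deposit(self, request_id: str, amount: int) -> int:
        """Apply a deposit at most once per request_id and return the balance."""
        if request_id not in self.applied:
            self.balance += amount
            self.applied.add(request_id)
        return self.balance


account = Account()
account.deposit("req-1", 50)
account.deposit("req-1", 50)  # retry of the same request: no double credit
account.deposit("req-2", 25)
print(account.balance)
```

Because a repeated request leaves the state unchanged, callers and supervisors can retry after a crash or timeout without first working out whether the earlier attempt succeeded.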
  16. Tenets of Resilient Systems: Observability
    - Instrument and audit everything
    - Measure from the Inside and Outside (Customer)
    - Be transparent
    - Simplify Troubleshooting and Root Cause Analysis (Correlation)
    - Augment with Business Level Metrics (View)
    - Continuously measure / analyze / optimize
    - Drives Capacity Planning
  17. Tenets of Resilient Systems: Operational Friendliness
    - Don’t hide Complexity, never assume Single-Version Software
    - Standardize everything
    - Automate and reduce Toil to prevent Human Error
    - Chaos test in Production
    - Boring (Tech) is good
    - “Soft-Delete” and Self-Healing
    - Document clearly (Recovery Plans, Last Change, Ownership)
    - Practice (Game/Hack Days)
  18. Tenets of Resilient Systems: Preserve Knowledge
    - Stand on the Shoulders of Giants
    - Learn from the Past
    - Blameless Culture
    - Write Post-Mortems
    - Make Knowledge accessible
  19. Tenets of Resilient Systems
    - Closed Loops
    - Continuously Observe, Analyze, Act
    - Behavior Driven Development to bridge Silos
    - Ship (release) often but incrementally
    - Version everything (GitOps)
    - Shared Responsibility (GitOps)
    - No central Coordinator
    - APIs as System and Communication Boundaries
    - SLOs as Contracts
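One possible reading of “Observe, Analyze, Act” combined with “SLOs as Contracts” is a small loop that compares observed behavior against an agreed target and decides what to do next. The sketch below assumes an availability-style SLO measured as the fraction of good requests per window. The `Slo` type, the service name, the request counts, and the actions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slo:
    name: str
    target: float  # required fraction of good requests, e.g. 0.999


def analyze(slo: Slo, good: int, total: int) -> bool:
    """Return True when the observed success ratio meets the SLO target."""
    if total == 0:
        return True  # no traffic observed, so nothing was violated
    return good / total >= slo.target


def act(slo: Slo, healthy: bool) -> str:
    """Pick a follow-up action; both actions are placeholders."""
    if healthy:
        return "continue releases"
    return f"pause releases: {slo.name} violated"


checkout = Slo(name="checkout-availability", target=0.999)
# Observed (good, total) request counts per measurement window.
windows: List[Tuple[int, int]] = [(9995, 10000), (9980, 10000)]
decisions = [act(checkout, analyze(checkout, good, total)) for good, total in windows]
print(decisions)
```

The loop is closed because every observation feeds a decision, and the SLO acts as the contract that tells both sides when a decision is due.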