Tenets of Resilient Systems

Excerpt from workshop on resilient system design.

Michael Gasch

July 08, 2022

Transcript

  1. Confidential │ ©2020 VMware, Inc. 9
    Or when Things go REALLY bad …


  2. Root Causes of Outages

  3. Resiliency is not about making Money.
    It’s about not losing Money.
    -- Uwe Friedrichsen

  4. Availability = MTTF / (MTTF + MTTR)
    What we optimize for (“known unknowns”)

  5. We need a Change in our Mindset
    Distributed systems force developers to make a quantum leap from the
    relative certainty of a single machine or process to the byzantine
    interactions of interconnected subsystems, where we cannot stop the world
    to take a snapshot of it or to make it move one step at a time.
    -- Sergey Bykov

  6. Everything fails, all the time.
    -- Werner Vogels

  7. Availability = MTTF / (MTTF + MTTR)
    What we must optimize for (“unknown unknowns”)
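
The availability formula on this slide depends only on the ratio of MTTR to MTTF, so recovering faster raises availability by the same amount as failing less often by the same factor. The sketch below works through that arithmetic. The function name and all numbers are made up for illustration and do not come from the deck.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical baseline: one failure every 1000 hours, 10 hours to recover.
baseline = availability(1000.0, 10.0)

# Lever 1: fail ten times less often (MTTF multiplied by 10).
fewer_failures = availability(10_000.0, 10.0)

# Lever 2: recover ten times faster (MTTR divided by 10).
faster_recovery = availability(1000.0, 1.0)

print(f"baseline:        {baseline:.5f}")
print(f"fewer failures:  {fewer_failures:.5f}")
print(f"faster recovery: {faster_recovery:.5f}")
```

Both levers land on the same availability here, because each divides the MTTR/MTTF ratio by ten.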

  8. “Just make your system resilient!” they said…
    © Uwe Friedrichsen

  10. “I want a simple Solution!”
    A complex system that works is invariably found to have evolved from a
    simple system that worked.
    -- John Gall
    Well, … :)

  12. Rasmussen’s System Model
    http://www.spacesafetymagazine.com/wp-content/uploads/2016/07/2.-Cook-and-Rasmussen%E2%80%99s-dynamic-safety-model.png

  13. Then and Now
    Resiliency Engineering
    20th century: Why do accidents happen?
    21st century: Why don’t they?
    https://www.youtube.com/watch?v=PGLYEDpNu60

  14. Design Philosophy in Networked Systems
    The Internet

  15. Modularity based on abstraction is the way things are done
    -- Barbara Liskov

  16. The Unix Philosophy
    cat file | grep string | sort -r | uniq

  17. The Unix Philosophy
    cat file | grep string | sort -r | uniq
    STREAM
    Did you notice that I said nothing about Microservices?
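
As a rough illustration of the pipeline on this slide, the sketch below models each stage as a small function that consumes a stream of lines and produces another, so stages compose without knowing about each other. The function names and the sample input are hypothetical. The stages mimic only the basic behavior of `grep`, `sort -r` and `uniq`, not their full option sets.

```python
from typing import Iterable, Iterator


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only the lines containing `needle` (like `grep string`)."""
    return (line for line in lines if needle in line)


def sort_reverse(lines: Iterable[str]) -> Iterator[str]:
    """Emit lines in descending order (like `sort -r`); buffers all input first."""
    return iter(sorted(lines, reverse=True))


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Drop adjacent duplicate lines (like `uniq`)."""
    previous = object()  # sentinel that never equals a string
    for line in lines:
        if line != previous:
            yield line
        previous = line


# Hypothetical input standing in for `cat file`.
source = ["b string", "a string", "no match", "b string"]

# Same composition as the shell pipeline: grep, then sort -r, then uniq.
result = list(uniq(sort_reverse(grep(source, "string"))))
print(result)
```

The only contract between stages is the stream of lines, which is what lets each stage be replaced or reordered independently.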

  18. Can we apply this Philosophy to Distributed Systems?

  19. Tenets of Resilient Systems

  20. Tenets of Resilient Systems
    Domain Understanding
    Ownership
    Simplicity
    Empathy
    Anti-Fragility
    Observability
    Operational Friendliness
    Preserve Knowledge

  21. Tenets of Resilient Systems
    If you don’t understand your Business Domain, you don’t understand the Problem
    Domain Driven Design (Bounded Contexts)
    Set Expectations (Contracts)
    This is NOT about Technology
    Communicate (document) clearly

  22. Tenets of Resilient Systems
    “You build it, you run it”
    Cross-functional Teams (“DevSecOps”)
    Service Orientation
    API-First Design (no Backdoors)
    Risk and Incident Management
    Communication Plans

  23. Tenets of Resilient Systems
    Simple != easy
    Self-sufficient (Autonomy)
    Modular
    Composable
    Evolvable
    Use proven Components
    Small Gains don’t justify added Complexity (intrinsic vs accidental Complexity)

  24. Tenets of Resilient Systems
    Your System does not run in Isolation
    Be a good Citizen
    Anticipate Effects on (possibly unknown) upstream/downstream Services
    Mechanical Sympathy

  25. Tenets of Resilient Systems
    Embrace Failure (Stress)
    Decentralization and asynchronous Communication (Facts) through weak Links
    Static Stability (graceful Degradation, Independence) and Zero Trust
    Isolation (minimize Blast Radius)
    Admission (Resource) and Flow Control
    Redundancy (no SPOFs) with Failure Recovery
    Supervision (“crash-only” Software)
    Idempotency and Immutability
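
Two of the items on this slide, admission control and idempotency, can be sketched together in a few lines. The example assumes a single in-process worker with a bounded inbox and client-supplied request ids. The `Worker` and `Overloaded` names and the balance-update scenario are hypothetical and are not taken from the deck.

```python
from collections import deque


class Overloaded(Exception):
    """Raised when the inbox is full and a request is shed."""


class Worker:
    def __init__(self, capacity):
        self.capacity = capacity
        self.inbox = deque()
        self.seen = set()  # ids of requests that were already applied
        self.balance = 0

    def submit(self, request_id, amount):
        """Admission control: reject early instead of queueing without bound."""
        if len(self.inbox) >= self.capacity:
            raise Overloaded(request_id)
        self.inbox.append((request_id, amount))

    def drain(self):
        """Apply queued requests; a replayed request id has no further effect."""
        while self.inbox:
            request_id, amount = self.inbox.popleft()
            if request_id in self.seen:
                continue  # idempotent: duplicate delivery changes nothing
            self.balance += amount
            self.seen.add(request_id)


worker = Worker(capacity=2)
worker.submit("req-1", 10)
worker.submit("req-1", 10)  # client retry: same id delivered twice
try:
    worker.submit("req-2", 5)  # inbox is full, so this request is shed
    shed = False
except Overloaded:
    shed = True
worker.drain()
print(worker.balance, shed)
```

Shedding at the door keeps the queue, and therefore latency, bounded under stress, while the id check makes client retries safe.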

  26. Tenets of Resilient Systems
    Instrument and audit everything
    Measure from the Inside and Outside (Customer)
    Be transparent
    Simplify Troubleshooting and Root Cause Analysis (Correlation)
    Augment with Business Level Metrics (View)
    Continuously measure / analyze / optimize
    Drives Capacity Planning
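
One possible reading of "instrument everything" and "correlation" is sketched below: a decorator emits one structured record per call, and a correlation id is passed along so records from different operations can be joined during troubleshooting. The decorator, the handler names and the in-memory `records` list (a stand-in for a real log or metrics pipeline) are all assumptions made for illustration.

```python
import json
import time
import uuid

records = []  # stand-in for a log or metrics pipeline


def instrumented(name):
    """Wrap a handler so every call emits one structured record."""
    def decorator(handler):
        def wrapper(payload, correlation_id=None):
            correlation_id = correlation_id or str(uuid.uuid4())
            started = time.perf_counter()
            outcome = "ok"
            try:
                return handler(payload, correlation_id)
            except Exception:
                outcome = "error"
                raise
            finally:
                # Runs on success and on failure, so no call goes unrecorded.
                records.append({
                    "operation": name,
                    "correlation_id": correlation_id,
                    "outcome": outcome,
                    "duration_s": time.perf_counter() - started,
                })
        return wrapper
    return decorator


@instrumented("price_lookup")
def price_lookup(payload, correlation_id):
    return {"sku": payload["sku"], "price": 42}


@instrumented("checkout")
def checkout(payload, correlation_id):
    # Pass the same id downstream so both records can be joined later.
    price = price_lookup(payload, correlation_id)
    return {"total": price["price"] * payload["quantity"]}


order = checkout({"sku": "A-1", "quantity": 2}, correlation_id="demo-123")
print(json.dumps(records, indent=2))
```

Filtering the records by `correlation_id` reconstructs the path of a single request across both operations.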

  27. Tenets of Resilient Systems
    Don’t hide Complexity, never assume
    Single-Version Software
    Standardize everything
    Automate and reduce Toil to prevent Human Error
    Chaos test in Production
    Boring (Tech) is good
    “Soft-Delete” and Self-Healing
    Document clearly (Recovery Plans, Last Change, Ownership)
    Practice (Game/Hack Days)
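
The “Soft-Delete” item can be illustrated with a small in-memory store in which deleting only marks a record, so an operator can undo a mistake until a grace period expires. This is a sketch under assumptions: the `SoftDeleteStore` class, its method names and the grace period are hypothetical, and a real system would persist the deletion markers.

```python
import time


class SoftDeleteStore:
    """Key-value store where delete only marks a record; purge removes it later."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self._values = {}
        self._deleted_at = {}  # key -> time at which it was marked deleted

    def put(self, key, value):
        self._values[key] = value
        self._deleted_at.pop(key, None)

    def get(self, key):
        if key in self._deleted_at:
            return None  # hidden from readers, but still recoverable
        return self._values.get(key)

    def delete(self, key, now=None):
        if key in self._values:
            self._deleted_at[key] = time.time() if now is None else now

    def restore(self, key):
        """Undo a delete; returns True if the key was marked deleted."""
        return self._deleted_at.pop(key, None) is not None

    def purge(self, now=None):
        """Permanently remove records whose grace period has expired."""
        now = time.time() if now is None else now
        expired = [key for key, deleted in self._deleted_at.items()
                   if now - deleted >= self.grace_seconds]
        for key in expired:
            del self._values[key]
            del self._deleted_at[key]
        return len(expired)


store = SoftDeleteStore(grace_seconds=60)
store.put("config", "v1")
store.delete("config", now=0)
hidden = store.get("config")        # None: readers no longer see it
recovered = store.restore("config") # True: the delete was undone
print(hidden, recovered, store.get("config"))
```

The explicit `now` parameter exists only to make the grace period easy to exercise without waiting.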

  28. Tenets of Resilient Systems
    Stand on the Shoulders of Giants
    Learn from the Past
    Blameless Culture
    Write Post-Mortems
    Make Knowledge accessible

  29. Tenets of Resilient Systems
    Closed Loops
    Continuously Observe, Analyze, Act
    Behavior Driven Development to bridge Silos
    Ship (release) often but incrementally
    Version everything (GitOps)
    Shared Responsibility (GitOps)
    No central Coordinator
    APIs as System and Communication Boundaries
    SLOs as Contracts
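
The "Observe, Analyze, Act" loop on this slide can be sketched as a reconciliation step that compares a declared desired state with the observed state and applies only the difference. The replica-count model, the function names and the service names below are assumptions for illustration. In a GitOps setup the desired state would typically come from version control rather than a literal dictionary.

```python
def observe(cluster):
    """Take a snapshot of the current replica counts."""
    return dict(cluster)


def analyze(desired, observed):
    """Return (service, delta) for every service that drifted from desired."""
    services = sorted(set(desired) | set(observed))
    return [(name, desired.get(name, 0) - observed.get(name, 0))
            for name in services
            if desired.get(name, 0) != observed.get(name, 0)]


def act(cluster, actions):
    """Apply the deltas; services scaled to zero are removed."""
    for name, delta in actions:
        replicas = cluster.get(name, 0) + delta
        if replicas > 0:
            cluster[name] = replicas
        else:
            cluster.pop(name, None)


def reconcile(desired, cluster):
    """One pass of the closed loop; returns the number of corrections made."""
    actions = analyze(desired, observe(cluster))
    act(cluster, actions)
    return len(actions)


# Hypothetical states: declared intent versus what is actually running.
desired = {"api": 3, "worker": 2}
cluster = {"api": 1, "legacy": 1}

first_pass = reconcile(desired, cluster)
second_pass = reconcile(desired, cluster)
print(first_pass, second_pass, cluster)
```

Once the states converge, a further pass makes no corrections, so the loop can run continuously without side effects.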

  30. Failure is inevitable.
    But it makes you stronger.