Excerpt from workshop on resilient system design.
Confidential │ ©2020 VMware, Inc.
Or when Things go REALLY bad …
Root Causes of Outages
“Resiliency is not about making Money. It’s about not losing Money.”
-- Uwe Friedrichsen
Availability = MTTF / (MTTF + MTTR)
What we optimize for (“known unknowns”)
We need a Change in our Mindset
“Distributed systems force developers to make a quantum leap from the relative certainty of a single machine or process to the byzantine interactions of interconnected subsystems, where we cannot stop the world to take a snapshot of it or to make it move one step at a time.”
-- Sergey Bykov
“Everything fails, all the time.”
-- Werner Vogels
Availability = MTTF / (MTTF + MTTR)
What we must optimize for (“unknown unknowns”)
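As a worked illustration of the availability formula, here is a minimal Python sketch. The function name and the MTTF/MTTR figures are hypothetical and chosen only to make the arithmetic visible. It shows that the formula rewards doubling MTTF and halving MTTR equally, so which term to work on is a judgement about what is achievable, not something the formula decides.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or (mttf_hours + mttr_hours) == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical baseline: one failure every 1000 hours, one hour to recover.
baseline = availability(1000.0, 1.0)
longer_mttf = availability(2000.0, 1.0)   # fail half as often
shorter_mttr = availability(1000.0, 0.5)  # recover twice as fast

print(f"baseline:      {baseline:.6f}")
print(f"double MTTF:   {longer_mttf:.6f}")
print(f"halve MTTR:    {shorter_mttr:.6f}")
```

Both improvements give the same availability because 2*MTTF / (2*MTTF + MTTR) equals MTTF / (MTTF + MTTR/2).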
“Just make your system resilient!” they said…
© Uwe Friedrichsen
“I want a simple Solution!”
Well, … :-)
“A complex system that works is invariably found to have evolved from a simple system that worked.”
-- John Gall
Rasmussen’s System Model
http://www.spacesafetymagazine.com/wp-content/uploads/2016/07/2.-Cook-and-Rasmussen%E2%80%99s-dynamic-safety-model.png
Resiliency Engineering: Then and Now
  - 20th century: Why do accidents happen
  - 21st century: Why they don’t
https://www.youtube.com/watch?v=PGLYEDpNu60
Design Philosophy in Networked Systems
The Internet
“Modularity based on abstraction is the way things are done”
-- Barbara Liskov
The Unix Philosophy
cat file | grep string | sort -r | uniq
STREAM
Did you notice that I said nothing about Microservices?
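The same compose-small-stages idea can be sketched in Python, where each stage consumes and produces a stream of lines. This is only an analogy to the shell pipeline: the function names are hypothetical, and Python's default string ordering is not guaranteed to match the locale-dependent ordering of `sort`.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only lines containing `needle` (like `grep string`)."""
    return (line for line in lines if needle in line)


def sort_reverse(lines: Iterable[str]) -> Iterator[str]:
    """Emit lines in reverse sorted order (like `sort -r`); buffers all input."""
    return iter(sorted(lines, reverse=True))


def uniq(lines: Iterable[str]) -> Iterator[str]:
    """Collapse runs of adjacent identical lines (like `uniq`)."""
    return (key for key, _group in groupby(lines))


def pipeline(lines: Iterable[str], needle: str) -> List[str]:
    # Stages know nothing about each other; they only share the stream format.
    return list(uniq(sort_reverse(grep(lines, needle))))


sample = ["b string", "a string", "no match", "b string"]
result = pipeline(sample, "string")
print(result)  # ['b string', 'a string']
```

Each stage can be replaced or reordered without touching the others, which is the property the slide highlights.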
Can we apply this Philosophy to Distributed Systems?
Tenets of Resilient Systems
Tenets of Resilient Systems
  - Domain Understanding
  - Ownership
  - Simplicity
  - Empathy
  - Anti-Fragility
  - Observability
  - Operational Friendliness
  - Preserve Knowledge
Tenets of Resilient Systems: Domain Understanding
  - If you don’t understand your Business Domain, you don’t understand the Problem
  - Domain Driven Design (Bounded Contexts)
  - Set Expectations (Contracts)
  - This is NOT about Technology
  - Communicate (document) clearly
Tenets of Resilient Systems: Ownership
  - “You build it, you run it”
  - Cross-functional Teams (“DevSecOps”)
  - Service Orientation
  - API-First Design (no Backdoors)
  - Risk and Incident Management
  - Communication Plans
Tenets of Resilient Systems: Simplicity
  - Simple != easy
  - Self-sufficient (Autonomy)
  - Modular
  - Composable
  - Evolvable
  - Use proven Components
  - Small Gains don’t justify added Complexity (intrinsic vs accidental Complexity)
Tenets of Resilient Systems: Empathy
  - Your System does not run in Isolation
  - Be a good Citizen
  - Anticipate Effects on (possibly unknown) upstream/downstream Services
  - Mechanical Sympathy
Tenets of Resilient Systems: Anti-Fragility
  - Embrace Failure (Stress)
  - Decentralization and asynchronous Communication (Facts) through weak Links
  - Static Stability (graceful Degradation, Independence) and Zero Trust
  - Isolation (minimize Blast Radius)
  - Admission (Resource) and Flow Control
  - Redundancy (no SPOFs) with Failure Recovery
  - Supervision (“crash-only” Software)
  - Idempotency and Immutability
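To make the idempotency item concrete, here is a minimal sketch of one common approach: the caller attaches an idempotency key, and the service stores the result per key so a retried request is not applied twice. The `PaymentService` class, its method, and the key format are hypothetical; a real service would persist the keys and expire them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PaymentService:
    """Hypothetical in-memory service that applies each request at most once."""
    balance: int = 0
    # Maps idempotency key -> result returned for the first request with that key.
    _seen: Dict[str, int] = field(default_factory=dict)

    def deposit(self, idempotency_key: str, amount: int) -> int:
        # A retry with a known key returns the stored result
        # instead of applying the deposit a second time.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.balance += amount
        self._seen[idempotency_key] = self.balance
        return self.balance


service = PaymentService()
first = service.deposit("req-1", 100)
retry = service.deposit("req-1", 100)   # e.g. the client timed out and retried
second = service.deposit("req-2", 50)
print(first, retry, second)  # 100 100 150
```

With this property in place, callers can retry safely after a timeout without knowing whether the first attempt succeeded.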
Tenets of Resilient Systems: Observability
  - Instrument and audit everything
  - Measure from the Inside and Outside (Customer)
  - Be transparent
  - Simplify Troubleshooting and Root Cause Analysis (Correlation)
  - Augment with Business Level Metrics (View)
  - Continuously measure / analyze / optimize
  - Drives Capacity Planning
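One way to read the correlation item is to tag every log record for a request with the same identifier, so records from different services can be joined during troubleshooting. The sketch below assumes JSON log lines and a caller-supplied correlation ID; the function names, field names, and service names are hypothetical.

```python
import json
import time
import uuid


def new_correlation_id() -> str:
    """Generate an opaque ID to attach to every record of one request."""
    return uuid.uuid4().hex


def log_event(correlation_id: str, service: str, event: str, **fields) -> str:
    """Emit one structured log line and return it for inspection."""
    record = {
        "ts": time.time(),
        "correlation_id": correlation_id,
        "service": service,
        "event": event,
        **fields,  # extra technical or business-level fields
    }
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


cid = new_correlation_id()
log_event(cid, "checkout", "order_received", order_total=42)
log_event(cid, "payment", "charge_succeeded", latency_ms=120)
```

Filtering the combined logs by `correlation_id` then yields the full path of a single request across both services.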
Tenets of Resilient Systems: Operational Friendliness
  - Don’t hide Complexity, never assume
  - Single-Version Software
  - Standardize everything
  - Automate and reduce Toil to prevent Human Error
  - Chaos test in Production
  - Boring (Tech) is good
  - “Soft-Delete” and Self-Healing
  - Document clearly (Recovery Plans, Last Change, Ownership)
  - Practice (Game/Hack Days)
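The “Soft-Delete” item can be illustrated with a small sketch in which deleting a record only marks it, so an operator mistake can be undone. The `Record` and `SoftDeleteStore` names are hypothetical, and the in-memory dictionary stands in for whatever storage a real system uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Record:
    key: str
    value: str
    deleted_at: Optional[datetime] = None  # None means the record is live


class SoftDeleteStore:
    """Hypothetical store where delete marks a record instead of removing it."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def put(self, key: str, value: str) -> None:
        self._records[key] = Record(key, value)

    def delete(self, key: str) -> None:
        # Keep the data; only record when it was marked as deleted.
        self._records[key].deleted_at = datetime.now(timezone.utc)

    def restore(self, key: str) -> None:
        self._records[key].deleted_at = None

    def live_keys(self) -> List[str]:
        return sorted(k for k, r in self._records.items() if r.deleted_at is None)


store = SoftDeleteStore()
store.put("a", "1")
store.put("b", "2")
store.delete("a")
after_delete = store.live_keys()
store.restore("a")
after_restore = store.live_keys()
print(after_delete, after_restore)  # ['b'] ['a', 'b']
```

A background job could purge marked records after a grace period; that part is left out here.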
Tenets of Resilient Systems: Preserve Knowledge
  - Stand on the Shoulders of Giants
  - Learn from the Past
  - Blameless Culture
  - Write Post-Mortems
  - Make Knowledge accessible
Tenets of Resilient Systems
  - Closed Loops
  - Continuously Observe, Analyze, Act
  - Behavior Driven Development to bridge Silos
  - Ship (release) often but incrementally
  - Version everything (GitOps)
  - Shared Responsibility (GitOps)
  - No central Coordinator
  - APIs as System and Communication Boundaries
  - SLOs as Contracts
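Treating an SLO as a contract implies a concrete amount of tolerated unavailability. The sketch below works through that arithmetic for an availability SLO. The "error budget" framing, the 30-day window, and the function names are assumptions added for illustration and do not come from the slides.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO tolerates over the window."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1], e.g. 0.999")
    window_minutes = window_days * 24 * 60
    return window_minutes * (1.0 - slo)


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after the downtime observed so far; negative means the SLO is breached."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


# A 99.9% SLO over 30 days: 43200 minutes * 0.001
print(f"{error_budget_minutes(0.999):.1f} minutes")          # 43.2 minutes
print(f"{budget_remaining(0.999, 10.0):.1f} minutes left")   # 33.2 minutes left
```

A team could use the remaining budget as the closed-loop signal for deciding whether to keep shipping or to pause and stabilize.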
Failure is inevitable. But it makes you stronger.