DevOpsDays NYC 2020: Can Resilience Engineering be sufficiently described in 5 minutes?

Of course the answer to the question in the title is “no,” because this twenty-year-old multidisciplinary field is as broad and deep as Distributed Systems. Bringing perspectives, methods, and concepts from Resilience Engineering is a long game; my goal is to whet your appetite and lay down enough compelling threads for you to pull on as this important long game unfolds.

John Allspaw

March 03, 2020

Transcript

  1. time “Safety-I”

  2. time “non-incident” time continual preventing and catching of incidents “Safety-II”

  3. There is no “one weird trick” to understand Resilience Engineering, and that’s ok! John Allspaw, Adaptive Capacity Labs
  4. Resilience Engineering is a FIELD: Cybernetics, Ecology, Safety Science, Biology, Control Systems, Human Factors & Ergonomics, Cognitive Systems Engineering, Complexity Science, Cognitive Psychology, Sociology, Operations Research
  5. Resilience Engineering is a COMMUNITY: Rail, Maritime, Surgery, Intelligence Agencies, Law Enforcement, Aviation/ATM, Space, Mining, Construction, Explosives, Firefighting, Anesthesia, Pediatrics, Power Grid & Distribution, Military Agencies, Software Engineering
  6. Resilience Engineering is a COMMUNITY: Rail, Maritime, Surgery, Intelligence Agencies, Law Enforcement, Aviation/ATM, Space, Mining, Construction, Explosives, Firefighting, Anesthesia, Pediatrics, Power Grid & Distribution, Military Agencies, Software Engineering
  7. resilience is not these things: • redundancy • robustness • high-availability / fault-tolerance • Chaos Engineering • anything about software or hardware!
  8. people are the source of adaptation activities to sustain adaptation

  9. unforeseen unanticipated unexpected fundamentally surprising resilience plays out in the…

  10. situational surprise Buy a ticket and win

  11. situational surprise: Buy a ticket and win; fundamental surprise: Don’t buy a ticket and win
  12. “…proactive activities aimed at preparing to be unprepared”: ability to handle fundamental surprises
  13. Resilience is already present and happening in your org! It’s just difficult for us to “see”
  14. example “in the wild”: we continually invest in the ability to deploy to production when it’s needed; it’s needed for handling fundamental surprises
  15. • automated tests
    • availability of (and expertise involved in) peer code review
    • availability of (and familiarity with using) feature/config flags
    • people available and looking for signs of trouble, focusing attention
    • the ability to contact others who can help if necessary
  16. “too academic?!” we can do this! it’s already happening

  17. Understanding Resilience Engineering will take time: the concepts are not intuitive, and also critically important
  18. Change Is Afoot 2018 2019 J. Paul Reed 2018 Nora Jones Casey Rosenthal 2020 Jessica DeVita Chad Todd Tim Tischler 2021 Learning From Incidents In Software http://learningfromincidents.io
  19. The next talk you see on this topic should be bit.ly/BoneResilience
  20. ResiliencePapers.club LearningFromIncidents.io AdaptiveCapacityLabs.com/blog ResilienceRoundup.com Thanks!