Resilience Engineering: It Might Not Mean What You Think It Means

This is a talk I gave at the Chaos Community Day event in 2017.
The goal of the talk was to give the audience a high-level introduction to Resilience Engineering (both the field of study and the community of researchers) and to attempt to describe a couple of core perspectives from Resilience Engineering.

John Allspaw

January 25, 2018

Transcript

  1. Resilience Engineering: It Might Not Mean What You Think It Means
    John Allspaw, MSc., Human Factors and Systems Safety
    Adaptive Capacity Labs
    SNAFU Catchers
  2. What You Are In For
    1. Resilience Engineering: a field and a community
    2. Recalibration: the “resilience” label
    3. Strong assertions on how to think about resilience
    4. How RE might approach the topic of fault injection
    5. A request
  3. Resilience Engineering
    • A field of study that emerged largely from Cognitive Systems Engineering in the early 2000s.
    • David Woods, Erik Hollnagel, Nancy Leveson, Richard Cook, Sidney Dekker, Jean Pariès, Bob Wears, more…
    • 7 symposia over 12 years
  4. The Resilience Engineering Community
    Largely made up of practitioners and researchers from…
    Human Factors & Ergonomics, Cognitive Systems Engineering, Cybernetics, Complexity Science, Engineering*, Psychology, Sociology, Ecology, Safety Science
    …working in these domains:
    Aviation/ATM, Rail, Maritime, Space, Surgery, Power Plants, Intelligence Agencies, Law Enforcement, Mining, Construction, Explosives, Firefighting, Anesthesia, Pediatrics, Power Grid & Distribution, Military Agencies, Software Engineering
  5. Some of the cast of characters
    • David Woods, CSEL/OSU
    • Shawna Perry, University of Florida Emergency Medicine
    • Dr. Richard Cook, anesthesiologist and researcher
    • Ivonne Andrade Herrera, SINTEF
    • Erik Hollnagel, University of Southern Denmark
    • Anne-Sophie Nyssen, Université de Liège
    • Johan Bergström, Lund University
    • Sidney Dekker, Griffith University
    • Asher Balkin, CSEL/OSU
    • Laura Maguire, CSEL/OSU
  6. Sample of Research
    • Experiences in Fukushima Dai-ichi nuclear power plant in light of resilience engineering
    • Unmanned Aircraft Systems in (Inter)national Airspace: Resilience as a Lever in the Debate
    • Sociotechnical Networks for Power Grid Resilience: South Korean Case Study
    • Limits on adaptation: Modeling Resilience and Brittleness in Hospital Emergency Departments
  7. [Diagram of a software system and its tooling. Components: code repositories; code stuff; testing/validation suites; scripts, rules, etc.; test cases; neo-assemblers; pseudo/meta/rules code; externally sourced code (e.g. DB); internally sourced code; results delivery technology stack; results; the outside world. Tools: code generating tools, testing tools, deployment tools, organization/encapsulation tools, “monitoring” tools.]
  8. [Diagram: the tools (code generating, testing, deployment, organization/encapsulation, and “monitoring” tools, plus pseudo/meta/rules code) and what they are for: getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing.]
  9. [Diagram: the components and tools shown in slides 7 and 8, with a line marking what is “below the line” and what is “above the line”.]
  10. [Diagram: the components and tools from the previous slides, grouped into three labeled regions: The Thing You’re Building; The Stuff You Build and Maintain With; The People Doing The Work.]
  11. [Diagram: the tools and what they are for, alongside the work people do: coordinating, testing, anticipating, learning, modeling, troubleshooting, organizing, remembering, revising, planning, monitoring.]
  12. [Diagram. Artifacts: the technology stack and the tools used to build and maintain it. Representations: interactions, communications, signaling. Cognition: “What is happening?”, “What’s it doing?”, “Why is it doing that?”, “What does it mean?”, “What should happen?”, “How should this work?”, “What needs to change?” Goals, purposes, risks: what matters, and why what matters matters.]
  13. A Resilience Engineering Unit of Analysis
    [Diagram: artifacts, representations, cognition, and goals/purposes/risks, as in slide 12, now extended along a time axis.]
  14. Traditional view on the role of people (“Safety-I”)
    Humans are predominantly seen as a liability or hazard. They are a problem to be fixed.
    RE view on the role of people in complex systems (“Safety-II”)
    Humans are seen as a resource necessary for system flexibility and resilience. They provide flexible solutions to many potential problems.
  15. “Above the line”…
    • …is not “management”
    • …is not “organization design” or reporting structures
    • …is how people work (detect/diagnose/solve problems, both acute and chronic) alongside and with technology and each other, under continual trade-off scenarios, that provide the…
  16. Questions from an RE perspective
    • What resources (funding, incentives, etc.) encourage engineering groups to invest time and effort into designing new fault injection cases?
    • What criteria direct our focus of attention for fault injection scenarios?
    • How do teams assess the level of effort needed to maintain the stuff that makes fault injection work, and work safely?
    • How do teams assess the ongoing value of specific fault injections versus others?
  17. [Diagram repeated from slide 10: the components and tools grouped as The Thing You’re Building; The Stuff You Build and Maintain With; The People Doing The Work.]
  18. Copyright © 2016 by Richard Cook (R.I. Cook) for SNAFUmasters
    www.snafucatchers.org
    http://bit.ly/ResilienceConsortium
  19. RE Needs Cases From Our World
    • This is not a field isolated in academia! Progress in RE depends on exploring a wide and diverse set of cases.
    • Incidents (especially minor ones) make for good cases to explore adaptive capacity.