Slide 1

Slide 1 text

Resilience Engineering: It Might Not Mean What You Think It Means
John Allspaw, MSc., Human Factors and Systems Safety
Adaptive Capacity Labs
SNAFU Catchers

Slide 2

Slide 2 text

About me

Slide 3

Slide 3 text

About me

Slide 4

Slide 4 text

What You Are In For
1. Resilience Engineering: a field and a community
2. Recalibration: the “resilience” label
3. Strong assertions on how to think about resilience
4. How RE might approach the topic of fault injection
5. A request

Slide 5

Slide 5 text

Resilience Engineering
• A field of study that emerged largely from Cognitive Systems Engineering in the early 2000s.
• David Woods, Erik Hollnagel, Nancy Leveson, Richard Cook, Sidney Dekker, Jean Pariès, Bob Wears, more…
• 7 symposia over 12 years

Slide 6

Slide 6 text

Resilience Engineering Community
…is largely made up of practitioners and researchers from: Human Factors & Ergonomics, Cognitive Systems Engineering, Cybernetics, Complexity Science, Engineering*, Psychology, Sociology, Ecology, Safety Science
…working in these domains: Aviation/ATM, Rail, Maritime, Space, Surgery, Power Plants, Intelligence Agencies, Law Enforcement, Mining, Construction, Explosives, Firefighting, Anesthesia, Pediatrics, Power Grid & Distribution, Military Agencies, Software Engineering

Slide 7

Slide 7 text

Some of the cast of characters
• David Woods, CSEL/OSU
• Shawna Perry, Univ of Florida Emergency Medicine
• Dr. Richard Cook, Anesthesiologist, Researcher
• Ivonne Andrade Herrera, SINTEF
• Erik Hollnagel, Univ of S. Denmark
• Anne-Sophie Nyssen, Université de Liège
• Johan Bergström, Lund University
• Sidney Dekker, Griffith University
• Asher Balkin, CSEL/OSU
• Laura Maguire, CSEL/OSU

Slide 8

Slide 8 text

Sample of Research
• Experiences in Fukushima Dai-ichi nuclear power plant in light of resilience engineering
• Unmanned Aircraft Systems in (Inter)national Airspace: Resilience as a Lever in the Debate
• Sociotechnical Networks for Power Grid Resilience: South Korean Case Study
• Limits on adaptation: Modeling Resilience and Brittleness in Hospital Emergency Departments

Slide 9

Slide 9 text

Books

Slide 10

Slide 10 text

Resilience is something that a system does, not what a system has.

Slide 11

Slide 11 text

Resilience is the story of the outage that didn’t happen.

Slide 12

Slide 12 text

A Mental Model

Slide 13

Slide 13 text

Diagram labels: internally sourced code; externally sourced code (e.g. DB); delivery technology stack; results; the outside world

Slide 14

Slide 14 text

Diagram labels:
• code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools
• code repositories; code stuff; testing/validation suites; scripts, rules, etc.; test cases; neo-assemblers; pseudo/meta/rules code
• internally sourced code; externally sourced code (e.g. DB); delivery technology stack; results
• the outside world

Slide 15

Slide 15 text

Diagram labels:
• code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools; pseudo/meta/rules code
• getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing

Slide 16

Slide 16 text

Diagram labels:
• “above the line” / “below the line”
• code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools
• getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing
• code repositories; code stuff; testing/validation suites; scripts, rules, etc.; test cases; neo-assemblers; pseudo/meta/rules code
• internally sourced code; externally sourced code (e.g. DB); delivery technology stack; results

Slide 17

Slide 17 text

Diagram labels:
• The Thing You’re Building; The Stuff You Build and Maintain With; The People Doing The Work
• code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools
• getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing
• code repositories; code stuff; testing/validation suites; scripts, rules, etc.; test cases; neo-assemblers; pseudo/meta/rules code
• internally sourced code; externally sourced code (e.g. DB); delivery technology stack; results

Slide 18

Slide 18 text

Diagram labels:
• code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools
• getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing
• coordinating; testing; anticipating; learning; modeling; troubleshooting; organizing; remembering; revising; planning; monitoring

Slide 19

Slide 19 text

If you haven’t found people responsible for outcomes, you haven’t seen the system.

Slide 20

Slide 20 text

Diagram labels:
• Cognition: Goals; Purposes; Risks; What matters; Why what matters matters
• Why is it doing that? What needs to change? What does it mean?
• How should this work? What’s it doing? What does it mean?
• What is happening? What should happen? What does it mean?
• Representations; Interactions; Communications; Signaling
• getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing
• Artifacts: code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools; code repositories; code stuff; testing/validation suites; scripts, rules, etc.; test cases; neo-assemblers; pseudo/meta/rules code
• internally sourced code; externally sourced code (e.g. DB); delivery technology stack; results

Slide 21

Slide 21 text

A Resilience Engineering Unit of Analysis
Diagram labels:
• Time
• Cognition: Goals; Purposes; Risks; What matters; Why what matters matters
• Why is it doing that? What needs to change? What does it mean?
• How should this work? What’s it doing? What does it mean?
• What is happening? What should happen? What does it mean?
• Representations; Interactions; Communications; Signaling
• getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing
• Artifacts: code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools; code repositories; code stuff; testing/validation suites; scripts, rules, etc.; test cases; neo-assemblers; pseudo/meta/rules code
• internally sourced code; externally sourced code (e.g. DB); delivery technology stack; results

Slide 22

Slide 22 text

Traditional view on the role of people (“Safety-I”): Humans are predominantly seen as a liability or hazard. They are a problem to be fixed.
RE view on the role of people in complex systems (“Safety-II”): Humans are seen as a resource necessary for system flexibility and resilience. They provide flexible solutions to many potential problems.

Slide 23

Slide 23 text

“above the line”
…is not “management”
…is not “organization design” or reporting structures
…is how people work (detect/diagnose/solve problems, both acute and chronic) alongside and with technology and each other, under continual trade-off scenarios, that provide the…

Slide 24

Slide 24 text

the AUDACITY to build and sustain the potential to…
• respond
• monitor
• learn
• anticipate

Slide 25

Slide 25 text

Why “audacity”?

Slide 26

Slide 26 text

fault injection as a focus

Slide 27

Slide 27 text

Questions from an RE perspective
• What resources (funding, incentives, etc.) encourage engineering groups to invest time and effort into designing new fault injection cases?
• What criteria direct our focus of attention for fault injection scenarios?
• How do teams assess the level of effort needed to maintain the stuff that makes fault injection work, and work safely?
• How do teams assess the ongoing value of specific fault injections versus others?

Slide 28

Slide 28 text

the AUDACITY to build and sustain the potential to…
• respond
• monitor
• learn
• anticipate

Slide 29

Slide 29 text

Diagram labels:
• The Thing You’re Building; The Stuff You Build and Maintain With; The People Doing The Work
• code generating tools; testing tools; deployment tools; organization/encapsulation tools; “monitoring” tools
• getting stuff ready to be part of the running system; adding stuff to the running system; architectural and structural framing; keeping track of what “the system” is doing
• code repositories; code stuff; testing/validation suites; scripts, rules, etc.; test cases; neo-assemblers; pseudo/meta/rules code
• internally sourced code; externally sourced code (e.g. DB); delivery technology stack; results

Slide 30

Slide 30 text

Resilience: sustained adaptive capacity

Slide 31

Slide 31 text

Copyright © 2016 by R.I.Cook
Copyright © 2016 by Richard Cook for SNAFUmasters
www.snafucatchers.org
http://bit.ly/ResilienceConsortium

Slide 32

Slide 32 text

RE Needs Cases From Our World
• This is not a field isolated in academia! Progress in RE depends on exploring a wide and diverse set of cases.
• Incidents (especially minor ones) make good cases for exploring adaptive capacity.

Slide 33

Slide 33 text

Vehicles For Doing This Research: Adaptive Capacity Labs

Slide 34

Slide 34 text

Short Course In Resilience Engineering, David Woods
http://bit.ly/REShortCourse

Slide 35

Slide 35 text

The End