Slide 1

Slide 1 text

Problem Detection
Gary Klein, Rebecca Pliske, Beth Crandall, David Woods
Cognition, Technology, and Work, March 2005, Volume 7, Issue 1, pp. 14–28
John Allspaw, Adaptive Capacity Labs

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

(the journal) Cognition, Technology, and Work “…focuses on the practical issues of human interaction with technology within the context of work and, in particular, how human cognition affects, and is affected by, work and working conditions.”

Slide 4

Slide 4 text

“…process by which people first become concerned that events may be taking an unexpected and undesirable direction that potentially requires action”

Slide 5

Slide 5 text

problem detection
• critical in complex, real-world situations
• to improve people’s ability to detect problems, we first have to understand how problem detection works in real-world situations

Slide 6

Slide 6 text

Once detection happens, people can then...
• seek more information
• track events more carefully
• try to diagnose or identify the problem
• raise the concern to other people
• “explain away” the anomaly
• cope with it by finding action(s) that might counter the trajectory of events
• accept that the situation has changed in fundamental ways and revise goals and plans

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

“At 12:40 pm ET, an engineer noticed anomalies in our grafana dashboards.” “On Monday, just after 8:30am, we noticed that a couple of large websites that are hosted on Amazon were having problems and displaying errors.”

Slide 9

Slide 9 text

reaction to existing work
Cowan’s discrepancy accumulation model (1986): problem detection ‘as the accumulation of discrepancies until a threshold was reached’
Klein and co-authors say this is not the case, because:
a. cues to problems may be subtle and context-dependent
b. what counts as a discrepancy depends on the problem-solver’s experience and the stance taken in interpreting the situation
In many cases, detecting a problem is equivalent to reconceptualizing the situation.
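Cowan’s model can be caricatured in a few lines of code, which is part of Klein et al.’s objection: it presumes that numeric “discrepancy scores” exist prior to any interpretation. A minimal sketch (class name and scores are hypothetical, not from the paper):

```python
# Illustrative sketch of Cowan's (1986) discrepancy accumulation model:
# detection fires once a running sum of discrepancy scores crosses a threshold.
class DiscrepancyAccumulator:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.total = 0.0

    def observe(self, discrepancy: float) -> bool:
        """Add one discrepancy score; return True once the sum crosses the threshold."""
        self.total += discrepancy
        return self.total >= self.threshold

acc = DiscrepancyAccumulator(threshold=3.0)
signals = [0.5, 0.8, 0.7, 1.2]  # hypothetical discrepancy scores
detections = [acc.observe(s) for s in signals]
print(detections)  # [False, False, False, True]
```

Klein et al.’s counterpoint is that the `signals` list cannot be taken for granted: what registers as a discrepancy at all depends on the observer’s expertise and stance, so the model’s input is itself the hard part of problem detection.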

Slide 10

Slide 10 text

problem detection
the initial factors that arouse concern (this is the focus of the paper)
problem identification
the ability to specify the problem

Slide 11

Slide 11 text

existing cases
• 19 NICU nurses
• 37 weather forecasters
• US Navy commanders
• weapons directors aboard AWACS
• 26 fireground commanders
• space shuttle mission control
• anesthetic management during surgery
• aviation flight decks
Review of >1000 previous critical incidents; data from the Critical Decision Method and other cognitive task analysis techniques

Slide 12

Slide 12 text

new cases
• wildland firefighting (5)
• minimally invasive surgery (3)

Slide 13

Slide 13 text

Cases 1 and 2
#1 - NICU nurse case
#2 - AEGIS naval battle group exercise
Not all incidents wield the same potential; some cases have elements and qualities that others don’t.

Slide 14

Slide 14 text

disturbances that trigger problem detection

Slide 15

Slide 15 text

Cues are not primitive events — they are constructions generated by people trying to understand situations.
“…cues are only ‘objective’ in a limited sense”
“…rather, the knowledge and expectancies a person has will determine what counts as a cue and whether it will be noticed.”

Slide 16

Slide 16 text

faults
events that threaten to block an intended outcome
we DO NOT directly perceive them…
symptoms or cues
…we notice the disturbances that faults produce
whether we notice these or not depends on several factors, including data from… “sensors”

Slide 17

Slide 17 text

a shift in situations: from routine to deteriorating
potential faults, single or multiple:
• increased wind velocity during firefighting
• loss of ability to increase compute or storage capacity

Slide 18

Slide 18 text

symptoms or cues
speed of change
• in <1 s a driving hazard can appear
• problems in mining operations or dams may develop over years
• “going sour” pattern (Cook, 1991): a situation slowly deteriorates but goes unnoticed because each symptom, considered in isolation, does not signify that a problem exists
number and variety
• from a single dominant symptom to a set of multiple symptoms
trajectory
• the difference between “safe” and “unsafe” trajectories is clear only in hindsight
bifurcation point
• an unstable, temporary state that can evolve into several stable states
absence of data
• expertise is needed to notice these “negative” events: what is not present
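The “going sour” pattern can be made concrete with a toy sketch (readings and threshold are hypothetical, not from Cook 1991): per-symptom thresholding never fires, while a trajectory view sees the steady deterioration.

```python
# Illustrative sketch: why judging each symptom in isolation misses a
# "going sour" situation. No single reading crosses the alert threshold,
# yet every step moves in the wrong direction.
readings = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # hypothetical symptom severities
THRESHOLD = 1.0

# Each symptom in isolation: none signifies a problem.
isolated_alerts = [r >= THRESHOLD for r in readings]
print(any(isolated_alerts))  # False

# The trajectory: severity rises at every step.
deltas = [b - a for a, b in zip(readings, readings[1:])]
print(all(d > 0 for d in deltas))  # True: the situation is "going sour"
```

The point of the sketch is that the same data stream yields opposite answers depending on whether the detector looks at values or at their trajectory.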

Slide 19

Slide 19 text

Case 3
Inbound Exocet missile
Symptoms have to be viewed against a background.

Slide 20

Slide 20 text

“sensors”
direct (such as visual inspection) or indirect (such as software displays)
completeness
• the number of them may not be adequate
• the placement of them may not be adequate
sensitivity
• temp probes that can’t read >150° can’t tell you it’s climbed to 500°
• a teammate or system may detect early signs of danger but not announce them
update rates
• a slow update rate can make it hard to gauge trajectories
• this wasn’t an issue for the surgeons but was for the firefighters
costs
• effort and risk: not all data can be collected safely
• ease of adjustment
credibility
• perceived reliability
• uncertainty of sensitivity
• history of the data

Slide 21

Slide 21 text

“sensors”
turbulence of the background
• Operational settings are typically data rich and noisy.
• Many data elements are present that could be relevant to the problem solver.
• There are a large number of data channels, and the signals on these channels usually are changing.
• The raw values are rarely constant even when the system is stable and normal.
• The default case is detecting emerging signs of trouble against a dynamic background of signals rather than detecting a change from a quiescent, stable, or static background.
• The noisiness of the background makes it easy to miss symptoms or to explain them away as part of a different pattern.

Slide 22

Slide 22 text

Problem detection as sensemaking activity data are used to construct a frame that accounts for the data and guides the search for additional data (a story or script or schema) the frame a person is using to understand events will determine what counts as data Both activities occur in parallel: the data generating the frame, and the frame defining what counts as data
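The data/frame loop can be sketched in code (my own illustration, not from the paper; the frames, channels, and values are invented): the current frame decides which channels count as data, and the data gathered can in turn replace the frame.

```python
# Illustrative sketch of the sensemaking loop: frame -> data -> revised frame.
# All frames, channels, and thresholds are hypothetical.
channels = {"latency_ms": 950, "error_rate": 0.2, "recent_deploys": 1}

frames = {
    "capacity problem": ["latency_ms"],
    "bad deploy": ["recent_deploys", "error_rate"],
}

def gather(frame: str) -> dict:
    """The frame determines what counts as data: only its channels are sampled."""
    return {c: channels[c] for c in frames[frame]}

def revise(frame: str, data: dict) -> str:
    """The data can force a reframing, which changes what gets sampled next."""
    if frame == "capacity problem" and data["latency_ms"] > 900:
        return "bad deploy"  # reconceptualize: deploys and errors now count as cues
    return frame

frame = "capacity problem"
for _ in range(2):  # the two activities run as an ongoing loop
    data = gather(frame)
    frame = revise(frame, data)
print(frame)  # "bad deploy"
```

The design point: neither function is primary. `gather` is useless without a frame, and `revise` is useless without data, which is the paper’s claim that the two activities occur in parallel.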

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

critical factors that determine whether cues will be noticed
• expertise
• stance
• attention management

Slide 26

Slide 26 text

expertise
• ability to perceive subtle complexities of signs
• ability to generate expectancies
• mental models

Slide 27

Slide 27 text

Case 4

Slide 28

Slide 28 text

stance
the orientation the person has to the situation
it can range from:
• denial that anything could go wrong,
• to a positive ‘can-do’ attitude that is confident of being able to overcome difficulties,
• to an alert attitude that expects some serious problems might arise,
• to a level of hysteria that over-reacts to minor signs and transient signals

Slide 29

Slide 29 text

attention management
handling the configuration of “sensors”

Slide 30

Slide 30 text

Future directions (as of 2005) • More intensive empirical studies • Effects of variations in stance, expertise, and attention management • Domain-specific failures • Nonlinear resistance • Human-automation teamwork • Coping with massive amounts of data

Slide 31

Slide 31 text

Conclusions • Problem detection hasn’t been researched closely • Cowan (1986) got some things right but he’s mostly wrong • Detecting problems in real-world situations is not trivial • What counts as important cues is context-dependent and heavily dependent on expertise

Slide 32

Slide 32 text

Implications for us
• major, for tool makers
• find cases that have elements that support probing for problem-detection expertise, and get it out of people’s heads

Slide 33

Slide 33 text

but wait…what about problem detection in teams? HOMEWORK!

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

stella.report

Slide 36

Slide 36 text

Questions?