“Problem Detection” (Klein et al., 2005) - Papers We Love (NYC)

Problem Detection
Gary Klein, Rebecca Pliske, Beth Crandall, David Woods
in “Cognition, Technology, and Work” March 2005, Volume 7, Issue 1, pp 14-28
John Allspaw, Adaptive Capacity Labs

(the journal) Cognition, Technology, and Work
“…focuses on the practical issues of human interaction with technology within the context of work and, in particular, how human cognition affects, and is affected by, work and working conditions.”

“…process by which people first become concerned that events may be taking an unexpected and undesirable direction that potentially requires action”

problem detection
• critical in complex, real-world situations
• in order to improve people’s ability to detect problems, we first have to understand how problem detection works in real-world situations.

Once detection happens, people can then...
• seek more information
• track events more carefully
• try to diagnose or identify the problem
• raise the concern to other people
• “explain away” the anomaly
• cope with it by finding action(s) that might counter the trajectory of events
• accept that the situation has changed in fundamental ways and that goals and plans need revising

“At 12:40 pm ET, an engineer noticed anomalies in our grafana dashboards.”
“On Monday, just after 8:30am, we noticed that a couple of large websites that are hosted on Amazon were having problems and displaying errors.”

reaction to existing work
Cowan’s discrepancy accumulation model (1986)
‘as the accumulation of discrepancies until a threshold was reached’
Klein and Co. say this is not the case, because:
a. cues to problems may be subtle and context-dependent
b. what counts as a discrepancy depends on the problem-solver’s experience and the stance taken in interpreting the situation.
In many cases, detecting a problem is equivalent to reconceptualizing the situation.
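For reference, the threshold idea that the paper pushes back on can be sketched as a simple accumulator. This is a toy rendering for illustration only, not Cowan's formulation: the function name, the weights, and the threshold value are all assumptions. Note that the sketch has to be handed a ready-made list of discrepancies, which is exactly what objections (a) and (b) call into question.

```python
def accumulate_until_threshold(discrepancy_weights, threshold):
    """Toy threshold model (hypothetical): return the index of the
    observation at which the running total of discrepancy weights first
    reaches the threshold, or None if it never does."""
    total = 0.0
    for index, weight in enumerate(discrepancy_weights):
        total += weight
        if total >= threshold:
            return index
    return None


# Hypothetical weights for four observed discrepancies.
detected_at = accumulate_until_threshold([0.2, 0.1, 0.4, 0.5], threshold=1.0)
```

In this toy, "detection" is just the moment a running sum crosses a line; nothing in it accounts for who decided those four observations were discrepancies in the first place.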

problem detection
initial factors that arouse concern
problem identification
the ability to specify the problem
this is the focus of the paper

existing cases
• 19 NICU nurses
• 37 weather forecasters
• US Navy Commanders
• Weapons directors aboard AWACS
• 26 fireground commanders
• Space shuttle mission control
• anesthetic management during surgery
• aviation flight decks
Review of >1000 previous critical incidents, data from Critical Decision Method and other cognitive task analysis techniques

new cases
Wildland firefighting (5)
Minimally invasive surgery (3)

Case 1 and 2
#1 - NICU nurse case
#2 - AEGIS naval battle group exercise
not all incidents wield the same potential
some cases have elements and qualities that others don’t

disturbances that trigger problem detection

Cues are not primitive events; they are constructions generated by people trying to understand situations.
“…cues are only ‘objective’ in a limited sense”
“…rather, the knowledge and expectancies a person has will determine what counts as a cue and whether it will be noticed.”

faults
events that threaten to block an intended outcome
we DO NOT directly perceive them…
…we notice the disturbances they produce
symptoms or cues
whether we notice these or not depends on several factors, including data from …
“sensors”

faults
single or multiple
a shift in situations: routine, deteriorating
potential: increased wind velocity during firefighting, loss of ability to increase compute or storage capacity

symptoms or cues
speed of change
• in <1s a driving hazard can appear
• problems in mining operations or dams may develop over years
• “going sour” pattern (Cook, 1991) - situation slowly deteriorating but goes unnoticed because each symptom considered in isolation does not signify a problem exists
number and variety
single dominant symptom to a set of multiple symptoms
trajectory
differences between “safe” and “unsafe” trajectories are clear only in hindsight
bifurcation point
an unstable, temporary state that can evolve into several stable states
absence of data
expertise is needed to notice these “negative” events - what is not present
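The “going sour” pattern has a familiar analogue in monitoring. The sketch below is a toy under loud assumptions: the metric names, the limits, and the 0.7 "near limit" fraction are invented for illustration, and none of it is a method from the paper. It shows how per-metric checks can each pass while the combined picture is one an experienced observer might still find worrying.

```python
# Hypothetical readings and alert limits; none of these come from the paper.
readings = {"latency_ms": 180, "error_rate_pct": 0.8, "queue_depth": 900}
limits = {"latency_ms": 200, "error_rate_pct": 1.0, "queue_depth": 1000}


def alerts_in_isolation(readings, limits):
    """Names of metrics whose reading exceeds its own limit."""
    return [name for name, value in readings.items() if value > limits[name]]


def metrics_near_limit(readings, limits, fraction=0.7):
    """Names of metrics above `fraction` of their limit (assumed cutoff)."""
    return [name for name, value in readings.items()
            if value > fraction * limits[name]]


isolated = alerts_in_isolation(readings, limits)  # no single metric alerts
drifting = metrics_near_limit(readings, limits)   # yet all three are close
```

Counting near-limit metrics is only a stand-in. The point on the slide is that noticing such a pattern depends on the observer, not on a fixed rule.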

Case 3
Inbound Exocet missile
symptoms have to be viewed against a background

“sensors”
completeness
• number of them may not be adequate
• placement of them may not be adequate
sensitivity
• temp probes that can’t go > 150° can’t tell you it’s climbed to 500°
• if teammate or system detects early signs of danger but doesn’t announce them
update rates
• slow update rate can make it hard to gauge trajectories
• wasn’t an issue with surgeons but was with firefighters
direct (such as visual inspection)
indirect (such as sw displays)
costs
• effort and risk - not all data can be collected safely
• ease of adjustment
credibility
• perceived reliability
• uncertainty of sensitivity
• history of the data
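The sensitivity point can be made concrete with a saturating sensor. A minimal sketch, assuming a hypothetical probe that clips at 150° (the constant, function name, and readings are invented for illustration):

```python
PROBE_MAX = 150  # assumed upper limit of the hypothetical probe


def probe_reading(actual_temp):
    """Return what a saturating probe would report for a given temperature."""
    return min(actual_temp, PROBE_MAX)


# Two very different situations produce the same reading.
at_limit = probe_reading(150)
far_beyond = probe_reading(500)
indistinguishable = at_limit == far_beyond
```

From the display alone, an observer cannot tell these two situations apart, so any concern has to come from other cues or from knowing the probe's limit.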

“sensors”
turbulence of the background
• Operational settings are typically data rich and noisy.
• Many data elements are present that could be relevant to the problem solver
• There are a large number of data channels and the signals on these channels usually are changing.
• The raw values are rarely constant even when the system is stable and normal.
• The default case is detecting emerging signs of trouble against a dynamic background of signals rather than detecting a change from a quiescent, stable, or static background.
• The noisiness of the background makes it easy to miss symptoms or to explain them away as part of a different pattern

Problem detection as sensemaking activity
data are used to construct a frame (a story or script or schema) that accounts for the data and guides the search for additional data
the frame a person is using to understand events will determine what counts as data
Both activities occur in parallel: the data generating the frame, and the frame defining what counts as data

critical factors that determine whether cues will be noticed
• expertise
• stance
• attention management

expertise
• ability to perceive subtle complexities of signs
• ability to generate expectancies
• mental models

Case 4

stance
the orientation the person has to the situation
can range from:
• denial that anything could go wrong,
• to a positive ‘can-do’ attitude that is confident of being able to overcome difficulties,
• to an alert attitude that expects some serious problems might arise,
• to a level of hysteria that over-reacts to minor signs and transient signals

attention management
handling the configuration of “sensors”

Future directions (as of 2005)
• More intensive empirical studies
• Effects of variations in stance, expertise, and attention management
• Domain-specific failures
• Nonlinear resistance
• Human-automation teamwork
• Coping with massive amounts of data

Conclusions
• Problem detection hasn’t been researched closely
• Cowan (1986) got some things right but he’s mostly wrong
• Detecting problems in real-world situations is not trivial
• What counts as important cues is context-dependent and heavily dependent on expertise

Implications for us
• major, for tool makers
• find cases that have elements that support probing for problem detection expertise - and get it out of people’s heads

but wait…what about problem detection in teams? HOMEWORK!

stella.report

Questions?
