Slide 1

Slide 1 text

Psychology of alert design

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

G'day! I'm Lindsay Holmwood @auxesis

Slide 4

Slide 4 text

Engineering manager @ Bulletproof

Slide 5

Slide 5 text

cucumber-nagios, Visage, Flapjack

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

January 16, 2003

Slide 8

Slide 8 text

Foam debris broke off the space shuttle's external tank and struck the left wing http://www.columbiadisaster.info/images/foam_debris_548x627.jpg

Slide 9

Slide 9 text

http://upload.wikimedia.org/wikipedia/commons/9/95/Impact-test.jpg Mockup of polyurethane foam hitting the wing structure at 850 km/h

Slide 10

Slide 10 text

February 1, 2003

Slide 11

Slide 11 text

from nasa tv http://www.youtube.com/watch?v=94J9oVeST0k

Slide 12

Slide 12 text

from free to air television http://www.youtube.com/watch?v=1oBTzbKx0jo

Slide 13

Slide 13 text

Did NASA have "good alerts"?

Slide 14

Slide 14 text

What constitutes a good alert?

Slide 15

Slide 15 text

"Good alert" is a moral judgement

Slide 16

Slide 16 text

No one sets out to create "bad alerts"

Slide 17

Slide 17 text

Alerts designed in context

Slide 18

Slide 18 text

Locally rational

Slide 19

Slide 19 text

“people make what they think are best decisions based on data at hand”

Slide 20

Slide 20 text

We design alerts for humans

Slide 21

Slide 21 text

Let's understand how humans think

Slide 22

Slide 22 text

2 principles

Slide 23

Slide 23 text

Don't startle the operator

Slide 24

Slide 24 text

Don't suggest, expose

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

What is cognitive bias?

Slide 27

Slide 27 text

"Mental shortcut"

Slide 28

Slide 28 text

http://www.flickr.com/photos/frostnova/2268471558/sizes/o

Slide 29

Slide 29 text

Timeliness Accuracy http://www.flickr.com/photos/frostnova/2268471558/sizes/o

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

• Problem solving

Slide 32

Slide 32 text

• Problem solving • Heuristic

Slide 33

Slide 33 text

• Problem solving • Heuristic • Correct result

Slide 34

Slide 34 text

• Problem solving • Heuristic • Correct result • Rational choice

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

• Problem solving

Slide 37

Slide 37 text

• Problem solving • Heuristic

Slide 38

Slide 38 text

• Problem solving • Heuristic • Incorrect result

Slide 39

Slide 39 text

• Problem solving • Heuristic • Incorrect result • Cognitive bias!

Slide 40

Slide 40 text

Heuristic?

Slide 41

Slide 41 text

Pattern matching. Heuristics are simple, efficient rules that people often use to form judgements and make decisions. They involve focusing on specific information and ignoring the rest. http://www.flickr.com/photos/buttim/1297081125/sizes/o

Slide 42

Slide 42 text

What helped your ancestors survive!

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

March 27, 1977

Slide 45

Slide 45 text

http://www.flickr.com/photos/gsairpics/8318261080/

Slide 46

Slide 46 text

http://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Map_Tenerife_Disaster_EN.svg/2000px-Map_Tenerife_Disaster_EN.svg.png

Slide 47

Slide 47 text

http://i1.ytimg.com/vi/LSPkRMbyrGc/maxresdefault.jpg

Slide 48

Slide 48 text

http://awesomestories.com/images/user/9add18ae4d.jpg

Slide 49

Slide 49 text

KLM: 234 passengers, 14 crew

Slide 50

Slide 50 text

Pan Am: 326 passengers, 9 crew

Slide 51

Slide 51 text

Frozen in place

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

Normalcy bias

Slide 54

Slide 54 text

Before a disaster:

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

• Underestimate:

Slide 57

Slide 57 text

• Underestimate: • risk

Slide 58

Slide 58 text

• Underestimate: • risk • effects

Slide 59

Slide 59 text

• Underestimate: • risk • effects • preparation

Slide 60

Slide 60 text

"Because something bad has never happened, it never will happen"

Slide 61

Slide 61 text

During a disaster:

Slide 62

Slide 62 text

People need an average of 4 prompts before they take action: "this truly can't be happening, everything will be OK"

Slide 63

Slide 63 text

• Response:
People need an average of 4 prompts before they take action: "this truly can't be happening, everything will be OK"

Slide 64

Slide 64 text

• Response: • slow reaction
People need an average of 4 prompts before they take action: "this truly can't be happening, everything will be OK"

Slide 65

Slide 65 text

• Response: • slow reaction • seek validation
People need an average of 4 prompts before they take action: "this truly can't be happening, everything will be OK"

Slide 66

Slide 66 text

• Response: • slow reaction • seek validation • optimistic interpretation
People need an average of 4 prompts before they take action: "this truly can't be happening, everything will be OK"

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

Reaction steps

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

• Cognition

Slide 71

Slide 71 text

• Cognition • Perception

Slide 72

Slide 72 text

• Cognition • Perception • Comprehension

Slide 73

Slide 73 text

• Cognition • Perception • Comprehension • Decision

Slide 74

Slide 74 text

• Cognition • Perception • Comprehension • Decision • Implementation

Slide 75

Slide 75 text

• Cognition • Perception • Comprehension • Decision • Implementation • Movement

Slide 76

Slide 76 text

These are complex tasks

Slide 77

Slide 77 text

You cannot skip these tasks

Slide 78

Slide 78 text

You can practice to make them more automatic

Slide 79

Slide 79 text

People who don't practice end up deliberating during the disaster

Slide 80

Slide 80 text

http://i1.ytimg.com/vi/LSPkRMbyrGc/maxresdefault.jpg

Slide 81

Slide 81 text

70% freeze, 15% freak out, 15% react to the situation

Slide 82

Slide 82 text

No practice == higher MTTR

Slide 83

Slide 83 text

Don't startle the operator

Slide 84

Slide 84 text

Drill

Slide 85

Slide 85 text

Limit interruptions

Slide 86

Slide 86 text

This is a test

Slide 87

Slide 87 text

No content

Slide 88

Slide 88 text

1. Read the statement once

Slide 89

Slide 89 text

1. Read the statement once 2. Count the letter F

Slide 90

Slide 90 text

No content

Slide 91

Slide 91 text

FINAL FOLIOS SEEM TO RESULT FROM YEARS OF DUTIFUL STUDY OF TEXTS ALONG WITH YEARS OF SCIENTIFIC EXPERIENCE.

Slide 92

Slide 92 text

No content

Slide 93

Slide 93 text

How many did you see?

Slide 94

Slide 94 text

How many did you see? The answer is 8

Slide 95

Slide 95 text

Fluency heuristic http://library.mpib-berlin.mpg.de/ft/rh/RH_Fluency_2008.pdf

Slide 96

Slide 96 text

FINAL FOLIOS SEEM TO RESULT FROM YEARS OF DUTIFUL STUDY OF TEXTS ALONG WITH YEARS OF SCIENTIFIC EXPERIENCE.
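
The count on the earlier slide can be checked mechanically. This is only a small sketch for verifying the number; the variable names are illustrative and not part of the talk.

```python
# The statement exactly as shown on the slide.
STATEMENT = (
    "FINAL FOLIOS SEEM TO RESULT FROM YEARS OF DUTIFUL STUDY "
    "OF TEXTS ALONG WITH YEARS OF SCIENTIFIC EXPERIENCE."
)

# Total number of times the letter F appears.
total_f = STATEMENT.count("F")

# How many of those sit inside the short word "OF".
f_in_of = sum(1 for word in STATEMENT.split() if word == "OF")

print(total_f, f_in_of)
```

A machine reads every character, so it has no pattern to expect and nothing to skip.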

Slide 97

Slide 97 text

Brain expects pattern to continue

Slide 98

Slide 98 text

Brain skips other information

Slide 99

Slide 99 text

No content

Slide 100

Slide 100 text

Modeling "failure"

Slide 101

Slide 101 text

No content

Slide 102

Slide 102 text

• a

Slide 103

Slide 103 text

• a • b

Slide 104

Slide 104 text

• a • b • c

Slide 105

Slide 105 text

• a • b • c • d

Slide 106

Slide 106 text

• a • b • c • d • *boom*

Slide 107

Slide 107 text

Let's add barriers

Slide 108

Slide 108 text

No content

Slide 109

Slide 109 text

• a • b • c • d •

Slide 110

Slide 110 text

• a • b • c • d •

Slide 111

Slide 111 text

• a • b • c • d • Soft Hard Soft Hard

Slide 112

Slide 112 text

• a • b • c • d •

Slide 113

Slide 113 text

• a • b • c • d • e *boom*

Slide 114

Slide 114 text

• a • b • c • d • e *boom*

Slide 115

Slide 115 text

• a • b • c • d • e *boom* f

Slide 116

Slide 116 text

• a • b • c • d • e *boom* f

Slide 117

Slide 117 text

• a • b • c • d • e *boom* f g h

Slide 118

Slide 118 text

• a • b • c • d • e *boom* f g h

Slide 119

Slide 119 text

• a • b • c • d • e *boom* f g h i j k

Slide 120

Slide 120 text

• a • b • c • d • e *boom* f g h i j k

Slide 121

Slide 121 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z

Slide 122

Slide 122 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z

Slide 123

Slide 123 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z

Slide 124

Slide 124 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z

Slide 125

Slide 125 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z

Slide 126

Slide 126 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z

Slide 127

Slide 127 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z

Slide 128

Slide 128 text

• a • b • c • d • e *boom* f g h i j k l m n o p q r s t u v w x y z Complexity

Slide 129

Slide 129 text

Our systems are not static

Slide 130

Slide 130 text

Our systems are dynamic

Slide 131

Slide 131 text

"Accidents come from relationships, not broken parts"

Slide 132

Slide 132 text

Parenting: does it even make sense?

Slide 133

Slide 133 text

Lots of work

Slide 134

Slide 134 text

Rapidly out of date

Slide 135

Slide 135 text

Emergent behaviour?

Slide 136

Slide 136 text

• a • b • c • d • q n e f h i j k l m o p r s t u v w x y z g

Slide 137

Slide 137 text

• a • b • c • d • n e f h i j k l m o p r s t u v w x y z g *boom*

Slide 138

Slide 138 text

• a • b • c • d • n f h i k m o p r s t u v w x y z g *boom* *boom* *boom* *boom*

Slide 139

Slide 139 text

• a • b • c • d • n f h i k m o p r s t u v w x y z g *boom* *boom* *boom* *boom* this is alerting

Slide 140

Slide 140 text

Don't suggest, expose

Slide 141

Slide 141 text

No content

Slide 142

Slide 142 text

Other industries

Slide 143

Slide 143 text

Aviation

Slide 144

Slide 144 text

AF447

Slide 145

Slide 145 text

No content

Slide 146

Slide 146 text

70 stall warnings

Slide 147

Slide 147 text

http://www.theatlanticwire.com/global/2012/07/final-air-france-447-report-pilots-misunderstood-their-situation/54209/
http://www.dailymail.co.uk/news/article-2020136/Pierre-Cedric-Bonin-David-Robert-blamed-Atlantic-Ocean-Air-France-crash-killed-228.html
http://edition.cnn.com/2012/07/05/world/europe/france-air-crash-report/index.html
http://www.newscientist.com/blogs/onepercent/2012/07/af447-final-report.html
http://gizmodo.com/5923866/air-france-447-crash-a-result-of-crew-ignoring-alarms

Slide 148

Slide 148 text

• Final Air France 447 Report: Pilots misunderstood their situation
• Poorly-trained pilots to blame for Air France crash that killed 228
• Final Air France crash report says pilots failed to react swiftly
• Air France 447 downed as crew ignored alarms
• Air France 447 crash a result of crew ignoring alarms
http://www.theatlanticwire.com/global/2012/07/final-air-france-447-report-pilots-misunderstood-their-situation/54209/
http://www.dailymail.co.uk/news/article-2020136/Pierre-Cedric-Bonin-David-Robert-blamed-Atlantic-Ocean-Air-France-crash-killed-228.html
http://edition.cnn.com/2012/07/05/world/europe/france-air-crash-report/index.html
http://www.newscientist.com/blogs/onepercent/2012/07/af447-final-report.html
http://gizmodo.com/5923866/air-france-447-crash-a-result-of-crew-ignoring-alarms

Slide 149

Slide 149 text

“They should have reacted!”

Slide 150

Slide 150 text

Autopilot disconnect audio warning

Slide 151

Slide 151 text

Alternate law reconfiguration audio warning

Slide 152

Slide 152 text

Stall warnings lasted for 54 seconds

Slide 153

Slide 153 text

C-chord altitude horn lasted for 34 seconds

Slide 154

Slide 154 text

Dual control signal indicator light on the controls

Slide 155

Slide 155 text

Warning                          Aural   Visual
Autopilot disconnect             x
Alternate law reconfiguration    x
Dual input control                       x
Altitude                         x
Stall warning                    x

Slide 156

Slide 156 text

Overwhelmed by feedback

Slide 157

Slide 157 text

"In an aural environment that was already saturated by the C-chord warning, the possibility that the crew did not identify the stall warning cannot be ruled out" - BEA report on AF447 http://www.flightglobal.com/news/articles/af447-inquiry-grapples-with-stall-warning-enigma-373857/

Slide 158

Slide 158 text

Operating theatres

Slide 159

Slide 159 text

The Wolf Is Crying in the Operating Room: Patient Monitor and Anesthesia Workstation Alarming Patterns During Cardiac Surgery Schmid F, Goepfert M, et al, Anesthesia & Analgesia, 2010 http://www.anesthesia-analgesia.org/content/112/1/78.long

Slide 160

Slide 160 text

Kappa XLT patient monitor http://www.used-equipment-medical.com/th_sogemed/medias/big/moniteur-drager-kappa-xlt-infinity.jpg

Slide 161

Slide 161 text

Drager Zeus anesthesia workstation http://img.medicalexpo.com/pdf/repository_me/68268/zeus-infinity-empowered-83059_5b.jpg

Slide 162

Slide 162 text

http://www.flickr.com/photos/quinnanya/5646121120/sizes/l/ pulse oximeter was used

Slide 163

Slide 163 text

http://www.flickr.com/photos/digital-noise/3650559857/sizes/o electrocardiogram was used

Slide 164

Slide 164 text

http://en.wikipedia.org/wiki/File:Arterial_kateter.jpg arterial blood pressure monitoring

Slide 165

Slide 165 text

central venous pressure was measured with a central venous catheter http://drugline.org/img/term/venous-catheter-central-15887_1.jpg

Slide 166

Slide 166 text

1 second sampling interval

Slide 167

Slide 167 text

Procedures were video recorded

Slide 168

Slide 168 text

Results?

Slide 169

Slide 169 text

1.2 alerts / minute

Slide 170

Slide 170 text

80% of the 8975 alarms were of no consequence

Slide 171

Slide 171 text

30% of the 8975 alarms were false positives

Slide 172

Slide 172 text

No content

Slide 173

Slide 173 text

How can we improve?

Slide 174

Slide 174 text

Provide more context

Slide 175

Slide 175 text

No content

Slide 176

Slide 176 text

No content

Slide 177

Slide 177 text

No content

Slide 178

Slide 178 text

No content

Slide 179

Slide 179 text

Don't suggest, expose

Slide 180

Slide 180 text

No content

Slide 181

Slide 181 text

Reduce notifications

Slide 182

Slide 182 text

No content

Slide 183

Slide 183 text

No notifications on individual checks

Slide 184

Slide 184 text

Notify on the aggregate

Slide 185

Slide 185 text

check_check

Slide 186

Slide 186 text

$ check_check.rb -s solrserver
OK=27 WARNING=0 CRITICAL=1 UNKNOWN=0 services=/solrserver/ hosts=//
Services in CRITICAL:
frontend1.example.com => solrserver client tests
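
The idea of notifying on the aggregate rather than on each check can be sketched roughly as follows. This is not how check_check.rb is implemented; the `summarise` function, the `(host, service, state)` tuple shape and the `critical_threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter

STATES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")


def summarise(results, critical_threshold=1):
    """Summarise individual check results and decide whether to notify.

    results: list of (host, service, state) tuples (hypothetical shape).
    critical_threshold: hypothetical knob; notify once this many checks
    are CRITICAL.
    """
    counts = Counter(state for _host, _service, state in results)
    # One summary line covering every check, instead of one alert per check.
    summary = " ".join(f"{state}={counts.get(state, 0)}" for state in STATES)
    failing = [
        (host, service)
        for host, service, state in results
        if state == "CRITICAL"
    ]
    notify = len(failing) >= critical_threshold
    return summary, failing, notify
```

The operator gets a single notification that exposes which checks are failing, and the threshold decides how many individual failures are tolerated before anyone is interrupted.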

Slide 187

Slide 187 text

Riemann's event grouping http://riemann.io/howto.html#group-events-in-time
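
Grouping events in time can be illustrated with a rough Python sketch. This is not Riemann's API (Riemann is configured in Clojure); the `group_events` function, the event tuple shape and the fixed-window approach are assumptions chosen to show the idea.

```python
from collections import defaultdict


def group_events(events, window_seconds):
    """Bucket events into fixed time windows.

    events: iterable of (timestamp, service, state) tuples (hypothetical shape).
    Returns a dict mapping each window's start time to the events inside it.
    """
    windows = defaultdict(list)
    for timestamp, service, state in events:
        # Round the timestamp down to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start].append((timestamp, service, state))
    return dict(windows)
```

Each window can then produce one notification describing everything that happened in it, rather than one notification per event.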

Slide 188

Slide 188 text

Don't startle the operator

Slide 189

Slide 189 text

No content

Slide 190

Slide 190 text

Rollup

Slide 191

Slide 191 text

Limit alerts that are emitted

Slide 192

Slide 192 text

Aggregate alerts together

Slide 193

Slide 193 text

Incident response:

Slide 194

Slide 194 text

Brute force: manual silence

Slide 195

Slide 195 text

Limit the number of engineers who watch alerts and graphs

Slide 196

Slide 196 text

Alerting system

Slide 197

Slide 197 text

Flapjack

Slide 198

Slide 198 text

Delay-based notification
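
One way to picture delay-based notification: only notify once a check has been failing for longer than some delay, so brief blips never reach the operator. The sketch below is not Flapjack's implementation; the `DelayedNotifier` class and its parameters are hypothetical.

```python
class DelayedNotifier:
    """Notify only after a check has been failing for `delay_seconds`."""

    def __init__(self, delay_seconds):
        self.delay_seconds = delay_seconds
        self.failing_since = {}  # check name -> time of first failure
        self.notified = set()    # checks already notified for this failure

    def observe(self, check, ok, now):
        """Record one result; return True if a notification should be sent."""
        if ok:
            # Recovery clears the failure state for this check.
            self.failing_since.pop(check, None)
            self.notified.discard(check)
            return False
        first_failure = self.failing_since.setdefault(check, now)
        failing_long_enough = now - first_failure >= self.delay_seconds
        if failing_long_enough and check not in self.notified:
            self.notified.add(check)
            return True
        return False
```

A failure that recovers within the delay produces no notification at all, and a sustained failure produces exactly one.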

Slide 199

Slide 199 text

Per-media rollup threshold
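
A per-media rollup threshold might work along these lines: each medium (SMS, email and so on) has its own limit, and once the number of failing checks reaches that limit the individual alerts are replaced by a single summary. This is a sketch under those assumptions, not Flapjack's actual behaviour; `plan_notifications` and the message wording are made up for illustration.

```python
def plan_notifications(failing_checks, rollup_thresholds):
    """Decide what to send on each medium.

    failing_checks: names of checks currently failing.
    rollup_thresholds: medium -> threshold (hypothetical config); None
    disables rollup for that medium.
    """
    names = sorted(failing_checks)
    plan = {}
    for medium, threshold in rollup_thresholds.items():
        if threshold is not None and len(names) >= threshold:
            # Roll up: one summary message instead of one per check.
            plan[medium] = [f"{len(names)} checks failing: " + ", ".join(names)]
        else:
            plan[medium] = [f"{name} is failing" for name in names]
    return plan
```

A low threshold suits interruptive media such as SMS, while a higher one can be used where individual messages are cheaper to skim.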

Slide 200

Slide 200 text

Don't startle the operator

Slide 201

Slide 201 text

Granular alerting levels

Slide 202

Slide 202 text

Alerta

Slide 203

Slide 203 text

github.com/guardian/alerta/wiki/Alert-Format Alerta alerting levels

Slide 204

Slide 204 text

Nagios alerting levels

Slide 205

Slide 205 text

• a • b • c • d • q n e f h i j k l m o p r s t u v w x y z g @abestanway's talk: https://speakerdeck.com/astanway/mom-my-algorithms-suck

Slide 206

Slide 206 text

• a • b • c • d • q n e f h i j k l m o p r s t u v w x y z g @abestanway's talk: https://speakerdeck.com/astanway/mom-my-algorithms-suck

Slide 207

Slide 207 text

• a • b • c • d • q n e f h i j k l m o p r s t u v w x y z g we alerts now @abestanway's talk: https://speakerdeck.com/astanway/mom-my-algorithms-suck

Slide 208

Slide 208 text

No content

Slide 209

Slide 209 text

It's not all doom and gloom

Slide 210

Slide 210 text

We are on the cutting edge

Slide 211

Slide 211 text

http://www.flickr.com/photos/quinnanya/5646121120/sizes/l/ pulse oximeter was used

Slide 212

Slide 212 text

No content

Slide 213

Slide 213 text

Don't startle the operator

Slide 214

Slide 214 text

Don't suggest, expose

Slide 215

Slide 215 text

We design alerts for humans

Slide 216

Slide 216 text

Let's understand how humans think

Slide 217

Slide 217 text

No content

Slide 218

Slide 218 text

Thank you!

Slide 219

Slide 219 text

Thank you! Liked the talk? Let @auxesis know!