The Psychology of Alert Design

It's 3:37am. Your phone starts buzzing. It doesn't stop. 1000s of alerts. All the things are broken. Where do you even begin?

You freeze.

The infrastructures we operate are increasingly complex and nuanced. Events at one edge can have unintended and unpredictable effects at the other, with no obvious causal relationship. This makes debugging failures hard.

Good alert design is key to lowering MTTR when our complex infrastructures fail, but what constitutes a "good alert"? Our brains work in unexpected ways, with cognitive biases and priming skewing our perception of reality. It's vitally important to understand how we think and react under pressure when designing alerts and communicating failure.

In this talk, Lindsay will showcase some of the psychological underpinnings you should take into account when designing your alerts, how other industries handle alert design, and what tools are available to increase your operational effectiveness in the face of massive failures today.

Sources used to create this talk:

- http://www.columbiadisaster.info/images/foam_debris_548x627.jpg
- http://upload.wikimedia.org/wikipedia/commons/9/95/Impact-test.jpg
- http://www.youtube.com/watch?v=94J9oVeST0k
- http://www.youtube.com/watch?v=1oBTzbKx0jo
- http://www.flickr.com/photos/frostnova/2268471558
- http://www.flickr.com/photos/buttim/1297081125
- http://www.flickr.com/photos/gsairpics/8318261080
- http://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Map_Tenerife_Disaster_EN.svg/2000px-Map_Tenerife_Disaster_EN.svg.png
- http://i1.ytimg.com/vi/LSPkRMbyrGc/maxresdefault.jpg
- http://awesomestories.com/images/user/9add18ae4d.jpg
- http://library.mpib-berlin.mpg.de/ft/rh/RH_Fluency_2008.pdf
- http://www.theatlanticwire.com/global/2012/07/final-air-france-447-report-pilots-misunderstood-their-situation/54209/
- http://www.dailymail.co.uk/news/article-2020136/Pierre-Cedric-Bonin-David-Robert-blamed-Atlantic-Ocean-Air-France-crash-killed-228.html
- http://edition.cnn.com/2012/07/05/world/europe/france-air-crash-report/index.html
- http://www.newscientist.com/blogs/onepercent/2012/07/af447-final-report.html
- http://gizmodo.com/5923866/air-france-447-crash-a-result-of-crew-ignoring-alarms
- http://www.flightglobal.com/news/articles/af447-inquiry-grapples-with-stall-warning-enigma-373857/
- http://www.anesthesia-analgesia.org/content/112/1/78.long
- http://www.used-equipment-medical.com/th_sogemed/medias/big/moniteur-drager-kappa-xlt-infinity.jpg
- http://img.medicalexpo.com/pdf/repository_me/68268/zeus-infinity-empowered-83059_5b.jpg
- http://www.flickr.com/photos/quinnanya/5646121120
- http://www.flickr.com/photos/digital-noise/3650559857
- http://en.wikipedia.org/wiki/File:Arterial_kateter.jpg
- http://drugline.org/img/term/venous-catheter-central-15887_1.jpg
- http://riemann.io/howto.html#group-events-in-time

Lindsay Holmwood

September 19, 2013

Transcript

  1. Psychology of
    alert design

  3. G'day!
    I'm Lindsay Holmwood
    @auxesis

  4. Engineering manager
    @
    Bulletproof

  5. cucumber-nagios
    Visage
    Flapjack

  7. January 16, 2003

  8. foam debris broke off the space shuttle's external tank
    struck left wing
    http://www.columbiadisaster.info/images/foam_debris_548x627.jpg

  9. http://upload.wikimedia.org/wikipedia/commons/9/95/Impact-test.jpg
    mockup of polyurethane foam hitting wing structure at 850km/h

  10. February 1, 2003

  11. from nasa tv
    http://www.youtube.com/watch?v=94J9oVeST0k

  12. from free to air television
    http://www.youtube.com/watch?v=1oBTzbKx0jo

  13. Did NASA have
    "good alerts"?

  14. What constitutes
    a good alert?

  15. good alert is a
    moral judgement

  16. No one sets out
    to create
    "bad alerts"

  17. Alerts designed
    in context

  18. Locally rational

  19. “people make what they
    think are best decisions
    based on data at hand”

  20. We design alerts
    for humans

  21. Let's understand
    how humans think

  22. 2 principles

  23. Don't startle
    the operator

  24. Don't suggest, expose

  26. What is
    cognitive bias?

  27. "Mental shortcut"

  28. http://www.flickr.com/photos/frostnova/2268471558/sizes/o

  29. Timeliness
    Accuracy
    http://www.flickr.com/photos/frostnova/2268471558/sizes/o

  31. Problem solving

  32. Problem solving
    Heuristic

  33. Problem solving
    Heuristic
    Correct result

  34. Problem solving
    Heuristic
    Correct result
    Rational choice

  36. Problem solving

  37. Problem solving
    Heuristic

  38. Problem solving
    Heuristic
    Incorrect result

  39. Problem solving
    Heuristic
    Incorrect result
    Cognitive bias!

  40. Heuristic?

  41. Pattern matching
    Heuristics are simple, efficient rules that people often use to form judgements and make decisions.
    They involve focusing on specific information and ignoring the rest.
    http://www.flickr.com/photos/buttim/1297081125/sizes/o

  42. What helped
    your ancestors
    survive!

  44. March 27, 1977

  45. http://www.flickr.com/photos/gsairpics/8318261080/

  46. http://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Map_Tenerife_Disaster_EN.svg/2000px-Map_Tenerife_Disaster_EN.svg.png

  47. http://i1.ytimg.com/vi/LSPkRMbyrGc/maxresdefault.jpg

  48. http://awesomestories.com/images/user/9add18ae4d.jpg

  49. KLM:
    234 passengers
    14 crew

  50. Pan Am:
    326 passengers
    9 crew

  51. Frozen in place

  53. Normalcy bias

  54. Before a disaster:

  56. Underestimate:

  57. Underestimate:
    risk

  58. Underestimate:
    risk
    effects

  59. Underestimate:
    risk
    effects
    preparation

  60. "Because something
    bad has never
    happened, it never
    will happen"

  61. During a disaster:

  62. people need an average of 4 prompts before they take action
    "this truly can't be happening, everything will be ok"

  63. Response:
    people need an average of 4 prompts before they take action
    "this truly can't be happening, everything will be ok"

  64. Response:
    slow reaction
    people need an average of 4 prompts before they take action
    "this truly can't be happening, everything will be ok"

  65. Response:
    slow reaction
    seek validation
    people need an average of 4 prompts before they take action
    "this truly can't be happening, everything will be ok"

  66. Response:
    slow reaction
    seek validation
    optimistic interpretation
    people need an average of 4 prompts before they take action
    "this truly can't be happening, everything will be ok"

  68. Reaction steps

  70. Cognition

  71. Cognition
    Perception

  72. Cognition
    Perception
    Comprehension

  73. Cognition
    Perception
    Comprehension
    Decision

  74. Cognition
    Perception
    Comprehension
    Decision
    Implementation

  75. Cognition
    Perception
    Comprehension
    Decision
    Implementation
    Movement

  76. These are
    complex tasks

  77. You cannot skip
    these tasks

  78. You can practice to
    make them more
    automatic

  79. People who don't
    practice deliberate
    during the disaster

  80. http://i1.ytimg.com/vi/LSPkRMbyrGc/maxresdefault.jpg

  81. 70% freeze
    15% freak out
    15% react to situation

  82. No practice == higher MTTR

  83. Don't startle
    the operator

  84. Drill

  85. Limit interruptions

  86. This is a test

  88. 1. Read the statement once

  89. 1. Read the statement once
    2. Count the letter F

  91. FINAL FOLIOS SEEM TO RESULT
    FROM YEARS OF DUTIFUL STUDY
    OF TEXTS ALONG WITH YEARS OF
    SCIENTIFIC EXPERIENCE.

  93. How many
    did you see?

  94. How many
    did you see?
    The answer is 8

  95. Fluency heuristic
    http://library.mpib-berlin.mpg.de/ft/rh/RH_Fluency_2008.pdf

  96. FINAL FOLIOS SEEM TO RESULT
    FROM YEARS OF DUTIFUL STUDY
    OF TEXTS ALONG WITH YEARS OF
    SCIENTIFIC EXPERIENCE.

  97. Brain expects pattern
    to continue

  98. Brain skips
    other information
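
The answer on slide 94 can be checked mechanically. The sketch below is a minimal Python example that uses the sentence from slide 91 verbatim; the variable names are illustrative and do not come from the talk.

```python
from collections import Counter

# Sentence from slide 91, joined into a single string.
STATEMENT = (
    "FINAL FOLIOS SEEM TO RESULT "
    "FROM YEARS OF DUTIFUL STUDY "
    "OF TEXTS ALONG WITH YEARS OF "
    "SCIENTIFIC EXPERIENCE."
)

# Total occurrences of the letter F in the statement.
total_f = STATEMENT.count("F")

# Break the total down by word, so it is visible where the Fs sit.
f_by_word = Counter()
for word in STATEMENT.replace(".", "").split():
    if "F" in word:
        f_by_word[word] += word.count("F")

print(total_f)          # 8
print(dict(f_by_word))
```

Three of the eight Fs sit in the word OF, which appears three times in the statement.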

  100. Modeling "failure"

  102. a

  103. a b

  104. a b c

  105. a b c d

  106. a b c d *boom*

  107. Let's add barriers

  109. a b c d

  110. a b c d

  111. a b c d
    Soft Hard Soft Hard

  112. a b c d

  113. a b c d e *boom*

  114. a b c d e *boom*

  115. a b c d e *boom* f

  116. a b c d e *boom* f

  117. a b c d e *boom* f g h

  118. a b c d e *boom* f g h

  119. a b c d e *boom* f g h i j k

  120. a b c d e *boom* f g h i j k

  121. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z

  122. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z

  123. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z

  124. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z

  125. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z

  126. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z

  127. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z

  128. a b c d e *boom* f g h i j k l m n o p q r s t u v w x y z
    Complexity

  129. Our systems are
    not static

  130. Our systems are
    dynamic

  131. "Accidents come from
    relationships, not
    broken parts"

  132. Parenting: does it
    even make sense?

  133. Lots of work

  134. Rapidly
    out of date

  135. Emergent
    behaviour?

  136. a b c d q n e f h i j k l m o p r s t u v w x y z g

  137. a b c d n e f h i j k l m o p r s t u v w x y z g *boom*

  138. a b c d n f h i k m o p r s t u v w x y z g *boom* *boom* *boom* *boom*

  139. a b c d n f h i k m o p r s t u v w x y z g *boom* *boom* *boom* *boom*
    this is alerting

  140. Don't suggest, expose

  142. Other industries

  143. Aviation

  144. AF447

  146. 70 stall warnings

  147. http://www.theatlanticwire.com/global/2012/07/final-air-france-447-report-pilots-misunderstood-their-situation/54209/
    http://www.dailymail.co.uk/news/article-2020136/Pierre-Cedric-Bonin-David-Robert-blamed-Atlantic-Ocean-Air-France-crash-killed-228.html
    http://edition.cnn.com/2012/07/05/world/europe/france-air-crash-report/index.html
    http://www.newscientist.com/blogs/onepercent/2012/07/af447-final-report.html
    http://gizmodo.com/5923866/air-france-447-crash-a-result-of-crew-ignoring-alarms

  148. Final Air France 447 Report: Pilots misunderstood their situation
    Poorly-trained pilots to blame for Air France crash that killed 228
    Final Air France crash report says pilots failed to react swiftly
    Air France 447 downed as crew ignored alarms
    Air France 447 crash a result of crew ignoring alarms
    http://www.theatlanticwire.com/global/2012/07/final-air-france-447-report-pilots-misunderstood-their-situation/54209/
    http://www.dailymail.co.uk/news/article-2020136/Pierre-Cedric-Bonin-David-Robert-blamed-Atlantic-Ocean-Air-France-crash-killed-228.html
    http://edition.cnn.com/2012/07/05/world/europe/france-air-crash-report/index.html
    http://www.newscientist.com/blogs/onepercent/2012/07/af447-final-report.html
    http://gizmodo.com/5923866/air-france-447-crash-a-result-of-crew-ignoring-alarms

  149. “They should
    have reacted!”

  150. Autopilot disconnect
    audio warning

  151. Alternate law
    reconfiguration
    audio warning

  152. Stall warnings
    lasted for 54 seconds

  153. C-chord altitude horn
    lasted for 34 seconds

  154. Dual control signal
    indicator light on the controls

  155.                                aural  visual
    Autopilot disconnect            x
    Alternate law reconfiguration   x
    Dual input control                     x
    Altitude                        x
    Stall warning                   x

  156. Overwhelmed
    by feedback

  157. "In an aural environment that was already
    saturated by the C-chord warning, the
    possibility that the crew did not identify the
    stall warning cannot be ruled out"
    - BEA report on AF447
    http://www.flightglobal.com/news/articles/af447-inquiry-grapples-with-stall-warning-enigma-373857/

  158. Operating theatres

  159. The Wolf Is Crying in the Operating Room:
    Patient Monitor and Anesthesia Workstation
    Alarming Patterns During Cardiac Surgery
    Schmid F, Goepfert M, et al, Anesthesia & Analgesia, 2010
    http://www.anesthesia-analgesia.org/content/112/1/78.long

  160. Kappa XLT patient monitor
    http://www.used-equipment-medical.com/th_sogemed/medias/big/moniteur-drager-kappa-xlt-infinity.jpg

  161. Drager Zeus anesthesia workstation
    http://img.medicalexpo.com/pdf/repository_me/68268/zeus-infinity-empowered-83059_5b.jpg

  162. http://www.flickr.com/photos/quinnanya/5646121120/sizes/l/
    pulse oximeter was used

  163. http://www.flickr.com/photos/digital-noise/3650559857/sizes/o
    electrocardiogram was used

  164. http://en.wikipedia.org/wiki/File:Arterial_kateter.jpg
    arterial blood pressure monitoring

  165. central venous pressure was measured with a central venous catheter
    http://drugline.org/img/term/venous-catheter-central-15887_1.jpg

  166. 1 second
    sampling interval

  167. Procedures were
    video recorded

  168. Results?

  169. 1.2 alerts / minute

  170. 80% of the 8975
    alarms were of
    no consequence

  171. 30% of the 8975
    alarms were
    false positives
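
To put slides 169 to 171 in absolute terms, here is a small worked calculation in Python. It uses only the figures quoted on the slides. Treating 1.2 alerts per minute as an average over the whole recording is an assumption, and so is the duration estimate derived from it.

```python
TOTAL_ALARMS = 8975        # slides 170 and 171
ALERTS_PER_MINUTE = 1.2    # slide 169

# Integer arithmetic keeps the percentage counts exact (rounded down).
no_consequence = TOTAL_ALARMS * 80 // 100    # 80% of all alarms -> 7180
false_positives = TOTAL_ALARMS * 30 // 100   # 30% of all alarms -> 2692

# Average gap between two alarms, in seconds.
seconds_between_alarms = 60 / ALERTS_PER_MINUTE

# Assumption: the rate is an average over all recorded time, so the
# total number of alarms implies a rough total recording duration.
estimated_hours = TOTAL_ALARMS / ALERTS_PER_MINUTE / 60

print(no_consequence, false_positives)
print(round(seconds_between_alarms), round(estimated_hours, 1))
```

Under those assumptions an alarm sounds roughly every 50 seconds, and about 7180 of the 8975 alarms needed no response.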

  173. How can we
    improve?

  174. Provide more context

  179. Don't suggest, expose

  181. Reduce notifications

  183. No notifications
    on individual checks

  184. Notify on the
    aggregate

  185. check_check

  186. $ check_check.rb -s solrserver
    OK=27 WARNING=0 CRITICAL=1 UNKNOWN=0 services=/solrserver/ hosts=//
    Services in CRITICAL:
    frontend1.example.com => solrserver client tests
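
The idea behind slides 183 to 186, notifying on the aggregate instead of on each check, can be sketched in a few lines of Python. This is not how check_check.rb is implemented. The function name, the tuple shape of the results and the `critical_fraction` threshold are all assumptions made for illustration. Only the exit codes follow the usual Nagios plugin convention.

```python
import re
from collections import Counter

STATES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")
# Conventional Nagios plugin exit codes.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def aggregate(results, service_pattern, critical_fraction=0.5):
    """Collapse many check results into one state and one summary line.

    results: iterable of (host, service, state) tuples (hypothetical shape).
    critical_fraction: hypothetical threshold. The aggregate only goes
    CRITICAL when more than this fraction of matching checks is CRITICAL.
    """
    pattern = re.compile(service_pattern)
    matching = [r for r in results if pattern.search(r[1])]
    counts = Counter(state for _, _, state in matching)

    critical = counts["CRITICAL"]
    if matching and critical / len(matching) > critical_fraction:
        overall = "CRITICAL"
    elif critical or counts["WARNING"] or counts["UNKNOWN"]:
        overall = "WARNING"
    else:
        overall = "OK"

    summary = " ".join(f"{s}={counts[s]}" for s in STATES)
    failing = [f"{host} => {service}"
               for host, service, state in matching if state == "CRITICAL"]
    return overall, summary, failing


# Sample data shaped after the output on slide 186: 27 OK, 1 CRITICAL.
results = [(f"frontend{i}.example.com", "solrserver client tests", "OK")
           for i in range(2, 29)]
results.append(("frontend1.example.com", "solrserver client tests", "CRITICAL"))

overall, summary, failing = aggregate(results, "solrserver")
print(overall, summary)
for line in failing:
    print(line)
```

With one failing check out of 28, the aggregate stays below the assumed threshold, so a single lower-severity notification replaces 28 individual ones.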

  187. Riemann's
    event grouping
    http://riemann.io/howto.html#group-events-in-time
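
Riemann itself is configured in Clojure, and the sketch below does not use or reproduce its API. It only illustrates the general idea of grouping events in time: events that fall into the same fixed window become one notification. The function name, the event shape and the 60 second window are assumptions.

```python
from collections import defaultdict


def group_by_window(events, window_seconds):
    """Group (timestamp, message) events into fixed-size time windows.

    Returns a dict mapping each window's start time to its messages,
    ordered by window start.
    """
    windows = defaultdict(list)
    for timestamp, message in events:
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start].append(message)
    return dict(sorted(windows.items()))


# Hypothetical events: (seconds since start, message).
events = [
    (0, "web1 down"),
    (12, "web2 down"),
    (59, "db1 slow"),
    (61, "web3 down"),
    (130, "cache1 down"),
]

grouped = group_by_window(events, 60)
for start, messages in grouped.items():
    print(f"[{start}s] {len(messages)} event(s): {', '.join(messages)}")
```

Five events become three notifications here; during a large outage the reduction would be much bigger.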

  188. Don't startle
    the operator

  190. Rollup

  191. Limit alerts that
    are emitted

  192. Aggregate alerts
    together

  193. Incident response:

  194. Brute force:
    manual silence

  195. limit # of engineers who
    watch alerts
    & graphs

  196. Alerting system

  197. Flapjack

  198. Delay-based
    notification

  199. Per-media rollup
    threshold
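
Slides 198 and 199 name two techniques: delay-based notification and a per-media rollup threshold. The Python sketch below shows one way the two could combine. It is not Flapjack's implementation or configuration format; every name, data shape and number in it is hypothetical.

```python
def should_notify(failing_since, now, delay_seconds):
    """Delay-based notification: only alert once a failure has persisted."""
    return now - failing_since >= delay_seconds


def plan_notifications(failing_checks, now, delay_seconds, rollup_thresholds):
    """Decide which messages each medium receives.

    failing_checks: {check_name: timestamp the check started failing}
    rollup_thresholds: {medium: N}. At N or more alerting checks, that
    medium gets one summary message instead of one message per check.
    """
    alerting = sorted(name for name, since in failing_checks.items()
                      if should_notify(since, now, delay_seconds))
    plan = {}
    for medium, threshold in rollup_thresholds.items():
        if not alerting:
            plan[medium] = []
        elif len(alerting) >= threshold:
            plan[medium] = [f"{len(alerting)} checks failing"]
        else:
            plan[medium] = [f"{name} is failing" for name in alerting]
    return plan


# Hypothetical state at t=100s: cache1 only started failing 5s ago.
failing_checks = {"web1:http": 0, "web2:http": 10, "db1:disk": 20, "cache1:ping": 95}
plan = plan_notifications(
    failing_checks,
    now=100,
    delay_seconds=30,
    rollup_thresholds={"sms": 2, "email": 5},
)
print(plan)
```

In this example the short-lived cache1 failure produces nothing yet, SMS receives a single summary, and email, with its higher threshold, still receives per-check detail.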

  200. Don't startle
    the operator

  201. Granular
    alerting levels

  202. Alerta

  203. github.com/guardian/alerta/wiki/Alert-Format
    Alerta alerting levels

  204. Nagios alerting levels

  205. a b c d q n e f h i j k l m o p r s t u v w x y z g
    @abestanway's talk: https://speakerdeck.com/astanway/mom-my-algorithms-suck

  206. a b c d q n e f h i j k l m o p r s t u v w x y z g
    @abestanway's talk: https://speakerdeck.com/astanway/mom-my-algorithms-suck

  207. a b c d q n e f h i j k l m o p r s t u v w x y z g
    we alerts now
    @abestanway's talk: https://speakerdeck.com/astanway/mom-my-algorithms-suck

  209. It's not all
    doom and gloom

  210. We are on the
    cutting edge

  211. http://www.flickr.com/photos/quinnanya/5646121120/sizes/l/
    pulse oximeter was used

  213. Don't startle
    the operator

  214. Don't suggest, expose

  215. We design alerts
    for humans

  216. Let's understand
    how humans think

  218. Thank you!

  219. Thank you!
    — the talk?
    Let @auxesis know!
