Warning: This Talk Contains Content Known to the State of California to Reduce Alert Fatigue
Aditya Mukerjee
Observability Engineer at Stripe
@chimeracoder
Slide 3
Why we can learn from clinical healthcare
•Direct personal contact
•Visibly high-stakes
•Systems which are difficult to control
Slide 4
Alert Fatigue and Decision Fatigue
Slide 5
Alert Fatigue
When the frequency or severity of alerts causes the responder either to ignore important alerts or make mistakes more frequently.
Slide 6
Decision Fatigue
When the frequency or complexity of decision points causes a person to avoid decisions or make mistakes more frequently.
Slide 7
Alert Fatigue deals with the observability of systems
Decision Fatigue deals with the controllability of systems
Slide 8
72-99% of clinical alarms are false positives
…but certain patterns of alerts and decisions contribute disproportionately to fatigue!
Slide 9
Four Steps to Reducing Alert Fatigue: STAT
(Supported, Trustworthy, Actionable, Triaged)
Slide 10
Supported
•Who owns this monitor?
•Who has the right or authority to change it?
Slide 11
An alerting system includes the people who participate in responding to alerts, not just the software that generates alerts
Slide 12
The person responding to an alert always has the right to
change it, whether we realize it or not
Slide 13
Responders must feel ownership over the end result
Slide 14
Trustworthy
• Do I trust this alert to notify me when a problem happens?
• Do I trust this alert to stay silent when all is well?
• Do I trust this alert to give me sufficient information to diagnose problems?
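The first two questions can be put into numbers if past firings have been labeled after the fact. The sketch below is one way to do that, not something from the talk: `AlertOutcome`, `trust_metrics`, and the labeled history are hypothetical names and data. Recall approximates "does it notify me when a problem happens", and precision approximates "does it stay silent when all is well".

```python
from dataclasses import dataclass


@dataclass
class AlertOutcome:
    """One evaluation window for a single alert (hypothetical record type)."""
    fired: bool         # did the alert fire in this window?
    real_problem: bool  # was there really a problem, as judged afterwards?


def trust_metrics(history):
    """Return precision and recall for an alert from its labeled history."""
    tp = sum(1 for h in history if h.fired and h.real_problem)
    fp = sum(1 for h in history if h.fired and not h.real_problem)
    fn = sum(1 for h in history if not h.fired and h.real_problem)
    # None means "not enough data to say", which is different from 0.0.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return {"precision": precision, "recall": recall}


# Made-up history: four firings (one real), one missed problem, one quiet window.
history = [
    AlertOutcome(fired=True, real_problem=True),
    AlertOutcome(fired=True, real_problem=False),
    AlertOutcome(fired=True, real_problem=False),
    AlertOutcome(fired=True, real_problem=False),
    AlertOutcome(fired=False, real_problem=True),
    AlertOutcome(fired=False, real_problem=False),
]
metrics = trust_metrics(history)
```

In this made-up history only one firing in four was real, so precision is 0.25, and one of two real problems was missed, so recall is 0.5. The third question, whether the alert carries enough information to diagnose the problem, is not captured by either number.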
Slide 15
Anomaly detection and opaque algorithms
If you don’t understand why an alert is firing, you don’t understand
whether it’s real or not
Slide 16
When to use modeling for monitors
•Does the model represent the interconnectedness of your systems?
•Can the thresholds be adjusted?
•Are the model parameters and outputs human-interpretable?
Slide 17
Actionable
•At most one decision required to respond
•Alerts that are difficult to action become alerts that are ignored
Slide 18
Making alerts more actionable
•Watch for vague words: “investigate”, “something”, “somewhere”, “someone”
•Ways to fix: decision trees, interactive tooling, making the alerts specific
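A rough lint for alert text can catch the four words on this slide before an alert ships. This is a minimal sketch that reads those words as markers of a non-actionable alert; `VAGUE_WORDS`, `vague_words_in`, and the sample messages are hypothetical, not part of any alerting tool.

```python
import re

# The four words quoted on the slide, treated here as a deny-list (an assumption).
VAGUE_WORDS = ("investigate", "something", "somewhere", "someone")


def vague_words_in(message):
    """Return the vague words that appear as whole words in an alert message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return [w for w in VAGUE_WORDS if w in words]


# Hypothetical alert texts: one vague, one specific.
vague = vague_words_in("Something is wrong somewhere, please investigate")
specific = vague_words_in("Restart worker pool api-7: queue depth above 10000")
```

The first message trips three of the four words and the second trips none. A check like this only flags wording; it cannot tell whether the stated action is the right one.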
Slide 19
If it’s unclear who should be taking action, the alert is not actionable
Slide 20
Triaged
•Meticulously triage alerts
•Alert type should reflect urgency
•Urgency of alerts can change
Slide 21
Steps for triaging
• Commonly understood tiers
• Regular, periodic re-evaluation process
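Both steps can be encoded directly in monitor metadata. The sketch below assumes three tiers and a 90-day review interval purely for illustration; `Tier`, `Monitor`, `overdue_for_review`, the tier names, and the interval are all hypothetical choices, not recommendations from the talk.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    """A small, shared vocabulary of urgency (illustrative tiers)."""
    PAGE = "page"      # interrupt a person now
    TICKET = "ticket"  # handle during working hours
    INFO = "info"      # no response expected


@dataclass
class Monitor:
    name: str
    tier: Tier
    last_reviewed: date  # when the tier was last re-evaluated


def overdue_for_review(monitors, today, max_age=timedelta(days=90)):
    """Return names of monitors whose tier has not been re-evaluated recently."""
    return [m.name for m in monitors if today - m.last_reviewed > max_age]


# Hypothetical monitors.
monitors = [
    Monitor("disk_full", Tier.PAGE, last_reviewed=date(2024, 1, 1)),
    Monitor("cert_expiry", Tier.TICKET, last_reviewed=date(2024, 5, 1)),
]
stale = overdue_for_review(monitors, today=date(2024, 6, 1))
```

Here only `disk_full` is flagged, since it was last reviewed more than 90 days before the chosen date. Storing the review date next to the tier keeps the re-evaluation step from depending on anyone's memory.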
Slide 22
What’s wrong with Prop 65 warnings?
Slide 23
STAT is just the beginning
Slide 24
Takeaways
•Alert fatigue and decision fatigue deplete executive function
•Tackle alert fatigue and decision fatigue in tandem
•Use STAT as a quick check to evaluate alerting systems
•Regularly re-evaluate your alerts and alerting systems
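The STAT quick check could be run mechanically over alert definitions. The following is a minimal sketch under stated assumptions: `AlertSpec`, its fields, `stat_check`, and the 0.5 precision cutoff are hypothetical stand-ins for whatever an alerting system actually records, and each test is a crude proxy for the question it represents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertSpec:
    """Hypothetical alert definition with one field per STAT question."""
    name: str
    owner: Optional[str]           # Supported: who owns this monitor?
    precision: Optional[float]     # Trustworthy: share of firings that were real
    runbook_action: Optional[str]  # Actionable: the one decision to make
    tier: Optional[str]            # Triaged: agreed urgency tier


def stat_check(alert, min_precision=0.5):
    """Return a pass/fail flag for each letter of STAT (crude proxies only)."""
    return {
        "supported": bool(alert.owner),
        "trustworthy": alert.precision is not None
        and alert.precision >= min_precision,
        "actionable": bool(alert.runbook_action),
        "triaged": alert.tier is not None,
    }


# Hypothetical alert: unowned and noisy, but with a clear action and tier.
result = stat_check(
    AlertSpec(
        name="db_replica_lag",
        owner=None,
        precision=0.1,
        runbook_action="Fail over to the standby replica",
        tier="page",
    )
)
```

This example fails "supported" and "trustworthy" and passes the other two. A failing flag is a prompt for a conversation with the responders, not a verdict on the alert.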