Slide 1

Slide 1 text

Warning: This Talk Contains Content Known to the State of California to Reduce Alert Fatigue
Aditya Mukerjee
Observability Engineer at Stripe
@chimeracoder

Slide 2

Slide 2 text


Slide 3

Slide 3 text

Why we can learn from clinical healthcare
• Direct personal contact
• Visibly high-stakes
• Systems which are difficult to control

Slide 4

Slide 4 text

Alert Fatigue and Decision Fatigue

Slide 5

Slide 5 text

Alert Fatigue
When the frequency or severity of alerts causes the responder either to ignore important alerts or to make mistakes more frequently.

Slide 6

Slide 6 text

Decision Fatigue
When the frequency or complexity of decision points causes a person to avoid decisions or make mistakes more frequently.

Slide 7

Slide 7 text

Alert Fatigue deals with the observability of systems
Decision Fatigue deals with the controllability of systems

Slide 8

Slide 8 text

72-99% of clinical alarms are false positives
…but certain patterns of alerts and decisions contribute disproportionately to fatigue!

Slide 9

Slide 9 text

Four Steps to Reducing Alert Fatigue: STAT
(Supported, Trustworthy, Actionable, Triaged)

Slide 10

Slide 10 text

Supported
• Who owns this monitor?
• Who has the right or authority to change it?

Slide 11

Slide 11 text

An alerting system includes the people who participate in responding to alerts, not just the software that generates alerts

Slide 12

Slide 12 text

The person responding to an alert always has the right to change it, whether we realize it or not

Slide 13

Slide 13 text

Responders must feel ownership over the end result

Slide 14

Slide 14 text

Trustworthy
• Do I trust this alert to notify me when a problem happens?
• Do I trust this alert to stay silent when all is well?
• Do I trust this alert to give me sufficient information to diagnose problems?

Slide 15

Slide 15 text

Anomaly detection and opaque algorithms
If you don’t understand why an alert is firing, you don’t understand whether it’s real or not

Slide 16

Slide 16 text

When to use modeling for monitors
• Does the model represent the interconnectedness of your systems?
• Can the thresholds be adjusted?
• Are the model parameters and outputs human-interpretable?
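One way to read the last two questions above is as a requirement that a monitor can explain itself. The sketch below is an illustration only, not something from the talk: the function name `check_latency`, the latency metric, and the "mean plus k standard deviations" rule are all assumptions. It shows a model whose one threshold parameter (`k`) is adjustable and whose output states exactly why it fired or stayed silent.

```python
from statistics import mean, stdev


def check_latency(history, current, k=3.0):
    """Hypothetical monitor: fire when `current` exceeds mean + k * stdev.

    `history` is a list of recent latency samples in milliseconds.
    `k` is the adjustable threshold parameter.
    Returns (firing, explanation) so the responder can see the reasoning.
    """
    baseline = mean(history)
    spread = stdev(history)  # sample standard deviation, needs >= 2 samples
    threshold = baseline + k * spread
    firing = current > threshold
    explanation = (
        f"current={current:.1f} ms, threshold={threshold:.1f} ms "
        f"(mean {baseline:.1f} + {k} x stdev {spread:.1f})"
    )
    return firing, explanation


if __name__ == "__main__":
    firing, why = check_latency([100, 102, 98, 101, 99], current=120)
    print(firing, why)
```

Because the explanation carries the baseline, spread, and multiplier, a responder can judge whether the alert is real and knows which parameter to adjust if it is not.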

Slide 17

Slide 17 text

Actionable
• At most one decision required to respond
• Alerts that are difficult to action become alerts that are ignored

Slide 18

Slide 18 text

Making alerts more actionable
“investigate”, “something”, “somewhere”, “someone”
Decision trees, interactive tooling, making the alerts specific
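The four quoted words can serve as a crude lint for alert text. The following is a minimal sketch, assuming alert messages are available as plain strings; the function name `vague_words_in` and the sample messages are hypothetical and not taken from the talk.

```python
import re

# The four vague words quoted on the slide.
VAGUE_WORDS = ("investigate", "something", "somewhere", "someone")


def vague_words_in(message):
    """Return the vague words that appear in an alert message.

    Matching is case-insensitive and on whole words only.
    The result follows the order of VAGUE_WORDS.
    """
    words = set(re.findall(r"[a-z]+", message.lower()))
    return [w for w in VAGUE_WORDS if w in words]


if __name__ == "__main__":
    print(vague_words_in("Something is wrong somewhere, please investigate"))
    print(vague_words_in("Disk /var on db-1 is 95% full; run cleanup job"))
```

A message that trips this check is a candidate for rewriting with a specific component, location, and responder.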

Slide 19

Slide 19 text

If it’s unclear who should be taking action, the alert is not actionable

Slide 20

Slide 20 text

Triaged
• Meticulously triage alerts
• Alert type should reflect urgency
• Urgency of alerts can change

Slide 21

Slide 21 text

Steps for triaging
• Commonly-understood tiers
• Regular, periodic re-evaluation process
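The two steps above could be encoded roughly as follows. This is a sketch under stated assumptions: the tier names (`PAGE`, `TICKET`, `LOG`), the 90-day review interval, and the function `due_for_review` are invented for illustration and are not prescribed by the talk.

```python
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    """Hypothetical shared tiers; the alert type reflects urgency."""
    PAGE = "page"      # needs a response right away
    TICKET = "ticket"  # handle during working hours
    LOG = "log"        # record only, nobody is notified


# Assumed re-evaluation cadence; pick whatever the team agrees on.
REVIEW_INTERVAL = timedelta(days=90)


def due_for_review(last_reviewed, today, interval=REVIEW_INTERVAL):
    """Return True when an alert's tier has not been re-evaluated recently."""
    return today - last_reviewed >= interval


if __name__ == "__main__":
    print(Tier.PAGE.value, due_for_review(date(2024, 1, 1), date(2024, 6, 1)))
```

Keeping the tier and the last review date next to each alert definition makes it possible to list alerts whose urgency may have changed since anyone last looked.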

Slide 22

Slide 22 text

What’s wrong with Prop 65 warnings?

Slide 23

Slide 23 text

STAT is just the beginning

Slide 24

Slide 24 text

Takeaways
• Alert fatigue and decision fatigue deplete executive function
• Tackle alert fatigue and decision fatigue in tandem
• Use STAT as a quick check to evaluate alerting systems
• Regularly re-evaluate your alerts and alerting systems
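The "quick check" in the takeaways could be written down as a small review record per alert. The sketch below is one possible encoding, not the speaker's tooling: the `AlertReview` fields and the `stat_failures` function are hypothetical, and each field reduces a STAT question from the earlier slides to a simple value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertReview:
    """Hypothetical answers to the STAT questions for one alert."""
    name: str
    owner: Optional[str]     # Supported: who owns this monitor?
    trusted: bool            # Trustworthy: do responders trust it?
    decisions_required: int  # Actionable: at most one decision to respond
    tier: Optional[str]      # Triaged: has it been assigned a tier?


def stat_failures(review):
    """Return the STAT criteria this alert does not meet, in STAT order."""
    failures = []
    if not review.owner:
        failures.append("Supported")
    if not review.trusted:
        failures.append("Trustworthy")
    if review.decisions_required > 1:
        failures.append("Actionable")
    if not review.tier:
        failures.append("Triaged")
    return failures


if __name__ == "__main__":
    review = AlertReview("disk-full", owner=None, trusted=True,
                         decisions_required=3, tier="page")
    print(stat_failures(review))
```

An empty result means the alert passes the quick check; anything else names the criteria to revisit at the next re-evaluation.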

Slide 25

Slide 25 text

Thank you!
Aditya Mukerjee
@chimeracoder