Allison McKnight | @aemcknig
Graphing performance
[Graph: backend time (ms)]
We needed monitoring
Regression report
• Didn’t catch small or slow-creep regressions
• Difficult to tune
• Additional investigation was required to verify and understand regressions
• Alert fatigue
What could we do here?
• Enforce better graph-watching during deploys
• Change alerting mechanism
• Create tools to help investigate regressions
• Change alert format
Changing the alerting mechanism
Monitoring page performance with Nagios
Nagios: fast and fine-tuned alerting
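As a rough illustration of what a Nagios check on page performance might look like, here is a minimal sketch. It is not the check described in the talk. It follows the standard Nagios plugin conventions (exit status 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, with optional performance data after a `|`). The function name `check_backend_time`, the thresholds, and the choice of the median as the statistic are all assumptions made for the example.

```python
import statistics

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backend_time(samples_ms, warn_ms, crit_ms):
    """Compare the median backend time against per-page thresholds.

    Hypothetical helper: `samples_ms` is a list of recent backend
    timings in milliseconds for one page; how they are collected is
    out of scope here. Returns (status_code, status_line).
    """
    if not samples_ms:
        return UNKNOWN, "PAGE UNKNOWN - no timing samples available"

    median_ms = statistics.median(samples_ms)
    # Nagios performance data: label=value[UOM];warn;crit
    perfdata = f"backend_time={median_ms:.0f}ms;{warn_ms};{crit_ms}"

    if median_ms >= crit_ms:
        state, code = "CRITICAL", CRITICAL
    elif median_ms >= warn_ms:
        state, code = "WARNING", WARNING
    else:
        state, code = "OK", OK

    line = f"PAGE {state} - median backend time {median_ms:.0f}ms | {perfdata}"
    return code, line


# Example with made-up timings and thresholds; a real plugin would
# print the line and then call sys.exit(status).
status, message = check_backend_time([100, 120, 140], warn_ms=300, crit_ms=500)
print(message)
```

Because the thresholds are passed in per check, each page can be tuned separately, which is one way to read "fine-tuned" alerting.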