Slide 1
Slide 1 text
Crafting Performance Alerting Tools
Allison McKnight | @aemcknig
Asbury Agile - October 2, 2015
Slide 2
Slide 2 text
Performance at Etsy (#performance)
Lara, Natalya, Kristyn, Allison, Mike
Slide 3
Slide 3 text
Graph everything.
Slide 4
Slide 4 text
Graphing performance (graph of backend time in ms)
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
We needed monitoring
Slide 8
Slide 8 text
Monitoring page performance with Nagios
Slide 9
Slide 9 text
Nagios: fast and fine-tuned alerting
Slide 10
Slide 10 text
Nagios: check graphite data script (github.com/etsy/nagios_tools)
Slide 11
Slide 11 text
Nagios: individual check for each service you’d like to monitor
Slide 12
Slide 12 text
Nagios: individual thresholds for each page you’d like to monitor
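The idea of one check per page, each with its own thresholds, can be sketched roughly as follows. This is not the script from etsy/nagios_tools: the page names, threshold values, and the `average_datapoints` and `check_page` helpers are hypothetical. The only outside conventions assumed are the JSON shape returned by Graphite's render API (a list of series whose `datapoints` are `[value, timestamp]` pairs, with `null` for gaps) and the standard Nagios plugin exit codes.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical per-page thresholds in milliseconds: page -> (warn, crit).
THRESHOLDS = {
    "home": (750.0, 800.0),
    "search": (400.0, 500.0),
}


def average_datapoints(render_json):
    """Average the non-null values in a Graphite render-style JSON string."""
    series = json.loads(render_json)
    values = [v for s in series for v, _ts in s["datapoints"] if v is not None]
    return sum(values) / len(values) if values else None


def check_page(page, render_json):
    """Return (exit_code, message) for one page's backend time."""
    if page not in THRESHOLDS:
        return UNKNOWN, f"UNKNOWN: no thresholds configured for {page}"
    warn, crit = THRESHOLDS[page]
    value = average_datapoints(render_json)
    if value is None:
        return UNKNOWN, f"UNKNOWN: no data for {page}"
    if value >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif value >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"{label}: {page} backend time {value} ms (warn {warn}, crit {crit})"


# Made-up sample data standing in for a Graphite response.
sample = json.dumps([
    {"target": "home.backend_ms",
     "datapoints": [[760.0, 1], [None, 2], [777.0, 3]]},
])
code, message = check_page("home", sample)
```

Keeping the thresholds in a per-page table is what lets a slow page and a fast page share the same check logic while alerting at different levels.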
Slide 13
Slide 13 text
Creating tools
Slide 14
Slide 14 text
No content
Slide 15
Slide 15 text
No content
Slide 16
Slide 16 text
recommended thresholds
Slide 17
Slide 17 text
recommended thresholds
Slide 18
Slide 18 text
recommended thresholds
Slide 19
Slide 19 text
Creating a tool to visualize our performance alerts helped us develop well-tuned alerts.
Slide 20
Slide 20 text
No content
Slide 21
Slide 21 text
No content
Slide 22
Slide 22 text
Current value: 768.5, warn threshold: 750.0, critical threshold: 800.0
Slide 23
Slide 23 text
We needed alerts that helped us understand the problem.
Slide 24
Slide 24 text
Changing the alert format
Slide 25
Slide 25 text
Nagios Herald: a tool for adding context to Nagios alerts (github.com/etsy/nagios-herald)
Slide 26
Slide 26 text
No content
Slide 27
Slide 27 text
No content
Slide 28
Slide 28 text
No content
Slide 29
Slide 29 text
No content
Slide 30
Slide 30 text
No content
Slide 31
Slide 31 text
No content
Slide 32
Slide 32 text
No content
Slide 33
Slide 33 text
No content
Slide 34
Slide 34 text
No content
Slide 35
Slide 35 text
No content
Slide 36
Slide 36 text
With dependencies, we receive only actionable alerts.
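The effect of dependencies can be illustrated with a small sketch: a failing check is only reported if none of the checks it depends on is failing too. This is an illustration of the idea, not Nagios's own implementation or configuration syntax, and the check names, the `depends_on` layout, and the `actionable_alerts` helper are all hypothetical.

```python
# Nagios-style states.
OK, WARNING, CRITICAL = 0, 1, 2


def actionable_alerts(states, depends_on):
    """Return failing checks whose parent checks (if any) are all OK.

    states:     check name -> current state
    depends_on: check name -> list of parent check names (hypothetical layout)
    """
    alerts = []
    for check, state in states.items():
        if state == OK:
            continue
        parents = depends_on.get(check, [])
        if any(states.get(parent, OK) != OK for parent in parents):
            continue  # a parent is already failing, so suppress the child
        alerts.append(check)
    return sorted(alerts)


# Made-up example: the page check depends on the metrics pipeline being up.
states = {
    "graphite_up": CRITICAL,
    "home_backend_ms": CRITICAL,
    "search_backend_ms": WARNING,
    "cart_backend_ms": OK,
}
depends_on = {"home_backend_ms": ["graphite_up"]}

alerts = actionable_alerts(states, depends_on)
```

Here `home_backend_ms` is suppressed because its parent is already failing, while `graphite_up` and `search_backend_ms` are still reported.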
Slide 37
Slide 37 text
Improving sleuthing tools
Slide 38
Slide 38 text
actual thresholds
Slide 39
Slide 39 text
No content
Slide 40
Slide 40 text
No content
Slide 41
Slide 41 text
No content
Slide 42
Slide 42 text
No content
Slide 43
Slide 43 text
No content
Slide 44
Slide 44 text
Improved context and alerting tools led to easier and faster investigation
Slide 45
Slide 45 text
#performance
Slide 46
Slide 46 text
No content
Slide 47
Slide 47 text
#performance
Slide 48
Slide 48 text
No content
Slide 49
Slide 49 text
#payments
Slide 50
Slide 50 text
Improved context and alerting tools improved cross-team collaboration
Slide 51
Slide 51 text
What’s next?
Slide 52
Slide 52 text
Alerting on improvements
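One way an improvement alert might work is to compare the current value against a baseline and fire when it drops by more than some fraction. The slides do not describe how this would be done; the `improvement_alert` function, the baseline idea, and the 20% default are assumptions made purely for illustration.

```python
def improvement_alert(baseline_ms, current_ms, min_drop=0.2):
    """Return a message when current_ms is at least min_drop (a fraction)
    below baseline_ms, otherwise None. Hypothetical helper."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    drop = (baseline_ms - current_ms) / baseline_ms
    if drop >= min_drop:
        return (f"IMPROVED: {current_ms} ms is {drop:.0%} below "
                f"baseline {baseline_ms} ms")
    return None


# Made-up values: a 25% drop fires, a 12.5% drop does not.
fired = improvement_alert(800.0, 600.0)
quiet = improvement_alert(800.0, 700.0)
```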
Slide 53
Slide 53 text
Adding more context (teams, alert history)
Slide 54
Slide 54 text
Better alert integration (alert other teams, alert recent deployers)
Slide 55
Slide 55 text
More comprehensive alerting (front-end, mobile, API)
Slide 56
Slide 56 text
Use context to improve your tools
Slide 57
Slide 57 text
Questions?
Allison McKnight | @aemcknig
Resources
Open-source tools: github.com/etsy/nagios_tools, github.com/etsy/nagios-herald
www.etsy.com/shop/cateanevski