Slide 1

Slide 1 text

Crafting Performance Alerting Tools
Allison McKnight | @aemcknig
Asbury Agile - October 2, 2015

Slide 2

Slide 2 text

Performance at Etsy
#performance
Lara, Natalya, Kristyn, Allison, Mike

Slide 3

Slide 3 text

Graph everything.

Slide 4

Slide 4 text

Graphing performance
 backend time (ms)

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

We needed monitoring

Slide 8

Slide 8 text

Monitoring page performance with Nagios

Slide 9

Slide 9 text

Nagios: fast and fine-tuned alerting

Slide 10

Slide 10 text

Nagios: check graphite data script
github.com/etsy/nagios_tools
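
To make the idea of a Graphite-backed check concrete, here is a minimal sketch of its data-gathering half. It is not the script from the nagios_tools repository. It assumes Graphite's render API with format=json, and the host, metric name, and sample datapoints are hypothetical.

```python
import json
from urllib.parse import urlencode


def render_url(base_url, target, window="-10min"):
    """Build a Graphite render API URL that returns JSON datapoints."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url}/render?{query}"


def average_value(render_json):
    """Average the non-null datapoints of the first series in a render response.

    Returns None when the response has no series or only null datapoints.
    """
    series = json.loads(render_json)
    if not series:
        return None
    values = [value for value, _timestamp in series[0]["datapoints"] if value is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Hypothetical response body; Graphite reports gaps in the data as null.
SAMPLE_RESPONSE = (
    '[{"target": "pages.home.backend_ms", "datapoints": '
    "[[760.0, 1443790800], [null, 1443790860], [777.0, 1443790920]]}]"
)

if __name__ == "__main__":
    print(render_url("http://graphite.example.com", "pages.home.backend_ms"))
    print(average_value(SAMPLE_RESPONSE))
```

Keeping the parsing separate from the HTTP fetch means the reduction step can be exercised with canned responses and no Graphite server.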

Slide 11

Slide 11 text

Nagios: individual check for each service you’d like to monitor

Slide 12

Slide 12 text

Nagios: individual thresholds for each page you’d like to monitor
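
Per-page thresholds can be sketched as a small lookup table plus a comparison. The page names and numbers below are hypothetical, and the message wording is only illustrative. The exit codes are the standard Nagios plugin return codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical pages with (warn, critical) backend-time thresholds in milliseconds.
THRESHOLDS_MS = {
    "home": (750.0, 800.0),
    "search": (900.0, 1100.0),
}


def check_page(page, value_ms):
    """Return (exit_code, message) for one page's backend time."""
    if page not in THRESHOLDS_MS or value_ms is None:
        return UNKNOWN, f"UNKNOWN: no threshold or data for {page}"
    warn, crit = THRESHOLDS_MS[page]
    if value_ms >= crit:
        status, label = CRITICAL, "CRITICAL"
    elif value_ms >= warn:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    message = (
        f"{label}: Current value: {value_ms}, "
        f"warn threshold: {warn}, critical threshold: {crit}"
    )
    return status, message


if __name__ == "__main__":
    code, text = check_page("home", 768.5)
    print(code, text)
```

A page with no entry is reported as UNKNOWN rather than OK, so a missing threshold shows up as a visible gap in monitoring.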

Slide 13

Slide 13 text

Creating tools

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

recommended thresholds

Slide 17

Slide 17 text

recommended thresholds

Slide 18

Slide 18 text

recommended thresholds
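
The slides do not say how the recommended thresholds are computed. As one possible heuristic, and purely an assumption, a tool could derive them from percentiles of recent history. The percentile choices (95th for warn, 99th for critical) and the sample history below are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_thresholds(history_ms, warn_pct=95, crit_pct=99):
    """Suggest (warn, critical) thresholds from historical backend times.

    This is an illustrative heuristic, not the method used in the talk.
    """
    if not history_ms:
        raise ValueError("need at least one historical datapoint")
    return percentile(history_ms, warn_pct), percentile(history_ms, crit_pct)


if __name__ == "__main__":
    # Hypothetical history: 100 samples from 701 ms to 800 ms.
    history = [700 + i for i in range(1, 101)]
    print(recommend_thresholds(history))
```

Plotting suggestions like these next to the thresholds currently configured makes it easy to see which alerts are tuned too tightly or too loosely.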

Slide 19

Slide 19 text

Creating a tool to visualize our performance alerts
 helped us develop well-tuned alerts.

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

Current value: 768.5,
 warn threshold: 750.0,
 critical threshold: 800.0

Slide 23

Slide 23 text

We needed alerts that helped us
 understand the problem.

Slide 24

Slide 24 text

Changing the alert format

Slide 25

Slide 25 text

Nagios Herald: a tool for adding context to Nagios alerts
github.com/etsy/nagios-herald
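
To illustrate what adding context to an alert can mean, the sketch below turns the bare numbers into a short message with the overshoot and a graph link. It does not use Nagios Herald's API, which is a separate Ruby project. The function name, host, and metric are hypothetical. The link assumes Graphite's render endpoint.

```python
from urllib.parse import urlencode


def alert_with_context(page, value_ms, warn_ms, crit_ms, graphite_url, target):
    """Build an alert body that explains the numbers instead of only listing them."""
    over_pct = (value_ms - warn_ms) / warn_ms * 100
    query = urlencode({"target": target, "from": "-24h"})
    graph_link = f"{graphite_url}/render?{query}"
    return "\n".join(
        [
            f"Page: {page}",
            f"Current value: {value_ms} ms ({over_pct:.1f}% over warn threshold)",
            f"Warn threshold: {warn_ms} ms, critical threshold: {crit_ms} ms",
            f"Graph (last 24 hours): {graph_link}",
        ]
    )


if __name__ == "__main__":
    print(
        alert_with_context(
            "home", 768.5, 750.0, 800.0,
            "http://graphite.example.com", "pages.home.backend_ms",
        )
    )
```

The link lets whoever receives the alert see the recent trend without first working out which graph to open.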

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

With dependencies, we receive
 only actionable alerts
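
The effect of dependencies can be sketched as a filter: a failing check is reported only when none of the checks it depends on is also failing. The check names and the dependency map are hypothetical. In a real Nagios setup this is handled by Nagios's own dependency configuration, not by code like this.

```python
def actionable_alerts(statuses, depends_on):
    """Return the failing checks whose dependencies are all healthy.

    statuses maps check name to a Nagios state string.
    depends_on maps check name to the checks it relies on.
    """
    failing = {name for name, state in statuses.items() if state != "OK"}
    return sorted(
        name
        for name in failing
        if not any(parent in failing for parent in depends_on.get(name, ()))
    )


if __name__ == "__main__":
    # Hypothetical scenario: the metrics pipeline is down, so page checks are noise.
    statuses = {
        "graphite_data_fresh": "CRITICAL",
        "home_backend_time": "CRITICAL",
        "search_backend_time": "WARNING",
    }
    depends_on = {
        "home_backend_time": ["graphite_data_fresh"],
        "search_backend_time": ["graphite_data_fresh"],
    }
    print(actionable_alerts(statuses, depends_on))
```

In this scenario only the upstream failure is surfaced, because the page-level alerts cannot be acted on until the data source recovers.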

Slide 37

Slide 37 text

Improving sleuthing tools

Slide 38

Slide 38 text

actual thresholds

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

Improved context and alerting tools led to easier and faster investigation

Slide 45

Slide 45 text

#performance

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

#performance

Slide 48

Slide 48 text

No content

Slide 49

Slide 49 text

#payments

Slide 50

Slide 50 text

Improved context and alerting tools improved cross-team collaboration

Slide 51

Slide 51 text

What’s next?

Slide 52

Slide 52 text

Alerting on improvements
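
Alerting on improvements amounts to watching the lower side of the expected range as well as the upper side. The sketch below assumes a known baseline and a symmetric tolerance. Both are hypothetical parameters, since the slides do not describe a design.

```python
def classify_change(value_ms, baseline_ms, tolerance_pct=10.0):
    """Label a measurement relative to its baseline.

    Returns "regression" when the page is slower than the tolerated band,
    "improvement" when it is faster, and "steady" otherwise.
    """
    band = baseline_ms * tolerance_pct / 100
    if value_ms > baseline_ms + band:
        return "regression"
    if value_ms < baseline_ms - band:
        return "improvement"
    return "steady"


if __name__ == "__main__":
    for value in (800.0, 705.0, 600.0):
        print(value, classify_change(value, baseline_ms=700.0))
```

An improvement notice could serve as a prompt to tighten thresholds or to confirm that a change had the intended effect.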

Slide 53

Slide 53 text

Adding more context
 (teams, alert history)

Slide 54

Slide 54 text

Better alert integration
 (alert other teams, alert recent deployers)

Slide 55

Slide 55 text

More comprehensive alerting
 (front-end, mobile, API)

Slide 56

Slide 56 text

Use context to improve your tools

Slide 57

Slide 57 text

Questions?
Resources
Open-source tools:
 github.com/etsy/nagios_tools
 github.com/etsy/nagios-herald
Allison McKnight | @aemcknig
www.etsy.com/shop/cateanevski