Asbury Agile: Crafting Performance Alerting Tools

Etsy has recently developed new alerting tools to help discover and dig into performance regressions across the site. In this presentation, Allison McKnight, performance engineer at Etsy, covers how we built these tools on top of existing technology, how we iterated on the context included in our alerts and developed tools to help us investigate regressions, and how these tools have changed the way that we collaborate with other teams to fix performance regressions.

Allison McKnight

October 05, 2015

Transcript

  1. Asbury Agile - October 2, 2015 | Allison McKnight | @aemcknig | Crafting Performance Alerting Tools
  2. Performance at Etsy | #performance | Lara, Natalya, Kristyn, Allison, Mike
  3. Graph everything.

  4. Graphing performance: backend time (ms)

  7. We needed monitoring

  8. Monitoring page performance with Nagios

  9. Nagios: fast and fine-tuned alerting

  10. Nagios: check graphite data script (github.com/etsy/nagios_tools)

  11. Nagios: individual check for each service you’d like to monitor

  12. Nagios: individual thresholds for each page you’d like to monitor
  13. Creating tools

  16. recommended thresholds

  17. recommended thresholds

  18. recommended thresholds

  19. Creating a tool to visualize our performance alerts helped us develop well-tuned alerts.
  22. Current value: 768.5, warn threshold: 750.0, critical threshold: 800.0

  23. We needed alerts that helped us understand the problem.
  24. Changing the alert format

  25. Nagios Herald: a tool for adding context to Nagios alerts (github.com/etsy/nagios-herald)
  36. With dependencies, we receive only actionable alerts
  37. Improving sleuthing tools

  38. actual thresholds
  44. Improved context and alerting tools led to easier and faster investigation
  45. #performance

  47. #performance

  49. #payments

  50. Improved context and alerting tools improved cross-team collaboration
  51. What’s next?

  52. Alerting on improvements

  53. Adding more context (teams, alert history)

  54. Better alert integration (alert other teams, alert recent deployers)

  55. More comprehensive alerting (front-end, mobile, API)
  56. Use context to improve your tools

  57. Questions? Allison McKnight | @aemcknig | www.etsy.com/shop/cateanevski

    Resources: open-source tools: github.com/etsy/nagios_tools, github.com/etsy/nagios-herald
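
The threshold alerts described in the slides (an individual warn and critical threshold per page, as in "Current value: 768.5, warn threshold: 750.0, critical threshold: 800.0") can be illustrated with a minimal sketch. This is not the script from github.com/etsy/nagios_tools; the names `Thresholds` and `check_value` are hypothetical, and treating a value at or above a threshold as a breach is an assumption. The only borrowed convention is the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL).

```python
from dataclasses import dataclass
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


@dataclass
class Thresholds:
    """Hypothetical per-page thresholds for backend time in milliseconds."""
    warn: float
    crit: float


def check_value(current: float, thresholds: Thresholds) -> Tuple[int, str]:
    """Classify a metric value against warn/critical thresholds.

    Assumption: a value at or above a threshold counts as breaching it.
    Returns the exit code and a message in the format shown on the slide.
    """
    if current >= thresholds.crit:
        status, label = CRITICAL, "CRITICAL"
    elif current >= thresholds.warn:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    message = (
        f"{label}: Current value: {current}, "
        f"warn threshold: {thresholds.warn}, "
        f"critical threshold: {thresholds.crit}"
    )
    return status, message


if __name__ == "__main__":
    # Values taken from the alert text on slide 22.
    status, message = check_value(768.5, Thresholds(warn=750.0, crit=800.0))
    print(message)
```

With the values from slide 22, the sketch reports a warning, since 768.5 is above the warn threshold but below the critical one. A real check would fetch the current value from Graphite and exit with the returned status code.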