Martyrs On Film: learning to hate the #oncallselfie

Alice Goldfuss

May 22, 2017

Transcript

  1. Martyrs on Film

  2. Hi! I’m Alice. I like systems and Twitter and tea.

  3. Hi! I’m Alice. And not getting paged.

  4. None
  5. None
  6. Benefits of On-Call • Hones troubleshooting skills • Forces you to identify the weak points in your systems • Teaches you what is and isn’t production-ready
  7. Team bonding

  8. None
  9. None
  10. None
  11. None
  12. None
  13. None
  14. None
  15. PagerDuty

  16. None
  17. None
  18. PagerDuty

  19. PagerDuty New Year’s Eve

  20. None
  21. None
  22. None
  23. PagerDuty New Year’s Eve

  24. PagerDuty New Year’s Eve S3 Outage

  25. None
  26. None
  27. None
  28. None
  29. Me

  30. None
  31. None
  32. None
  33. None
  34. A totally normal on-call routine • Don’t leave house except to commute to work • Clear all non-work appointments • Cook all meals beforehand • Have soup on hand • Don’t sleep
  35. None
  36. None
  37. None
  38. None
  39. Be heroes

  40. Prepare for battle

  41. Naomi Orwin Writer

  42. “Action scenes stop the plot.” - Naomi Orwin

  43. Pages stop the plot of your career

  44. None
  45. Miserable on-call professionals • Have terrible work/life balance • Are supporting poorly-designed systems • Feel powerless to solve problems • Generally hate the role
  46. None
  47. Red flag: too few owning too much

  48. Centralia infrastructure

  49. None
  50. None
  51. None
  52. Red flag: bandaids

  53. None
  54. • Bump thresholds • Snooze pages • Delays

  55. Red flag: no visibility

  56. Systems visibility

  57. Team visibility

  58. Too many pages

  59. Average # of weekly pages during WORST on-call

  60. Average # of weekly pages during WORST on-call

  61. Average # of weekly pages during WORST on-call

  62. Average # of weekly pages during WORST on-call

  63. None
  64. Average # of weekly pages during BEST on-call

  65. Average # of weekly pages during BEST on-call

  66. Average # of weekly pages during BEST on-call

  67. How do we get there?

  68. Notification cleanup

  69. Actionable alerts

  70. Actionable Alerts • Something breaks • Customers notice • I am the best person to fix it • I need to fix it immediately
  71. Cluster alerts

  72. None
  73. Devs on-call

  74. None
  75. “If a developer is good, being ‘on call’ just means having to fix other people’s problems and inconsequential stuff on Sat.” - dev on Twitter
  76. “Fixed this for him! ‘Put your developers on call. You’ll be surprised by how quickly they go work for someone that isn’t an #$%.’” - dev on Twitter
  77. “If your org has change control working properly, if code breaks, the jr sysadmin should simply roll back the update as documented.” - dev on Twitter
  78. The right tool

  79. Work together

  80. None
  81. Start small

  82. None
  83. Your people will burn out before your company does

  84. None
  85. None
  86. None
  87. Where does that leave #oncallselfie?

  88. None
  89. None
  90. None
  91. None
  92. None
  93. None
  94. None
  95. None
  96. None
  97. None
  98. None
  99. “Why are you getting paged so much?”

  100. Thanks! @alicegoldfuss • Special Thanks: PagerDuty, VictorOps, oncallselfies.com, all of you

  101. None
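
The four criteria on the "Actionable Alerts" slide (slide 70) can be read as a simple all-or-nothing test for whether an alert deserves to page a human. The sketch below is one way to write that test down, not something shown in the talk: the `Alert` class, its field names, and the `should_page` and `triage` helpers are all hypothetical names chosen for illustration, and the sample alerts are made up.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical record answering the four slide-70 questions for one alert."""
    name: str
    something_broke: bool          # "Something breaks"
    customers_notice: bool         # "Customers notice"
    recipient_is_best_fixer: bool  # "I am the best person to fix it"
    needs_immediate_fix: bool      # "I need to fix it immediately"


def should_page(alert: Alert) -> bool:
    """An alert pages a human only if all four criteria hold."""
    return all((
        alert.something_broke,
        alert.customers_notice,
        alert.recipient_is_best_fixer,
        alert.needs_immediate_fix,
    ))


def triage(alerts):
    """Split alert names into (pages, candidates for notification cleanup)."""
    pages = [a.name for a in alerts if should_page(a)]
    cleanup = [a.name for a in alerts if not should_page(a)]
    return pages, cleanup
```

Under these assumptions, an alert that fails any one question lands in the cleanup list, where it can be routed elsewhere, downgraded, or deleted instead of waking someone up.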