Martyrs On Film: learning to hate the #oncallselfie

Alice Goldfuss

May 22, 2017

Transcript

  1. Martyrs on Film

  2. Hi! I’m Alice
    I like systems and
    Twitter and tea.

  3. Hi! I’m Alice
    And not getting
    paged.

  6. Benefits of On-Call
    ● Hones troubleshooting skills
    ● Forces you to identify the weak points in your systems
    ● Teaches you what is and isn’t production-ready

  7. Team bonding

  15. PagerDuty

  18. PagerDuty

  19. PagerDuty
    New Year’s Eve

  23. PagerDuty
    New Year’s Eve

  24. PagerDuty
    New Year’s Eve S3 Outage

  29. Me

  34. A totally normal on-call routine
    ● Don’t leave house except to commute to work
    ● Clear all non-work appointments
    ● Cook all meals beforehand
    ● Have soup on hand
    ● Don’t sleep

  39. Be heroes

  40. Prepare for battle

  41. Naomi Orwin
    Writer

  42. “Action scenes stop the plot.”
    - Naomi Orwin

  43. Pages stop the plot of your career

  45. Miserable on-call professionals
    ● Have terrible work/life balance
    ● Are supporting poorly-designed systems
    ● Feel powerless to solve problems
    ● Generally hate the role

  47. Red flag:
    too few owning too much

  48. Centralia infrastructure

  52. Red flag:
    bandaids

  54. ● Bump thresholds
    ● Snooze pages
    ● Delays

  55. Red flag:
    no visibility

  56. Systems visibility

  57. Team visibility

  58. Too many pages

  59. Average # of weekly pages during WORST on-call

  60. Average # of weekly pages during WORST on-call

  61. Average # of weekly pages during WORST on-call

  62. Average # of weekly pages during WORST on-call

  64. Average # of weekly pages during BEST on-call

  65. Average # of weekly pages during BEST on-call

  66. Average # of weekly pages during BEST on-call

  67. How do we get there?

  68. Notification cleanup

  69. Actionable alerts

  70. Actionable Alerts
    ● Something breaks
    ● Customers notice
    ● I am the best person to fix it
    ● I need to fix it immediately
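
The four criteria above can be read as a checklist that an alert has to pass before it pages anyone. Here is a minimal sketch of that reading, assuming all four must hold at once. The `Alert` fields, the `should_page` and `route` helpers, and the "ticket" fallback are hypothetical names chosen for illustration; they are not from the talk or from any paging tool.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    name: str
    something_broke: bool           # a real failure, not just a noisy threshold
    customers_notice: bool          # the failure is visible to customers
    recipient_is_best_fixer: bool   # the person paged is the right one to fix it
    needs_immediate_fix: bool       # it cannot wait until working hours


def should_page(alert: Alert) -> bool:
    """Return True only when all four criteria from the slide hold (assumed reading)."""
    return (
        alert.something_broke
        and alert.customers_notice
        and alert.recipient_is_best_fixer
        and alert.needs_immediate_fix
    )


def route(alert: Alert) -> str:
    """Send actionable alerts to a pager; everything else goes to an assumed ticket queue."""
    return "page" if should_page(alert) else "ticket"
```

Under this sketch, an alert that fails any one check (for example, a full disk on a batch host that no customer sees) would be routed to a queue for working hours rather than waking someone up.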

  71. Cluster alerts

  73. Devs on-call

  75. “If a developer is good, being ‘on
    call’ just means having to fix other
    people’s problems and
    inconsequential stuff on Sat.”
    - dev on Twitter

  76. “Fixed this for him! ‘Put your
    developers on call. You’ll be
    surprised by how quickly they go
    work for someone that isn’t an
    #$%.’”
    - dev on Twitter

  77. “If your org has change control
    working properly, if code breaks,
    the jr sysadmin should simply roll
    back the update as documented.”
    - dev on Twitter

  78. The right tool

  79. Work together

  81. Start small

  83. Your people will burn out before
    your company does

  87. Where does that leave #oncallselfie?

  99. “Why are you getting paged so much?”

  100. Thanks!
    @alicegoldfuss
    Special Thanks:
    PagerDuty
    VictorOps
    oncallselfies.com
    All of you
