Humane Treatment of On-Call Engineers

Exploring what sucks in the On Call process and how to fix it.

Aaron Aldrich

August 18, 2016

Transcript

  1. Aaron Aldrich @CrayZeigh, Cage Data, Inc. @CageData
    ON-CALL, WHY IT SUCKS AND HOW WE CAN MAKE IT A LITTLE BETTER. DEVOPS CT: BURNOUT
  2. THANKS! AARON ALDRICH
    ▸ @CrayZeigh
    ▸ [email protected]
    ▸ Cage Data, Inc.
    ▸ @CageData
    ▸ Part of On Call rotation since 2009 through 3 different Orgs
    ▸ Husband, Father of 4, Musician, Board Gamer, Team Mystic
  3. Aaron Aldrich - @CrayZeigh
    OVERALL CONTRIBUTORS TO BURNOUT
    ▸ Internal Management Practice
    ▸ Change/Deployment Environment
    ▸ Organizational Culture
    ▸ Investment in Technology
  4. Humans can’t work all the time. We need food and sleep and free time.
  5. On-Call should be thought of as an Emergency Response Team for your technology assets.
  6. Being on call should mean something more than just being woken up in the middle of the night.
  7. ACTIONABLE ALERTS
    ▸ Something needs to be done
    ▸ That something needs to be done now
    ▸ The person receiving the alert is empowered to act upon it
  8. Audible alarms are not only incredibly annoying, but almost entirely useless: they do not reduce crime or deter thieves, and they produce an overall net negative impact on society.
  9. IT SOUNDS PLAUSIBLE ENOUGH TONIGHT, BUT WAIT UNTIL TOMORROW. WAIT FOR THE COMMON SENSE OF THE MORNING.
    H. G. Wells, The Time Machine
  10. Does it matter? Can it wait? Is the right person getting the call?
  11. RUNBOOKS TELL US WHAT THE ALERT MEANS AND HOW TO ACT ON IT
    ▸ What’s impacted
    ▸ How to connect with it
    ▸ Clear steps on what to do to fix a problem
    ▸ If it doesn’t fix the problem, where to begin next
    ▸ And/or how to call in backup
  12. Any automated fix to a persistent problem should exist as a stop-gap only: it reduces unplanned work so that the team can plan on finding a real solution.
  13. It’s important that the on-call personnel are granted the authority to call in for backup.
  14. We all tell stories, and it takes a level of struggle to make that story worth telling.
  15. Swapping stories about trying times helps deal with the stress, but an environment that does nothing to avoid them causes problems in the long run.
  16. SOFTWARE DEVELOPMENT, IT TURNS OUT, IS A TEAM SPORT… AND WHAT’S WORSE, ENCOURAGING THE HERO MENTALITY LEADS TO CORROSIVE DYSFUNCTION IN SOFTWARE TEAMS.
    Rob Mee, Pivotal Labs
  17. COMPENSATION FOR EXTRA UNPLANNED WORK
    ▸ Working after hours should be the exception, not the norm
    ▸ When it happens, you deserve to get compensated for your time
    ▸ Large Organizations:
      ▸ Actual real monetary compensation
      ▸ Extra vacation time
      ▸ Company covers human fuel costs during long extra-workday hours
    ▸ Small Organizations & Startups:
      ▸ May be more baked into the culture
      ▸ Flexible hours, with expectations of taking time off
      ▸ Fostering a real culture of appreciating your employees
  18. WHAT DOES A POSTMORTEM DO?
    ▸ What happened?
    ▸ Create a timeline of events
    ▸ What went well?
    ▸ What should we keep doing?
    ▸ What went poorly?
    ▸ How can we improve?
  19. WHAT IS A BLAMELESS POSTMORTEM?
    ▸ Team members are accountable but not responsible
    ▸ Complete transparency
    ▸ Deeper look at circumstances
    ▸ What happened and how to improve it (specific details)
    ▸ Real conditions of failure in complex systems
    @jasonhand
  20. WHY DO WE DO IT?
    ▸ Be aware of cognitive biases, especially hindsight
    ▸ Encourage & note divergent views
    ▸ The goal is to learn how to improve, not create a scapegoat
    ▸ Any tasks that affect quality of life for your employees should be prioritized first
    LEARNING RETROSPECTIVES
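
The actionable-alert criteria on slide 7 and the runbook contents on slide 11 can be sketched as a small data model. The Python below is a minimal illustration and not something from the talk: the class names, field names, and routing labels ("page", "reroute", "ticket", "drop") are all hypothetical, and the mapping from criteria to labels is an assumption drawn loosely from the three questions on slide 10.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Runbook:
    """Hypothetical structure; fields mirror the runbook bullets on slide 11."""
    impacted: str                     # what's impacted
    how_to_connect: str               # how to connect with it
    fix_steps: List[str]              # clear steps on what to do to fix the problem
    next_steps: List[str] = field(default_factory=list)  # where to begin next if the fix fails
    backup_contact: Optional[str] = None                 # how to call in backup

    def is_complete(self) -> bool:
        # The slide says "and/or", so either a fallback path or a backup
        # contact is accepted here (an assumption of this sketch).
        has_fallback = bool(self.next_steps) or self.backup_contact is not None
        has_basics = bool(self.impacted and self.how_to_connect and self.fix_steps)
        return has_basics and has_fallback


@dataclass
class Alert:
    """Hypothetical structure; the three flags mirror the bullets on slide 7."""
    name: str
    needs_action: bool        # something needs to be done
    needs_action_now: bool    # that something needs to be done now
    responder_can_act: bool   # the person receiving it is empowered to act
    runbook: Optional[Runbook] = None


def route(alert: Alert) -> str:
    """Return an illustrative routing label for an alert."""
    if not alert.needs_action:
        return "drop"         # does it matter? if not, nobody should be notified
    if not alert.needs_action_now:
        return "ticket"       # can it wait? then handle it during working hours
    if not alert.responder_can_act:
        return "reroute"      # is the right person getting the call?
    return "page"             # all three criteria hold
```

Under these assumptions, an alert such as `Alert("cert expires in 30 days", True, False, True)` would be routed to "ticket" rather than waking anyone up, and a `Runbook` with fix steps but neither follow-up steps nor a backup contact would fail `is_complete()`.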