Here's Your Pager; Good Luck Have Fun!

Andy Fleener
September 11, 2019

As a manager of a team that has On-Call responsibilities, I am personally accountable for ensuring engineers are prepared to receive pages. As a systems thinker, I’ve broken this down into several layered approaches that leave room to adapt when new and emergent system behavior appears.

Ok, great… what does that mean? Troubleshooting problems within a complex system requires an enormous amount of context-specific expertise, and it is next to impossible to build that expertise through training alone. It accumulates over time, and I’ve found that learning on the job is the most effective way to develop it.

Treat On-Call as a team sport and give engineers the freedom to learn and grow. When it’s a team sport, you can still get the best possible response to an incident while learning along the way. How you run On-Call onboarding has a significant impact on building this team-sport mentality, and it shapes team culture. If we build a team culture based on trust and learning, we can do much better than “Good Luck Have Fun”.

Transcript

  1. @andyfleener
     I’M:
     • NEW VIEW SAFETY NERD
     • Humanist
     • Systems Thinker
     • Wannabe Resilience Engineering Enthusiast
  2. “HUMAN OPERATORS HAVE DUAL ROLES: AS PRODUCERS & AS DEFENDERS AGAINST FAILURE.” — Richard Cook MD, How Complex Systems Fail
  3. THE CONSTRUCT OF THE LEARNING ORGANIZATION
     Baiyin Yang, Karen E. Watkins, Victoria J. Marsick, The Construct of the Learning Organization: Dimensions, Measurement, and Validation
  4. “LEADERSHIP AS AN EMERGENT PROPERTY OF THE SYSTEM THAT MOVES THE ORGANIZATION FORWARD” — James Barker, PhD, Dalhousie University
  5. DEFINITIONS OF YOU
     • YOU - THE ENGINEER
     • YOU - THE TEAM
     • YOU - THE ORG
  6. TAKE CARE OF YOURSELF (THE ENGINEER’S PERSPECTIVE)
     • You are not a robot; you are a human
     • Clear your calendar or trade conflicts away
     • Understand your backup situation
     • Keep your laptop and phone charged!
     • Monitor your stress and energy levels closely
  7. “ON-CALL CAN BE STRESSFUL. WHETHER YOU'RE GETTING HAMMERED WITH PAGES OR YOU ONLY GOT ONE (BUT IT WAS AT 3 AM) GETTING INTO A RELAXED AND SLEEPY FRAME OF MIND CAN BE DIFFICULT.” — Alice Goldfuss, “The On-Call Handbook”
  8. “CHANGE A BASIC ASSUMPTION AND YOU HAVE CHANGED THE SYSTEM ITSELF.” — Eliyahu M. Goldratt, Essays on the Theory of Constraints
  9. BE CURIOUS (THE ENGINEER’S PERSPECTIVE)
     • Being uncomfortable with a system or set of systems is useful feedback; embrace it as an opportunity to learn.
     • Take notes: you may not have time to ask in the moment, but you can use your notes to come back to it later.
     • Read any material you can find on previous incidents!
     • Pair on normal everyday work and ask lots of questions!
     • Don’t be afraid to make mistakes!
  10. “WHEN WE PUT TOO MUCH ENERGY INTO ELIMINATING MISTAKES, WE’RE LESS LIKELY TO GAIN INSIGHTS. HAVING INSIGHTS IS A DIFFERENT MATTER FROM PREVENTING MISTAKES.” — Gary Klein, Seeing What Others Don’t
  11. SEEK TO UNDERSTAND THE IMPACT (THE ENGINEER’S PERSPECTIVE)
      • What does this alert mean for our customers?
      • Who do I need to tell about this incident?
      • Is something currently sideways? Is it about to go sideways? Or is this thing just a dumpster fire?
  12. GAME DAYS (THE ORG’S PERSPECTIVE)
      Treat games as production incidents by following the full incident response lifecycle.
  13. NORMATIVE LANGUAGE (THE ORG’S PERSPECTIVE)
      “INSUFFICIENT SERVICE MONITORING OF DATABASE PERFORMANCE DURING THE DEPLOY RESULTED FROM EXPECTANCY, INCREASED WORKLOAD, FATIGUE AND AUTOMATION RELIANCE.”
  14. THESE IDEAS ARE NOT ALL MY OWN
      • How Complex Systems Fail, Richard Cook MD
      • The Construct of the Learning Organization: Dimensions, Measurement, and Validation, Baiyin Yang, Karen E. Watkins, Victoria J. Marsick
      • Leading a complex organization, James Barker PhD
      • “The On-Call Handbook”, Alice Goldfuss
      • Essays on the Theory of Constraints, Eliyahu M. Goldratt
      • Seeing What Others Don’t, Gary Klein PhD
      • Observability for Emerging Infra (what got you here won't get you there), Charity Majors
      • Three analytical traps in accident investigation, Johan Bergström PhD
      • The Field Guide to Understanding Human Error, Sidney Dekker PhD
      • The problem with counterfactuals, Lorin Hochstein PhD