The Sociotechnical Path to High-Performing Teams

How to measure and build high-performing engineering teams, and why it starts with observability.

Charity Majors

April 17, 2020

Transcript

  1. @mipsytipsy
    The Socio-Technical Path
    to ✨High-Performing✨ Teams
    Observability and the Glorious Future

  2. @mipsytipsy
    engineer/cofounder/CTO
    https://charity.wtf

  3. the fundamental building block
    by which we organize ourselves
    and coordinate and scale our labor.
    Teams.

  4. engineer

  5. The teams you join will define your career
    more than any other single factor.

  6. bad jobs can be bad in so, so many different ways…
    • harmful product
    • glorified the results of poor planning
    • alienated from coworkers
    • long commute
    • indifferent manager
    • cargo-culted the worst of Silicon Valley startup culture
    • aging, obsolete tech
    • high operational toil
    • fragile, flappy systems
    • complacency
    • low eng skill level
    • command-and-control leadership

  7. autonomy, learning, high-achieving, learned from our mistakes,
    curious, responsibility, ownership, inspiring, camaraderie, pride,
    collaboration, career growth, rewarding, motivating
    vs.
    manual labor, sacred cows, wasted effort, stale tech, ass-covering,
    fear, fiefdoms, excessive toil, command-and-control, cargo culting,
    enervating, discouraging, lethargy, indifference

  8. sociotechnical (n)
    “Technology is the sum of ways in which social groups construct the
    material objects of their civilizations. The things made are socially
    constructed just as much as technically constructed. The merging of these
    two things, construction and insight, is sociotechnology” — wikipedia
    if you change the tools people use,
    you can change how they behave and even who they are.

  9. sociotechnical (n)
    Values
    Practices
    Tools

  11. A high-performing team isn’t just fun to be on.
    Nice coworkers who mean well and
    work/life balance are a good start, but
    they perform.

  12. How well does YOUR team perform?
    4 key metrics.
    https://services.google.com/fh/files/misc/state-of-devops-2019.pdf

  13. 1 — How frequently do you deploy?
    2 — How long does it take for code to go live?
    3 — How many of your deploys fail?
    4 — How long does it take to recover from an outage?
    5 — How often are you paged outside work hours?
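
Questions 1 to 4 correspond to the four key metrics in the State of DevOps report: deployment frequency, lead time for changes, change failure rate, and time to restore service. The report collects these through surveys. The sketch below is only one way to compute rough equivalents from your own records. It rests on a few assumptions: the `Deploy` and `Outage` record shapes are hypothetical, and it uses the median as the summary statistic. Question 5 is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deploy:
    commit_time: datetime  # when the change was committed (hypothetical field)
    deploy_time: datetime  # when it went live in production
    failed: bool           # did it degrade service and need a fix or rollback?


@dataclass
class Outage:
    start: datetime
    resolved: datetime


def four_key_metrics(deploys: list, outages: list, period_days: float) -> dict:
    """Summarize one reporting window using the four key metrics."""
    lead_times = [d.deploy_time - d.commit_time for d in deploys]
    restore_times = [o.resolved - o.start for o in outages]
    return {
        # 1. How frequently do you deploy?
        "deploys_per_day": len(deploys) / period_days,
        # 2. How long does it take for code to go live?
        "median_lead_time": median(lead_times) if lead_times else None,
        # 3. How many of your deploys fail?
        "change_failure_rate": (sum(d.failed for d in deploys) / len(deploys)
                                if deploys else None),
        # 4. How long does it take to recover from an outage?
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

The medians work directly on `timedelta` values, so lead time and time to restore come back as durations and not as bare numbers.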

  14. There is a wide gap between elite teams and the bottom 50%.

  15. Also, we waste a LOT of time.
    https://stripe.com/reports/developer-coefficient-2018
    42%!!!

  16. It really, really, really,
    really, really
    pays off
    to be on a
    high-performing team.
    Like REALLY.

  17. Q: What happens when an
    engineer from the elite yellow
    bubble joins a team in the blue
    bubble?
    A: Your productivity tends
    to rise (or fall) to the level of
    the team you join.

  18. Who is going to be the better engineer
    two years from now?
    • 3000 deploys/year, 9 outages/year, 6 hours firefighting
    • 5 deploys/year, 65 outages/year, firefighting: constant

  19. So how do we build
    high-performing teams?
    “Just hire the best
    engineers, and you’ll
    get the best team”
    Hire people who share your values and have the needed skills,
    and then the work of building a team can begin.

  20. High-performing teams are continuously
    iterating towards production excellence.
    The work consists of cultivating sociotechnical feedback loops,
    but it begins with observability.
    Happier customers, happier teams.

  21. observability(n):
    “In control theory, observability is a measure of how well internal
    states of a system can be inferred from knowledge of its external
    outputs. The observability and controllability of a system are
    mathematical duals." — wikipedia

  22. Observability is not the same as monitoring.
    monitor your known-unknowns,
    instrument for observability into unknown-unknowns
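
Here is one way to see the difference. This is a minimal sketch over hypothetical in-memory request events, and the field names are illustrative rather than taken from any particular tool. A monitoring check answers a question someone decided on in advance. An observability-style query slices the raw events by whatever field turns out to matter, without anyone shipping new code.

```python
from collections import Counter

# Hypothetical raw events: one structured record per request.
EVENTS = [
    {"status": 200, "endpoint": "/api/cart", "build_id": "b41", "customer_id": "c1"},
    {"status": 500, "endpoint": "/api/cart", "build_id": "b42", "customer_id": "c7"},
    {"status": 503, "endpoint": "/api/pay", "build_id": "b42", "customer_id": "c7"},
    {"status": 200, "endpoint": "/api/pay", "build_id": "b41", "customer_id": "c2"},
]


def error_rate_check(events, threshold=0.05):
    """Monitoring: a check written ahead of time for a known-unknown.
    It tells you THAT something is wrong, not why."""
    if not events:
        return False
    errors = sum(1 for e in events if e["status"] >= 500)
    return errors / len(events) > threshold


def errors_by(events, field):
    """Observability-style question: break errors down by ANY field,
    chosen at debug time rather than at deploy time."""
    return Counter(e.get(field) for e in events if e["status"] >= 500)
```

On this sample data, `error_rate_check(EVENTS)` fires. `errors_by(EVENTS, "customer_id")` then shows that both failures come from a single customer, which is a question nobody wrote a check for.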

  23. o11y for software engineers:
    Can you understand what’s happening inside your
    systems, just by asking questions from the outside? Can
    you debug your code and its behavior using its output?
    Can you answer new questions without shipping new code?

  24. You have an observable system
    when your team can quickly and reliably track down
    any new problem with no prior knowledge.
    For software engineers, this means being able to
    reason about your code, identify and fix bugs, and
    understand user experiences and behaviors ...
    via your instrumentation.

  25. Observability requirements…
    https://www.honeycomb.io/blog/so-you-want-to-build-an-observability-tool/
    • High cardinality
    • High dimensionality
    • Exploratory, open-ended investigation based on raw events
    • Service Level Objectives
    • No preaggregation
    • Based on arbitrarily-wide structured events with span support
    • No indexes, schemas, or predefined structure
    • Bundling the full context of the request across network hops
    • Metrics != observability. Unstructured logs != observability.
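
The sketch below illustrates two of these requirements: arbitrarily-wide structured events, and carrying the request's context across a network hop. It uses only the Python standard library. It is not Honeycomb's or OpenTelemetry's API, and the function and field names are hypothetical. Only the `traceparent` header layout (version-traceid-parentid-flags) follows the W3C Trace Context format.

```python
import secrets
import time


def start_event(name, trace_id=None, parent_id=None, **fields):
    """Begin one arbitrarily-wide event (a span) for a unit of work."""
    return {
        "name": name,
        "trace_id": trace_id or secrets.token_hex(16),  # 32 hex chars, shared by the whole request
        "span_id": secrets.token_hex(8),                # 16 hex chars, unique to this unit of work
        "parent_id": parent_id,
        "timestamp": time.time(),
        "_t0": time.perf_counter(),                     # monotonic clock, used only for duration
        **fields,  # any number of high-cardinality fields: user_id, build_id, shard...
    }


def finish_event(event, **fields):
    """Attach whatever else was learned along the way, then record the duration."""
    event.update(fields)
    event["duration_ms"] = (time.perf_counter() - event.pop("_t0")) * 1000
    return event


def to_traceparent(event):
    """Serialize context for the next network hop (W3C Trace Context layout)."""
    return f"00-{event['trace_id']}-{event['span_id']}-01"


def from_traceparent(header):
    """Recover (trace_id, parent_id) on the receiving side of the hop."""
    _version, trace_id, parent_id, _flags = header.split("-")
    return trace_id, parent_id
```

In use, the frontend starts an event and sends `to_traceparent(frontend_event)` in a request header. The downstream service passes the two IDs from `from_traceparent` into its own `start_event`. Every event in the request then shares one `trace_id` and can be stitched back into a single trace.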

  26. Observability Maturity Model
    https://www.honeycomb.io/wp-content/uploads/2019/06/Framework-for-an-Observability-Maturity-Model.pdf
    1. Resiliency to failure
    2. High-quality code
    3. Manage complexity and technical debt
    4. Predictable releases
    5. Understand user behavior
    … find your weakest category, and tackle that first

  32. Why are computers hard?
    Because we don't understand them
    And we keep shipping things anyway
    Our tools have rewarded guessing over debugging
    And vendors have happily misled you for $$$$
    It’s time to fix this problem.

  33. Complexity is soaring
    • Ephemeral and dynamic
    • Far-flung and loosely coupled
    • Partitioned, sharded
    • Distributed and replicated
    • Containers, schedulers
    • Service registries
    • Polyglot persistence strategies
    • Autoscaled, multiple failover
    • Emergent behaviors
    • ... etc

  34. Complexity is exploding everywhere,
    but our tools were designed
    for predictable worlds.
    As soon as we know the question, we usually
    know the answer too.
    We don’t *know* what the questions are; all
    we have are unreliable symptoms or reports.

  35. We used to be able to reason about our
    architecture. Not anymore.
    (Diagram: 2003 vs. 2013)
    Now we have to instrument for observability,
    or we are screwed.

  36. Observability is the key to making the leap
    from known-unknowns to unknown-unknowns.
    (Diagram: monitoring covers known-unknowns, observability covers unknown-unknowns)

  37. kick-start the virtuous cycle of “you build it, you own it”:
    • instrument two steps ahead of you as you build
    • never accept a PR unless you can explain it if it breaks
    • watch your code go out as it deploys
    • is it working as intended? does anything look weird?
    • look through the lens of your instrumentation
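
The sketch below makes “watch your code go out as it deploys” concrete. It compares the old and new builds side by side while both are serving traffic during a rollout. It assumes each request event carries `build_id`, `status`, and `duration_ms` fields, and those names are hypothetical. The thresholds are placeholders to tune, not recommendations.

```python
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile (values must be non-empty)."""
    ordered = sorted(values)
    rank = (len(ordered) * 95 + 99) // 100  # ceil(0.95 * n) using integer math
    return ordered[rank - 1]


def summarize_by_build(events):
    """Group request events by build and summarize errors and latency."""
    groups = defaultdict(list)
    for e in events:
        groups[e["build_id"]].append(e)
    return {
        build: {
            "requests": len(evs),
            "error_rate": sum(e["status"] >= 500 for e in evs) / len(evs),
            "p95_ms": p95([e["duration_ms"] for e in evs]),
        }
        for build, evs in groups.items()
    }


def deploy_looks_weird(summary, old_build, new_build,
                       max_error_delta=0.01, max_latency_ratio=1.2):
    """Flag the new build if it errors more, or runs slower, than the old one."""
    old, new = summary[old_build], summary[new_build]
    return (new["error_rate"] - old["error_rate"] > max_error_delta
            or new["p95_ms"] > old["p95_ms"] * max_latency_ratio)
```

This approach compares against the build that is still running rather than against a fixed threshold. The comparison then reflects whatever traffic is happening right now.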

  38. for extra fun … let’s examine the sociotechnical implications of the
    predominant architecture models of the past two decades:
    monoliths and microservices

  39. Monolith
    sociotechnical causes & effects
    • THE database
    • THE application
    • Known-unknowns and mostly predictable failures
    • Many monitoring checks/paging alerts
    • "Flip a switch" to deploy; changes are big-bang and binary (all on/all off)
    • Failures to be prevented
    • Production is to be feared
    • Debug by intuition and scar tissue of past outages
    • Canned dashboards, runbooks, playbooks
    • Deploys are scary
    • Masochistic on-call culture

  40. Monolith
    • We built our systems like glass castles
    — a fragile, forbidding edifice that we
    could tightly control access to.
    • Very hostile to exploration or
    experimentation

  41. Microservices
    sociotechnical causes & effects
    • Many storage systems, many services, many polyglot technologies
    • Unknown-unknowns dominate
    • Every alert is a novel question
    • Rich, flexible instrumentation
    • Few paging alerts, tied to SLOs and keying off user pain
    • A deploy is just the beginning of gaining confidence in your code
    • Failures are your friend
    • Production is where your users live; you should be in there too,
    watching them every day
    • Debug methodically by examining the evidence and following the clues
    • Inspect the full context of the event
    • Deploys are opportunities
    • On-call must be sustainable, humane
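
“Few paging alerts, tied to SLOs” is often implemented as error-budget burn-rate alerting. The sketch below assumes each window is given as a hypothetical (good, total) pair of request counts, where “good” means the user felt no pain. The 14.4 threshold and the two-window check follow the multiwindow approach in Google's SRE Workbook for a 30-day budget. They are illustrative defaults, not the talk's prescription.

```python
def burn_rate(good, total, slo_target):
    """How fast the error budget is being spent; 1.0 means exactly on pace
    to use the whole budget by the end of the SLO period."""
    if total == 0:
        return 0.0
    error_rate = 1 - good / total
    return error_rate / (1 - slo_target)


def should_page(long_window, short_window, slo_target=0.999, threshold=14.4):
    """Page a human only if BOTH windows are burning fast.

    long_window / short_window are (good, total) counts, e.g. the last hour
    and the last five minutes. With a 30-day SLO period, a burn rate of 14.4
    sustained for one hour spends 2% of the budget (14.4 * 1 h / 720 h).
    The short window lets the alert clear quickly once the bleeding stops."""
    return (burn_rate(*long_window, slo_target) > threshold
            and burn_rate(*short_window, slo_target) > threshold)
```

Because the alert keys off the share of requests users actually experienced as bad, a noisy host or a spike in an internal metric does not page anyone unless users feel it.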

  42. Microservices
    • Software ownership: you build it, you run it
    • Robust, resilient, built for experimentation and testing in prod
    • Human scale, with guard rails for safety
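
One common form of guard rails for testing in prod is a percentage rollout behind a feature flag. This is the opposite of the monolith's all-on/all-off switch. The sketch below shows deterministic bucketing only. The flag name and user IDs are hypothetical, and it is not modeled on any particular feature-flag product.

```python
import hashlib


def rollout_bucket(flag_name, user_id):
    """Map (flag, user) to a stable number in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100


def in_rollout(flag_name, user_id, percent):
    """True if this user should see the new code path at this rollout percent."""
    return rollout_bucket(flag_name, user_id) < percent
```

Hashing the flag name together with the user ID has three effects. Each user's assignment stays stable across requests. Different flags bucket users roughly independently. Raising the percentage from 1 to 10 to 100 only ever adds users, so nobody flips back and forth while you watch the results.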

  43. Here's the dirty little secret.
    The next generation of systems won't be built and run by burned-out, exhausted
    people, or by command-and-control teams just following orders.
    It can't be done.
    They've become too complicated, too hard.

  44. We can no longer fit these systems in our heads
    and reason about them; if we try, we'll be
    outcompeted by teams who use proper tools.
    Our systems are emergent and unpredictable. We need more than
    just your logical brain; we need your full creative self.

  45. “I don't have time to invest in observability right now. Maybe later.”
    You can't afford not to.

  46. where are we going?

  47. on call will be shared by everyone who writes code.
    on call must be not-terrible.
    invest in your deploys, democratize production
    curate feedback loops
    (don’t be scared by regulations)

  48. Your labor is a scarce and precious resource. Lend
    it to those who are worthy of it.

  49. we have an opportunity here to make things better
    let's do it <3

  50. Charity Majors
    @mipsytipsy
