
The Sociotechnical Path to High-Performing Teams


How to measure and build high-performing engineering teams, and why it starts with observability.


Charity Majors

April 17, 2020


  1. The Socio-Technical Path to ✨High-Performing✨ Teams: Observability and the Glorious Future @mipsytipsy
  2. @mipsytipsy engineer/cofounder/CTO https://charity.wtf

  3. Teams: the fundamental building block by which we organize ourselves and coordinate and scale our labor.
  4. engineer

  5. The teams you join will define your career more than any other single factor.
  6. bad jobs can be bad in so, so many different ways… • harmful product • glorified the results of poor planning • alienated from coworkers • long commute • indifferent manager • cargo-culted the worst of Silicon Valley startup culture • aging, obsolete tech • high operational toil • fragile, flappy systems • complacency • low eng skill level • command-and-control
  7. autonomy, learning, high-achieving, learned from our mistakes, curious, responsibility, ownership, inspiring, camaraderie, pride, collaboration, career growth, rewarding, motivating … versus: manual labor, sacred cows, wasted effort, stale tech, ass-covering, fear, fiefdoms, excessive toil, command-and-control, cargo culting, enervating, discouraging, lethargy, indifference
  8. sociotechnical (n): “Technology is the sum of ways in which social groups construct the material objects of their civilizations. The things made are socially constructed just as much as technically constructed. The merging of these two things, construction and insight, is sociotechnology” — wikipedia. If you change the tools people use, you can change how they behave and even who they are.
  9. sociotechnical (n) Values Practices Tools


  11. A high-performing team isn’t just fun to be on. Nice coworkers who mean well and work/life balance are a good start, but they perform.
  12. How well does YOUR team perform? https://services.google.com/fh/files/misc/state-of-devops-2019.pdf 4 key metrics.

  13. 1 — How frequently do you deploy? 2 — How long does it take for code to go live? 3 — How many of your deploys fail? 4 — How long does it take to recover from an outage? 5 — How often are you paged outside work hours?
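The questions above map onto the four DORA key metrics (deploy frequency, lead time, change failure rate, time to recover). As a minimal sketch of how a team might compute them from its own deploy and outage logs (the record shapes and field order here are illustrative assumptions, not from the report):

```python
from datetime import datetime

# Hypothetical deploy records: (committed_at, deployed_at, failed)
deploys = [
    (datetime(2020, 4, 1, 9), datetime(2020, 4, 1, 11), False),
    (datetime(2020, 4, 2, 10), datetime(2020, 4, 2, 10, 30), True),
    (datetime(2020, 4, 3, 14), datetime(2020, 4, 3, 15), False),
]
# Hypothetical outage records: (started_at, resolved_at)
outages = [(datetime(2020, 4, 2, 10, 30), datetime(2020, 4, 2, 11, 15))]

days_observed = 3

# 1 — deploy frequency (deploys per day)
deploy_frequency = len(deploys) / days_observed

# 2 — lead time: median hours from commit to live
lead_times = sorted((d - c).total_seconds() / 3600 for c, d, _ in deploys)
median_lead_time = lead_times[len(lead_times) // 2]

# 3 — change failure rate: fraction of deploys that fail
change_failure_rate = sum(failed for _, _, failed in deploys) / len(deploys)

# 4 — mean time to recover, in minutes
mttr = sum((r - s).total_seconds() / 60 for s, r in outages) / len(outages)
```

With the sample data above this yields one deploy per day, a one-hour median lead time, a one-in-three change failure rate, and a 45-minute mean time to recover.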
  14. There is a wide gap between elite teams and the bottom 50%.
  15. Also, we waste a LOT of time. https://stripe.com/reports/developer-coefficient-2018 42%!!!

  16. It really, really, really, really, really pays off to be on a high performing team. Like REALLY.
  17. Q: What happens when an engineer from the elite yellow bubble joins a team in the blue bubble? A: Your productivity tends to rise (or fall) to the level of the team you join.
  18. Who is going to be the better engineer two years from now? 3000 deploys/year, 9 outages/year, 6 hours firefighting … versus 5 deploys/year, 65 outages/year, constant firefighting.
  19. So how do we build high-performing teams? “Just hire the best engineers, and you’ll get the best team”? Hire people who share your values and have the needed skills, and then the work of building a team can begin.
  20. High-performing teams are continuously iterating towards production excellence. The work consists of cultivating sociotechnical feedback loops, but it begins with observability. Happier customers, happier teams.
  21. observability (n): “In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals.” — wikipedia
  22. Observability is not the same as monitoring: monitor your known-unknowns; instrument for observability into unknown-unknowns.
  23. o11y for software engineers: Can you understand what’s happening inside your systems, just by asking questions from the outside? Can you debug your code and its behavior using its output? Can you answer new questions without shipping new code?
  24. You have an observable system when your team can quickly and reliably track down any new problem with no prior knowledge. For software engineers, this means being able to reason about your code, identify and fix bugs, and understand user experiences and behaviors … via your instrumentation.
  25. Observability requirements… https://www.honeycomb.io/blog/so-you-want-to-build-an-observability-tool/ • High cardinality • High dimensionality • Exploratory, open-ended investigation based on raw events • Service Level Objectives. No preaggregation. • Based on arbitrarily-wide structured events with span support • No indexes, schemas, or predefined structure • Bundling the full context of the request across network hops • Metrics != observability. Unstructured logs != observability.
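The “arbitrarily-wide structured events” requirement can be illustrated with a small sketch: one event per request that accumulates high-cardinality context as the handler runs, emitted as a single structured line at the end. All field names here (`user_id`, `build_id`, `cart_items`, etc.) are illustrative assumptions, not any vendor’s schema:

```python
import json
import time

def handle_request(user_id, endpoint):
    # One wide event per request: start with the request context...
    event = {
        "timestamp": time.time(),
        "user_id": user_id,      # high-cardinality: one value per user
        "endpoint": endpoint,
        "build_id": "abc123",    # hypothetical deploy marker
    }
    start = time.monotonic()
    try:
        # ...the handler runs here; add any field you might later want to
        # query on -- no schema or index is required up front.
        event["cart_items"] = 3
        event["shard"] = user_id % 128
        event["status"] = 200
    except Exception as exc:
        event["status"] = 500
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = (time.monotonic() - start) * 1000
        # Emit the full context as one structured line,
        # not N unstructured log lines scattered through the handler.
        print(json.dumps(event))
    return event

evt = handle_request(user_id=42, endpoint="/checkout")
```

Because every field rides on the same event, any new question (“which shard? which build? which users?”) can be asked of the existing output without shipping new code.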
  26. Observability Maturity Model: 1. Resiliency to failure 2. High-quality code 3. Manage complexity and technical debt 4. Predictable releases 5. Understand user behavior … find your weakest category, and tackle that first. https://www.honeycomb.io/wp-content/uploads/2019/06/Framework-for-an-Observability-Maturity-Model.pdf
  32. Why are computers hard? Because we don’t understand them, and we keep shipping things anyway. Our tools have rewarded guessing over debugging, and vendors have happily misled you for $$$$. It’s time to fix this problem.
  33. Complexity is soaring: • Ephemeral and dynamic • Far-flung and loosely coupled • Partitioned, sharded • Distributed and replicated • Containers, schedulers • Service registries • Polyglot persistence strategies • Autoscaled, multiple failover • Emergent behaviors • … etc
  34. Complexity is exploding everywhere, but our tools were designed for predictable worlds. We don’t *know* what the questions are; all we have are unreliable symptoms or reports. As soon as we know the question, we usually know the answer too.
  35. We used to be able to reason about our architecture. Not anymore. Now we have to instrument for observability, or we are screwed.
  36. Observability is the key to making the leap from known-unknowns (monitoring) to unknown-unknowns (observability).
  37. Kick-start the virtuous cycle of you build it, you own it: instrument two steps in front of you as you build; never accept a PR unless you can explain it if it breaks; watch your code go out as it deploys; is it working as intended? does anything look weird? look through the lens of your instrumentation.
  38. for extra fun … let’s examine the sociotechnical implications of the predominant architecture models of the past two decades: monoliths and microservices
  39. Monolith: sociotechnical causes & effects • THE database • THE application • Known-unknowns and mostly predictable failures • Many monitoring checks/paging alerts • “Flip a switch” to deploy; changes are big bang and binary (all on/all off) • Failures to be prevented • Production is to be feared • Debug by intuition and scar tissue of past outages • Canned dashboards, runbooks, playbooks • Deploys are scary • Masochistic on-call culture
  40. Monolith: • We built our systems like glass castles: fragile, forbidding edifices that we could tightly control access to. • Very hostile to exploration or experimentation
  41. Microservices: sociotechnical causes & effects • Many storage systems, many services, many polyglot technologies • Unknown-unknowns dominate • Every alert is a novel question • Rich, flexible instrumentation • Few paging alerts, tied to SLOs and keying off user pain • A deploy is just the beginning of gaining confidence in your code • Failures are your friend • Production is where your users live; you should be in there too, watching them every day • Debug methodically by examining the evidence and following the clues • Inspect the full context of the event • Deploys are opportunities • On-call must be sustainable, humane
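“Few paging alerts, tied to SLOs and keying off user pain” can be sketched as an error-budget check: page a human only when enough of the budget has burned to threaten the SLO, rather than on every blip. The function names, thresholds, and numbers below are illustrative assumptions:

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Fraction of the error budget still unspent for this window.

    slo_target: e.g. 0.999 means at most 0.1% of requests may fail.
    """
    budget = (1 - slo_target) * total_requests  # failures we may "spend"
    return 1 - failed_requests / budget if budget else 0.0

def should_page(slo_target, total_requests, failed_requests,
                burn_threshold=0.25):
    # Page only when less than `burn_threshold` of the budget remains;
    # everything above that is a ticket or a dashboard, not a 3am wake-up.
    remaining = error_budget_remaining(slo_target, total_requests,
                                       failed_requests)
    return remaining < burn_threshold

# 99.9% SLO over 1,000,000 requests => a budget of ~1,000 failures.
quiet = should_page(0.999, 1_000_000, 200)  # ~80% of budget left
loud = should_page(0.999, 1_000_000, 900)   # ~10% of budget left
```

Here `quiet` is False (let people sleep) and `loud` is True (users are hurting), which is the point: the alert keys off aggregate user pain against the objective, not off any single failing check.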
  42. Microservices: • Software ownership: you build it, you run it • Robust, resilient, built for experimentation and testing in prod • Human scale, with guard rails for safety
  43. Here’s the dirty little secret: the next generation of systems won’t be built and run by burned out, exhausted people, or command-and-control teams just following orders. It can’t be done. They’ve become too complicated, too hard.
  44. We can no longer fit these systems in our heads and reason about them; if we try, we’ll be outcompeted by teams who use proper tools. Our systems are emergent and unpredictable. We need more than just your logical brain; we need your full creative self.
  45. "I don't have time to invest in observability right now.

    Maybe later” You can't afford not to.
  46. where are we going?

  47. On call will be shared by everyone who writes code, and on call must be not-terrible. Invest in your deploys, democratize production, curate feedback loops (don’t be scared by regulations).
  48. Your labor is a scarce and precious resource. Lend it to those who are worthy of it.
  49. We have an opportunity here to make things better. Let’s do it <3
  50. Charity Majors @mipsytipsy