The Sociotechnical Path to High-Performing Teams

The Socio-Technical Path to ✨High-Performing✨ Teams: Observability and the Glorious Future (@mipsytipsy)

@mipsytipsy engineer/cofounder/CTO https://charity.wtf

Teams: the fundamental building block by which we organize ourselves and coordinate and scale our labor.


The teams you join will define your career more than any other single factor.

Bad jobs can be bad in so, so many different ways:
• harmful product
• glorified the results of poor planning
• alienated from coworkers
• long commute
• indifferent manager
• cargo-culted the worst of Silicon Valley startup culture
• aging, obsolete tech
• high operational toil
• fragile, flappy systems
• complacency
• low eng skill level
• command-and-control leadership

autonomy, learning, high-achieving, learned from our mistakes, curious, responsibility, ownership, inspiring, camaraderie, pride, collaboration, career growth, rewarding, motivating

manual labor, sacred cows, wasted effort, stale tech, ass-covering, fear, fiefdoms, excessive toil, command-and-control, cargo culting, enervating, discouraging, lethargy, indifference

sociotechnical (n): “Technology is the sum of ways in which social groups construct the material objects of their civilizations. The things made are socially constructed just as much as technically constructed. The merging of these two things, construction and insight, is sociotechnology.” (Wikipedia)

If you change the tools people use, you can change how they behave and even who they are.

sociotechnical (n): values, practices, tools

A high-performing team isn’t just fun to be on. Nice coworkers who mean well and work/life balance are a good start, but high-performing teams also perform.

How well does YOUR team perform? The State of DevOps 2019 report (https://services.google.com/fh/files/misc/state-of-devops-2019.pdf) measures it with 4 key metrics.

1. How frequently do you deploy?
2. How long does it take for code to go live?
3. How many of your deploys fail?
4. How long does it take to recover from an outage?

Plus one more:
5. How often are you paged outside work hours?
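As a rough sketch of how the first four numbers could be pulled from your own deploy history: the `Deploy` record and its fields below are hypothetical, not the report's methodology, and time to restore is measured from the deploy itself, which is a simplification. The fifth question would come from your paging tool and is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    # Hypothetical record shape; map it onto whatever your CI/CD system stores.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # only meaningful when failed is True


def key_metrics(deploys: list[Deploy], period_days: int) -> dict:
    """Summarize a window of deploy history into the four key metrics."""
    if not deploys:
        raise ValueError("need at least one deploy in the window")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at
                     for d in failures if d.restored_at is not None]
    return {
        # 1. How frequently do you deploy?
        "deploys_per_day": len(deploys) / period_days,
        # 2. How long does it take for code to go live? (median commit-to-deploy)
        "median_lead_time": median(lead_times),
        # 3. How many of your deploys fail?
        "change_failure_rate": len(failures) / len(deploys),
        # 4. How long does it take to recover? (median deploy-to-restore)
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Toy history: two deploys in one day, one of which broke something.
t0 = datetime(2024, 1, 1, 9, 0)
history = [
    Deploy(t0, t0 + timedelta(minutes=30)),
    Deploy(t0, t0 + timedelta(minutes=90), failed=True,
           restored_at=t0 + timedelta(minutes=120)),
]
print(key_metrics(history, period_days=1))
```

Medians are used rather than means so that one pathological deploy doesn't dominate the picture.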

There is a wide gap between elite teams and the
bottom 50%.

Also, we waste a LOT of time: 42% of developer time!!! (https://stripe.com/reports/developer-coefficient-2018)

It really, really, really, really, really pays off to be
on a high performing team. Like REALLY.

Q: What happens when an engineer from the elite yellow
bubble joins a team in the blue bubble? A: Your productivity tends to rise (or fall) to the level of the team you join.

Who is going to be the better engineer two years from now?
• Team A: 3000 deploys/year, 9 outages/year, 6 hours firefighting
• Team B: 5 deploys/year, 65 outages/year, firefighting: constant

So how do we build high-performing teams? Not with “Just hire the best engineers, and you’ll get the best team.” Hire people who share your values and have the needed skills; then the work of building a team can begin.

High-performing teams are continuously iterating toward production excellence. The work consists of cultivating sociotechnical feedback loops, but it begins with observability. Happier customers, happier teams.

observability (n): “In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals.” (Wikipedia)
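For the curious, here is where the term comes from. This is textbook control theory, included only for background; software observability borrows the intuition, not the math. For a linear time-invariant system with an n-dimensional state x, input u, and output y, the Kalman rank condition says the internal state can be inferred from the outputs exactly when the observability matrix has full rank:

```latex
\dot{x} = A x + B u, \qquad y = C x

\mathcal{O} =
\begin{bmatrix} C \\ C A \\ C A^{2} \\ \vdots \\ C A^{n-1} \end{bmatrix},
\qquad
(A, C)\ \text{observable} \iff \operatorname{rank}(\mathcal{O}) = n
```

The duality in the quote falls out of the same test: (A, C) is observable exactly when (Aᵀ, Cᵀ) is controllable, because the controllability matrix of (Aᵀ, Cᵀ) is the transpose of the matrix above, and transposing preserves rank.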

Observability is not the same as monitoring. Monitor your known-unknowns; instrument for observability into your unknown-unknowns.

o11y for software engineers: Can you understand what’s happening inside your systems just by asking questions from the outside? Can you debug your code and its behavior using its output? Can you answer new questions without shipping new code?

You have an observable system when your team can quickly and reliably track down any new problem with no prior knowledge. For software engineers, this means being able to reason about your code, identify and fix bugs, and understand user experiences and behaviors ... via your instrumentation.
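One common way to make code debuggable through its output is to emit a single wide, structured event per unit of work: accumulate context while the request is handled, then write it out once at the end. A minimal sketch, assuming a hypothetical `handle_checkout` handler and illustrative field names; a real system would send these events to an observability backend instead of stdout, and would propagate the trace ID from the caller rather than minting a fresh one.

```python
import json
import time
import uuid


def handle_checkout(user_id: str, cart: list[dict], build_id: str) -> dict:
    """Hypothetical handler that emits exactly one wide event per request."""
    event = {
        "name": "checkout",
        "trace_id": uuid.uuid4().hex,  # fresh here; normally propagated across hops
        "build_id": build_id,          # which version of the code served this
        "user_id": user_id,            # high-cardinality fields are fine here
        "cart_items": len(cart),
    }
    start = time.monotonic()
    try:
        total = sum(item["price"] * item["qty"] for item in cart)
        event["cart_total"] = total
        event["status"] = "ok"
        return {"total": total}
    except (KeyError, TypeError) as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = round((time.monotonic() - start) * 1000, 3)
        # One JSON line per request; stdout stands in for a real event pipeline.
        print(json.dumps(event))


handle_checkout("u-123", [{"price": 5, "qty": 2}], build_id="b42")
```

Because the event is emitted in `finally`, failed requests get recorded with the same context as successful ones.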

Observability requirements (https://www.honeycomb.io/blog/so-you-want-to-build-an-observability-tool/):
• High cardinality
• High dimensionality
• Exploratory, open-ended investigation based on raw events
• Service Level Objectives
• No preaggregation
• Based on arbitrarily-wide structured events with span support
• No indexes, schemas, or predefined structure
• Bundling the full context of the request across network hops
• Metrics != observability. Unstructured logs != observability.
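A toy illustration of why raw events with no preaggregation matter: any field, including a high-cardinality one like `user_id`, can become the group-by key after the fact, so a brand-new question needs no new code or new metric. The event fields, sample data, and `p99_by` helper are hypothetical; real tools do this over enormous volumes with purpose-built storage, not a Python list.

```python
from collections import defaultdict
from math import ceil


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, for 0 < p <= 100."""
    ordered = sorted(values)
    rank = ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


def p99_by(events: list[dict], key: str, measure: str = "duration_ms") -> dict:
    """Group raw events by any field chosen at query time, then take the p99."""
    groups: dict = defaultdict(list)
    for e in events:
        if measure in e:
            groups[e.get(key)].append(e[measure])
    return {k: percentile(v, 99) for k, v in groups.items()}


# Hypothetical raw events; nothing was aggregated when they were written.
events = [
    {"user_id": "u-1", "build_id": "b41", "endpoint": "/cart", "duration_ms": 12.0},
    {"user_id": "u-2", "build_id": "b42", "endpoint": "/cart", "duration_ms": 15.0},
    {"user_id": "u-2", "build_id": "b42", "endpoint": "/pay", "duration_ms": 950.0},
    {"user_id": "u-3", "build_id": "b41", "endpoint": "/pay", "duration_ms": 40.0},
]

# Two questions nobody built a dashboard for, answered without new code:
print(p99_by(events, "user_id"))   # {'u-1': 12.0, 'u-2': 950.0, 'u-3': 40.0}
print(p99_by(events, "build_id"))  # {'b41': 40.0, 'b42': 950.0}
```

Had these been rolled up into a per-endpoint latency metric at write time, the per-user question could not be answered at all.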

Observability Maturity Model (https://www.honeycomb.io/wp-content/uploads/2019/06/Framework-for-an-Observability-Maturity-Model.pdf):
1. Resiliency to failure
2. High-quality code
3. Manage complexity and technical debt
4. Predictable releases
5. Understand user behavior

Find your weakest category, and tackle that first.
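If your team scores itself on each of the five categories, choosing where to start is just finding the minimum. A trivial sketch; the 1 to 5 scale and the scores are hypothetical and not part of the maturity model document.

```python
CATEGORIES = [
    "Resiliency to failure",
    "High-quality code",
    "Manage complexity and technical debt",
    "Predictable releases",
    "Understand user behavior",
]


def weakest_category(scores: dict[str, int]) -> str:
    """Return the lowest-scored category (ties go to the one listed first)."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"unscored categories: {missing}")
    return min(CATEGORIES, key=lambda c: scores[c])


# Hypothetical self-assessment on a 1 (weak) to 5 (strong) scale.
team_scores = {
    "Resiliency to failure": 3,
    "High-quality code": 4,
    "Manage complexity and technical debt": 2,
    "Predictable releases": 1,
    "Understand user behavior": 2,
}
print(weakest_category(team_scores))  # Predictable releases
```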


Why are computers hard? Because we don't understand them, and we keep shipping things anyway. Our tools have rewarded guessing over debugging, and vendors have happily misled you for $$$$. It’s time to fix this problem.

Complexity is soaring:
• Ephemeral and dynamic
• Far-flung and loosely coupled
• Partitioned, sharded
• Distributed and replicated
• Containers, schedulers
• Service registries
• Polyglot persistence strategies
• Autoscaled, multiple failover
• Emergent behaviors
• ... etc

We don’t *know* what the questions are; all we have are unreliable symptoms or reports. Complexity is exploding everywhere, but our tools were designed for predictable worlds. As soon as we know the question, we usually know the answer too.

We used to be able to reason about our architecture. Not anymore. (Architecture diagrams: 2003 vs. 2013.) Now we have to instrument for observability, or we are screwed.

Observability is the key to making the leap from known-unknowns to unknown-unknowns: monitoring covers the known-unknowns, observability the unknown-unknowns.

Kick-start the virtuous cycle of “you build it, you own it”:
• instrument two steps in front of you as you build
• never accept a PR unless you can explain it if it breaks
• watch your code go out as it deploys: is it working as intended? Does anything look weird?
• look through the lens of your instrumentation
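Watching your code go out can be as simple as slicing the same request events by build and comparing error rates while the deploy rolls out. A sketch under stated assumptions: events carry hypothetical `build_id` and `status` fields, and the 2x ratio and 50-request minimum are arbitrary thresholds to tune, not recommendations.

```python
from collections import Counter


def error_rates(events: list[dict]) -> dict[str, float]:
    """Fraction of events with status == "error", per build_id."""
    total: Counter = Counter()
    errors: Counter = Counter()
    for e in events:
        total[e["build_id"]] += 1
        if e.get("status") == "error":
            errors[e["build_id"]] += 1
    return {build: errors[build] / total[build] for build in total}


def looks_weird(events: list[dict], old: str, new: str,
                max_ratio: float = 2.0, min_requests: int = 50) -> bool:
    """Flag the new build if its error rate is well above the old build's."""
    rates = error_rates(events)
    served_by_new = sum(1 for e in events if e["build_id"] == new)
    if served_by_new < min_requests or old not in rates:
        return False  # not enough evidence yet: keep watching
    baseline = max(rates[old], 0.001)  # floor avoids dividing by zero
    return rates[new] / baseline > max_ratio


# Hypothetical mid-deploy traffic: b41 is the old build, b42 the new one.
events = (
    [{"build_id": "b41", "status": "ok"}] * 99
    + [{"build_id": "b41", "status": "error"}]
    + [{"build_id": "b42", "status": "ok"}] * 90
    + [{"build_id": "b42", "status": "error"}] * 10
)
print(error_rates(events))                # {'b41': 0.01, 'b42': 0.1}
print(looks_weird(events, "b41", "b42"))  # True
```

A check like this only tells you where to look; the point is still for a human to open the events and ask why.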

For extra fun, let’s examine the sociotechnical implications of the predominant architecture models of the past two decades: monoliths and microservices.

Monolith (sociotechnical causes & effects):
• THE database
• THE application
• Known-unknowns and mostly predictable failures
• Many monitoring checks/paging alerts
• "Flip a switch" to deploy; changes are big bang and binary (all on/all off)
• Failures to be prevented
• Production is to be feared
• Debug by intuition and scar tissue of past outages
• Canned dashboards, runbooks, playbooks
• Deploys are scary
• Masochistic on-call culture

Monolith:
• We built our systems like glass castles: a fragile, forbidding edifice that we could tightly control access to.
• Very hostile to exploration or experimentation

Microservices (sociotechnical causes & effects):
• Many storage systems, many services, many polyglot technologies
• Unknown-unknowns dominate
• Every alert is a novel question
• Rich, flexible instrumentation
• Few paging alerts, tied to SLOs and keying off user pain
• A deploy is just the beginning of gaining confidence in your code
• Failures are your friend
• Production is where your users live; you should be in there too, watching them every day
• Debug methodically by examining the evidence and following the clues
• Inspect the full context of the event
• Deploys are opportunities
• On-call must be sustainable, humane
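"Few paging alerts, tied to SLOs and keying off user pain" is often implemented as paging on how fast the error budget is burning, not on individual host symptoms. A minimal sketch; the 99.9% target, the assumption that `good` and `total` are counted over the last hour, and the 14.4 threshold are illustrative choices, not values from this talk.

```python
def burn_rate(good: int, total: int, slo_target: float) -> float:
    """How fast the error budget is being spent over some window.

    1.0 means on pace to spend exactly the whole budget over the SLO period;
    anything higher means the budget runs out early.
    """
    if total == 0:
        return 0.0
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    error_rate = (total - good) / total
    return error_rate / error_budget


def should_page(good: int, total: int, slo_target: float = 0.999,
                threshold: float = 14.4) -> bool:
    """Wake a human only when users are hurting fast enough to matter.

    With a 30-day (720-hour) SLO period, a burn rate of 14.4 sustained for
    one hour spends 14.4 / 720 = 2% of the whole budget.
    """
    return burn_rate(good, total, slo_target) >= threshold


# Hypothetical last hour of traffic against a 99.9% SLO.
print(should_page(good=98_000, total=100_000))  # True  (burn rate ~20)
print(should_page(good=99_950, total=100_000))  # False (burn rate ~0.5)
```

One alert like this replaces a pile of per-host checks, and it only fires when users are actually affected.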

Microservices:
• Software ownership: you build it, you run it
• Robust, resilient, built for experimentation and testing in prod
• Human scale, with guard rails for safety
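One common guard rail for testing in prod is a percentage rollout: a change is enabled for a small, stable slice of users, watched through your instrumentation, and widened or turned off without a redeploy. A sketch of the bucketing idea only; the flag name and user IDs are hypothetical, and real feature-flag services add targeting rules, kill switches, and audit trails on top.

```python
import hashlib


def rollout_bucket(flag: str, user_id: str) -> float:
    """Map (flag, user) to a stable bucket in [0.00, 99.99]."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000 / 100


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """True if this user falls in the first `percent`% of buckets for this flag.

    The same user always lands in the same bucket for a given flag, so raising
    percent from 1 to 10 to 50 only ever adds users; nobody flaps in and out.
    """
    return rollout_bucket(flag, user_id) < percent


# Hypothetical flag at 10%: roughly one user in ten sees the new code path.
users = [f"u-{i}" for i in range(10_000)]
enabled = sum(in_rollout("new-checkout", u, 10) for u in users)
print(f"{enabled} of {len(users)} users enabled")
```

Hashing the flag name together with the user ID keeps different flags' slices independent, so the same unlucky users are not the guinea pigs for every experiment.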

Here's the dirty little secret: the next generation of systems won't be built and run by burned-out, exhausted people, or by command-and-control teams just following orders. It can't be done. They've become too complicated, too hard.

We can no longer fit these systems in our heads and reason about them; if we try, we'll be outcompeted by teams who use proper tools. Our systems are emergent and unpredictable. We need more than just your logical brain; we need your full creative self.

“I don't have time to invest in observability right now. Maybe later.” You can't afford not to.

where are we going?

• On call will be shared by everyone who writes code.
• On call must be not-terrible.
• Invest in your deploys.
• Democratize production.
• Curate feedback loops (don’t be scared by regulations).

Your labor is a scarce and precious resource. Lend it
to those who are worthy of it.

we have an opportunity here to make things better let's
do it <3

Charity Majors @mipsytipsy
