Slide 1

Slide 1 text

The Socio-Technical Path to ✨High-Performing✨ Teams: Observability and the Glorious Future @mipsytipsy

Slide 2

Slide 2 text

@mipsytipsy engineer/cofounder/CTO https://charity.wtf

Slide 3

Slide 3 text

the fundamental building block by which we organize ourselves and coordinate and scale our labor. Teams.

Slide 4

Slide 4 text

engineer

Slide 5

Slide 5 text

The teams you join will define your career more than any other single factor.

Slide 6

Slide 6 text

bad jobs can be bad in so, so many different ways… • harmful product • glorified the results of poor planning • alienated from coworkers • long commute • indifferent manager • cargo-culted the worst of Silicon Valley startup culture • aging, obsolete tech • high operational toil • fragile, flappy systems • complacency • low eng skill level • command-and-control leadership

Slide 7

Slide 7 text

autonomy, learning, high-achieving, learned from our mistakes, curious, responsibility, ownership, inspiring, camaraderie, pride, collaboration, career growth, rewarding, motivating
manual labor, sacred cows, wasted effort, stale tech, ass-covering, fear, fiefdoms, excessive toil, command-and-control, cargo culting, enervating, discouraging, lethargy, indifference

Slide 8

Slide 8 text

sociotechnical (n) “Technology is the sum of ways in which social groups construct the material objects of their civilizations. The things made are socially constructed just as much as technically constructed. The merging of these two things, construction and insight, is sociotechnology” -- wikipedia
If you change the tools people use, you can change how they behave and even who they are.

Slide 9

Slide 9 text

sociotechnical (n) Values Practices Tools

Slide 11

Slide 11 text

A high-performing team isn’t just fun to be on. Nice coworkers who mean well and work/life balance are a good start, but they perform.

Slide 12

Slide 12 text

How well does YOUR team perform? 4 key metrics. https://services.google.com/fh/files/misc/state-of-devops-2019.pdf

Slide 13

Slide 13 text

1 — How frequently do you deploy? 2 — How long does it take for code to go live? 3 — How many of your deploys fail? 4 — How long does it take to recover from an outage? 5 — How often are you paged outside work hours?
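As a rough sketch of how the first four questions could be turned into numbers, the Python below computes them from a hand-made list of deploy records. The `Deploy` record, its field names, and the sample values are all hypothetical and are not taken from the report linked above; real data would come from your own CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deploy:
    # Hypothetical record shape; adapt the fields to whatever your pipeline logs.
    committed_at: datetime                  # when the change was merged
    deployed_at: datetime                   # when it went live
    failed: bool = False                    # did this deploy cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def four_key_metrics(deploys: List[Deploy], period_days: int) -> dict:
    """Compute the four delivery metrics over a window of `period_days` days."""
    if not deploys:
        raise ValueError("need at least one deploy")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }


# Made-up sample: four deploys in one week, one of which failed.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    Deploy(t0, t0 + timedelta(hours=1)),
    Deploy(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=3)),
    Deploy(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=2), failed=True,
           restored_at=t0 + timedelta(days=2, hours=2, minutes=30)),
    Deploy(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=2)),
]
metrics = four_key_metrics(sample, period_days=7)
print(metrics)
```

The fifth question (pages outside work hours) would need on-call data rather than deploy data, so it is left out of this sketch.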

Slide 14

Slide 14 text

There is a wide gap between elite teams and the bottom 50%.

Slide 15

Slide 15 text

Also, we waste a LOT of time. https://stripe.com/reports/developer-coefficient-2018 42%!!!

Slide 16

Slide 16 text

It really, really, really, really, really pays off to be on a high performing team. Like REALLY.

Slide 17

Slide 17 text

Q: What happens when an engineer from the elite yellow bubble joins a team in the blue bubble? A: Your productivity tends to rise (or fall) to the level of the team you join.

Slide 18

Slide 18 text

Who is going to be the better engineer two years from now?
3000 deploys/year, 9 outages/year, 6 hours firefighting
5 deploys/year, 65 outages/year, firefighting: constant

Slide 19

Slide 19 text

So how do we build high-performing teams? “Just hire the best engineers, and you’ll get the best team” Hire people who share your values and have the needed skills, and then the work of building a team can begin.

Slide 20

Slide 20 text

High-performing teams are continuously iterating towards production excellence. The work consists of cultivating sociotechnical feedback loops but it begins with observability. Happier customers, happier teams.

Slide 21

Slide 21 text

observability (n): “In control theory, observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. The observability and controllability of a system are mathematical duals.” -- wikipedia

Slide 22

Slide 22 text

Observability is not the same as monitoring. Monitor your known-unknowns; instrument for observability into unknown-unknowns.

Slide 23

Slide 23 text

o11y for software engineers: Can you understand what’s happening inside your systems, just by asking questions from the outside? Can you debug your code and its behavior using its output? Can you answer new questions without shipping new code?

Slide 24

Slide 24 text

You have an observable system when your team can quickly and reliably track down any new problem with no prior knowledge. For software engineers, this means being able to reason about your code, identify and fix bugs, and understand user experiences and behaviors ... via your instrumentation.

Slide 25

Slide 25 text

Observability requirements… https://www.honeycomb.io/blog/so-you-want-to-build-an-observability-tool/ • High cardinality • High dimensionality • Exploratory, open-ended investigation based on raw events • Service Level Objectives. No preaggregation. • Based on arbitrarily-wide structured events with span support • No indexes, schemas, or predefined structure • Bundling the full context of the request across network hops • Metrics != observability. Unstructured logs != observability.
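To make "arbitrarily-wide structured events" a little more concrete, here is a minimal sketch using only the Python standard library. It builds one event per unit of work, lets any code path attach fields to it, and emits it once as a single JSON line. The `WideEvent` class, the `handle_request` function, and every field name (`user_id`, `build_id`, and so on) are illustrative assumptions, not the API of Honeycomb or any other tool.

```python
import json
import time
import uuid
from typing import Any, Dict, Optional


class WideEvent:
    """One structured event per request, widened as the request is handled."""

    def __init__(self, service: str, trace_id: Optional[str] = None) -> None:
        self.fields: Dict[str, Any] = {
            "service": service,
            # Reuse the caller's trace ID so context survives network hops.
            "trace_id": trace_id or uuid.uuid4().hex,
        }
        self._start = time.monotonic()

    def add(self, **fields: Any) -> None:
        # No schema: any key may be added, including high-cardinality ones.
        self.fields.update(fields)

    def emit(self) -> str:
        elapsed_ms = (time.monotonic() - self._start) * 1000
        self.fields["duration_ms"] = round(elapsed_ms, 3)
        line = json.dumps(self.fields, sort_keys=True)
        print(line)  # stand-in for shipping the raw event to an event store
        return line


def handle_request(user_id: str, cart_items: int,
                   trace_id: Optional[str] = None) -> str:
    """Hypothetical request handler that records its full context in one event."""
    event = WideEvent("checkout", trace_id)
    event.add(user_id=user_id, cart_items=cart_items,
              build_id="hypothetical-build-123")
    try:
        if cart_items == 0:
            raise ValueError("empty cart")
        event.add(status=200)
    except ValueError as exc:
        event.add(status=400, error=str(exc))
    return event.emit()


handle_request("user-42", 3, trace_id="abc")
handle_request("user-7", 0)
```

Because each event keeps raw, per-request values instead of pre-aggregated counters, a later question such as "which users saw errors on this build?" can be asked without shipping new code.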

Slide 26

Slide 26 text

Observability Maturity Model … find your weakest category, and tackle that first. 1. Resiliency to failure 2. High-quality code 3. Manage complexity and technical debt 4. Predictable releases 5. Understand user behavior https://www.honeycomb.io/wp-content/uploads/2019/06/Framework-for-an-Observability-Maturity-Model.pdf

Slide 27

Slide 27 text

1. Resiliency to failure 2. High-quality code 3. Manage complexity and technical debt 4. Predictable releases 5. Understand user behavior

Slide 32

Slide 32 text

Why are computers hard? Because we don't understand them. And we keep shipping things anyway. Our tools have rewarded guessing over debugging. And vendors have happily misled you for $$$$. It’s time to fix this problem.

Slide 33

Slide 33 text

Complexity is soaring • Ephemeral and dynamic • Far-flung and loosely coupled • Partitioned, sharded • Distributed and replicated • Containers, schedulers • Service registries • Polyglot persistence strategies • Autoscaled, multiple failover • Emergent behaviors • ... etc

Slide 34

Slide 34 text

We don’t *know* what the questions are; all we have are unreliable symptoms or reports. Complexity is exploding everywhere, but our tools were designed for predictable worlds. As soon as we know the question, we usually know the answer too.

Slide 35

Slide 35 text

We used to be able to reason about our architecture. Not anymore. (Diagrams: architecture in 2003 and in 2013) Now we have to instrument for observability, or we are screwed.

Slide 36

Slide 36 text

Observability is the key to making the leap from known-unknowns to unknown-unknowns. (Diagram labels: monitoring, known-unknowns; observability, unknown-unknowns)

Slide 37

Slide 37 text

kick-start the virtuous cycle of you build it, you own it • instrumenting two steps in front of you as you build • never accept a PR unless you can explain it if it breaks • watch your code go out as it deploys • is it working as intended? does anything look weird? • look through the lens of your instrumentation
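One way to picture "is it working as intended? does anything look weird?" is to compare the new build against the old one using the events your code already emits. The sketch below assumes each event is a dict with hypothetical `build_id` and `status` fields; the field names and the "5xx means error" rule are illustrative choices, not something the talk prescribes.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def error_rate_by_build(events: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Group raw events by build and return the share of 5xx responses for each."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for event in events:
        build = str(event["build_id"])
        totals[build] += 1
        if int(event["status"]) >= 500:
            errors[build] += 1
    return {build: errors[build] / totals[build] for build in totals}


# Made-up events from just after a deploy, while both builds are serving traffic.
events = [
    {"build_id": "old", "status": 200},
    {"build_id": "old", "status": 200},
    {"build_id": "old", "status": 500},
    {"build_id": "old", "status": 200},
    {"build_id": "new", "status": 200},
    {"build_id": "new", "status": 503},
]
rates = error_rate_by_build(events)
print(rates)
```

The same grouping works for any other field on the event (endpoint, customer, region), which is the point of keeping the build identifier on every event rather than in a separate deploy log.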

Slide 38

Slide 38 text

for extra fun … let’s examine the sociotechnical implications of the predominant architecture models of the past two decades: monoliths and microservices

Slide 39

Slide 39 text

Monolith (sociotechnical causes & effects) • THE database • THE application • Known-unknowns and mostly predictable failures • Many monitoring checks/paging alerts • "Flip a switch" to deploy, changes are big bang and binary (all on/all off) • Failures to be prevented • Production is to be feared • Debug by intuition and scar tissue of past outages • Canned dashboards, runbooks, playbooks • Deploys are scary • Masochistic on-call culture

Slide 40

Slide 40 text

Monolith • We built our systems like glass castles — a fragile, forbidding edifice that we could tightly control access to. • Very hostile to exploration or experimentation

Slide 41

Slide 41 text

Microservices (sociotechnical causes & effects) • Many storage systems, many services, many polyglot technologies • Unknown-unknowns dominate • Every alert is a novel question • Rich, flexible instrumentation • Few paging alerts, tied to SLOs and keying off user pain • A deploy is just the beginning of gaining confidence in your code • Failures are your friend • Production is where your users live, you should be in there too, watching them every day • Debug methodically by examining the evidence and following the clues • Inspect the full context of the event • Deploys are opportunities • On-call must be sustainable, humane
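"Few paging alerts, tied to SLOs and keying off user pain" can be sketched as a single burn-rate check instead of many per-host thresholds. The function names, the 99.9% target, and the paging threshold of 10 below are all assumptions chosen for illustration; pick values that match your own SLO and alerting window.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 0.0                        # no traffic, so no user pain to page on
    error_budget = 1.0 - slo_target       # allowed fraction of bad events
    observed = bad_events / total_events  # actual fraction of bad events
    return observed / error_budget


def should_page(bad_events: int, total_events: int,
                slo_target: float = 0.999, threshold: float = 10.0) -> bool:
    # The threshold is an arbitrary illustrative value, not a recommendation.
    return burn_rate(bad_events, total_events, slo_target) >= threshold


print(should_page(bad_events=1, total_events=1000))   # budget spent at roughly the allowed pace
print(should_page(bad_events=50, total_events=1000))  # budget spent far faster than allowed
```

The alert fires on what users experience (the share of bad requests) rather than on any one component's health, which is what lets the number of paging alerts stay small.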

Slide 42

Slide 42 text

Microservices • Software ownership -- you build it, you run it • Robust, resilient, built for experimentation and testing in prod • Human scale, with guard rails for safety

Slide 43

Slide 43 text

Here's the dirty little secret. The next generation of systems won't be built and run by burned out, exhausted people, or command-and-control teams just following orders. It can't be done. They've become too complicated, too hard.

Slide 44

Slide 44 text

We can no longer fit these systems in our heads and reason about them -- if we try, we'll be outcompeted by teams who use proper tools. Our systems are emergent and unpredictable. We need more than just your logical brain; we need your full creative self.

Slide 45

Slide 45 text

“I don't have time to invest in observability right now. Maybe later.” You can't afford not to.

Slide 46

Slide 46 text

where are we going?

Slide 47

Slide 47 text

on call will be shared by everyone who writes code. on call must be not-terrible. invest in your deploys, democratize production, curate feedback loops. (don’t be scared by regulations)

Slide 48

Slide 48 text

Your labor is a scarce and precious resource. Lend it to those who are worthy of it.

Slide 49

Slide 49 text

we have an opportunity here to make things better. let's do it <3

Slide 50

Slide 50 text

Charity Majors @mipsytipsy