Slide 1

Slide 1 text

Observability-Driven Development
What DevOps is Really About
Add some sort of nice picture…maybe one of the observation deck ones?
Greg Shackles
Principal Engineer, Olo
@gshackles

Slide 2

Slide 2 text

whoami
Greg Shackles
Principal Engineer, Olo
Based in Los Angeles, CA
@gshackles
github.com/gshackles
speakerdeck.com/gshackles

Slide 3

Slide 3 text

it all starts here

Slide 4

Slide 4 text

what does that mean?
continuous integration, postmortems, agile, infrastructure as code, continuous delivery, continuous deployment, autoscaling, containers, automation, on-call rotations, instrumentation, chaos testing, canary releases, monitoring, observability, site reliability engineering, feature flags, continuous testing

Slide 5

Slide 5 text

are we doing DevOps now?

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

ok, so what even is DevOps?
  culture
  automation
  measurement
  sharing
DevOps is really about people

Slide 8

Slide 8 text

this is a team sport
create a culture of sharing
  shared ownership, understanding, and responsibility
shipping is just the beginning
  product development requires continuously asking questions and iterating

Slide 9

Slide 9 text

let’s talk about measurement

Slide 10

Slide 10 text

what does your system look like?

Slide 11

Slide 11 text

what does it really look like?

Slide 12

Slide 12 text

control theory
observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs

Slide 13

Slide 13 text

why is this important?
you can’t improve what you don’t measure
  if you’re not measuring, you’re effectively flying blind
level up your skills and your value
  this mindset will make you a better engineer and more valuable to your company
observability is about asking questions
  it’s a cultural trait that drives how you build systems and products
break down silos, share knowledge
  build a shared understanding and insight into your systems

Slide 14

Slide 14 text

test-driven development
Red, Green, Refactor

Slide 15

Slide 15 text

testing in production
reality check: you’re always testing in production, whether you think you are or not
embrace it

Slide 16

Slide 16 text

observability-driven development
test in production
  measure in production, validate assumptions
  consider adding instrumentation ahead of time to determine impact
keep observing
  does anything now look abnormal?
  do you know what normal looks like?
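The two "keep observing" questions can be made concrete with a baseline comparison. The following is a minimal sketch, assuming a metric sampled before a release serves as the definition of "normal"; the `looks_abnormal` helper, the sample values, and the three-standard-deviation threshold are all illustrative assumptions, not something the talk prescribes.

```python
from statistics import mean, stdev


def looks_abnormal(baseline, current, threshold=3.0):
    """Flag a reading that sits more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical response times (ms) sampled before a release.
before_release = [101, 99, 100, 102, 98, 100]

steady = looks_abnormal(before_release, 101)   # small wobble after the release
spiked = looks_abnormal(before_release, 130)   # large jump after the release
```

The check only works if the baseline was being collected before the change shipped, which is the reason for adding instrumentation ahead of time.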

Slide 17

Slide 17 text

observe in production
[graph annotation: release happened here]

Slide 18

Slide 18 text

what should we measure?

Slide 19

Slide 19 text

handy methods
RED: Rate, Errors, Duration
USE: Utilization, Saturation, Errors
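As an illustration of the RED half of this slide, here is a minimal sketch that derives rate, errors, and duration from a batch of request records. It assumes each request is captured as a duration plus a success flag over a known time window; the `Request` type and `red_summary` helper are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    ok: bool           # False if the request failed


def red_summary(requests, window_s):
    """Return rate (requests/second), error ratio, and mean duration for one window."""
    total = len(requests)
    if total == 0:
        return {"rate": 0.0, "errors": 0.0, "duration": 0.0}
    failed = sum(1 for r in requests if not r.ok)
    return {
        "rate": total / window_s,
        "errors": failed / total,
        "duration": mean(r.duration_s for r in requests),
    }


# Four requests observed over a two-second window, one of which failed.
sample = [
    Request(0.10, True),
    Request(0.30, True),
    Request(0.20, False),
    Request(0.20, True),
]
summary = red_summary(sample, window_s=2.0)
```

A mean is used for duration to keep the sketch short; percentiles are usually more informative for latency.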

Slide 20

Slide 20 text

it’s not just about technical metrics

Slide 21

Slide 21 text

SLIs, SLOs, SLAs… oh my!
Service Level Indicator (SLI)
  defined quantitative measure of some aspect of a service
  e.g. request latency, error rate, availability
Service Level Objective (SLO)
  target value or range for a service, measured by an SLI
  e.g. 99th percentile for request latency of 100ms or less
Service Level Agreement (SLA)
  contract with your users that includes consequences of meeting or missing the SLOs
  e.g. public service: 10% discount for every hour below SLO; internal service: team gets paged
landing.google.com/sre/book/chapters/service-level-objectives.html
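The SLO example on this slide (99th percentile request latency of 100ms or less) can be checked against a set of latency samples. This is a minimal sketch under stated assumptions: it uses the nearest-rank percentile definition, and the `percentile` and `meets_slo` helpers and the sample data are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms, pct=99, target_ms=100):
    """The SLI is the pct-th percentile latency; the SLO is that it stays at or under target_ms."""
    return percentile(latencies_ms, pct) <= target_ms


one_slow = [20] * 99 + [500]        # 1 slow request in 100
two_slow = [20] * 98 + [500, 500]   # 2 slow requests in 100

within_objective = meets_slo(one_slow)
outside_objective = meets_slo(two_slow)
```

Monitoring systems differ in how they interpolate percentiles, so the exact boundary behavior here is a property of this sketch rather than of any particular tool.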

Slide 22

Slide 22 text

make it visible

Slide 23

Slide 23 text

create a culture of sharing

Slide 24

Slide 24 text

stigmergy
the trace left by an action stimulates the next action by the same or a different agent

Slide 25

Slide 25 text

stigmergic alerting

Slide 26

Slide 26 text

start small
there isn’t a one-size-fits-all solution

Slide 27

Slide 27 text

error logging

Slide 28

Slide 28 text

structured logging
serilog.net
getseq.net
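The slide points at Serilog and Seq, which are .NET tools. To show the underlying idea without depending on either, here is a minimal sketch using only Python's standard `logging` module: each event is emitted as a JSON object with named properties rather than a free-form string. The `JsonFormatter` class and the `props` convention for passing properties are assumptions made for this example.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object instead of a free-form string."""

    def format(self, record):
        event = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured properties arrive via extra={"props": {...}}.
        event.update(getattr(record, "props", {}))
        return json.dumps(event)


stream = io.StringIO()  # stand-in for a real sink such as a file or a log server
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output confined to our stream

logger.info("order placed", extra={"props": {"order_id": 42, "elapsed_ms": 87}})
logged = json.loads(stream.getvalue())
```

Because `order_id` and `elapsed_ms` are fields rather than text embedded in a message, a log store can filter and aggregate on them directly.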

Slide 29

Slide 29 text

metrics

Slide 30

Slide 30 text

feature flags

Slide 31

Slide 31 text

feature flags
launchdarkly.com
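A feature flag with a percentage rollout can be sketched in a few lines. This is not LaunchDarkly's API; it is a minimal, self-contained illustration that assumes users are assigned to stable buckets by hashing, and the `bucket`, `is_enabled`, and `checkout` names are hypothetical.

```python
import hashlib


def bucket(flag, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag, user_id, rollout_percent):
    """True when the user's bucket falls inside the rollout percentage."""
    return bucket(flag, user_id) < rollout_percent


def checkout(user_id, rollout_percent):
    # The flag decides at runtime which code path serves this user.
    if is_enabled("new-checkout", user_id, rollout_percent):
        return "new flow"
    return "old flow"
```

Hashing keeps each user on the same side of the flag between requests, so raising `rollout_percent` gradually exposes more users without flipping anyone back and forth.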

Slide 32

Slide 32 text

safely experiment in production
github.com/github/Scientist.net
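The general pattern behind this kind of experiment is to run the existing code path and a candidate side by side, record whether they agree, and always return the existing result. The sketch below illustrates that pattern in plain Python; it does not use or mirror the Scientist.net API, and `run_experiment` and the discount functions are hypothetical.

```python
def run_experiment(name, control, candidate, publish):
    """Run both code paths, publish whether they agree, and always return the control result."""
    control_result = control()
    try:
        matched = candidate() == control_result
    except Exception:
        # A broken candidate must never affect what callers receive.
        matched = False
    publish({"experiment": name, "matched": matched})
    return control_result


def legacy_discount(total):
    return total * 0.9 if total > 100 else total


def rewritten_discount(total):
    # Deliberately differs at the boundary so the example records a mismatch.
    return total * 0.9 if total >= 100 else total


observations = []
price = run_experiment(
    "discount-rewrite",
    control=lambda: legacy_discount(100),
    candidate=lambda: rewritten_discount(100),
    publish=observations.append,
)
```

Callers still see the legacy price, while the recorded mismatch shows where the rewrite disagrees with production behavior.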

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

questions?
Greg Shackles
Principal Engineer, Olo
@gshackles