Observability-Driven Development: What DevOps is Really About

Greg Shackles
Principal Engineer, Olo
@gshackles

whoami
Greg Shackles
Principal Engineer, Olo
Based in Los Angeles, CA
@gshackles
github.com/gshackles
speakerdeck.com/gshackles

it all starts here

what does that mean?
continuous integration, postmortems, agile, infrastructure as code, continuous delivery, continuous deployment, autoscaling, containers, automation, on-call rotations, instrumentation, chaos testing, canary releases, monitoring, observability, site reliability engineering, feature flags, continuous testing

are we doing DevOps now?

ok, so what even is DevOps?
culture, automation, measurement, sharing
DevOps is really about people

this is a team sport
create a culture of sharing: shared ownership, understanding, and responsibility
shipping is just the beginning: product development requires continuously asking questions and iterating

let’s talk about measurement

what does your system look like?

what does it really look like?

control theory
observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs

why is this important?
you can’t improve what you don’t measure: if you’re not measuring, you’re effectively flying blind
level up your skills and your value: this mindset will make you a better engineer and more valuable to your company
observability is about asking questions: it’s a cultural trait that drives how you build systems and products
break down silos, share knowledge: build a shared understanding and insight into your systems

test-driven development
Red, Green, Refactor

testing in production
reality check: you’re always testing in production, whether you think you are or not
embrace it

observability-driven development
test in production
keep observing
measure in production, validate assumptions
consider adding instrumentation ahead of time to determine impact
does anything now look abnormal?
do you know what normal looks like?
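One way to make "do you know what normal looks like?" concrete is to record a baseline before a release and compare new readings against it. The sketch below is only an illustration: the function name, the sample latencies, and the three-standard-deviation threshold are all assumptions, not something from the talk.

```python
from statistics import mean, stdev

def looks_abnormal(baseline, current, threshold=3.0):
    """Return True if `current` sits more than `threshold` standard
    deviations away from the mean of the baseline samples."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold

# Hypothetical latency samples (ms) collected before the release.
baseline_latency_ms = [102, 98, 101, 99, 100, 103, 97, 100]

print(looks_abnormal(baseline_latency_ms, 101))  # a reading close to the baseline
print(looks_abnormal(baseline_latency_ms, 140))  # a reading far above the baseline
```

The comparison only works if the instrumentation existed before the release, which is the point of adding it ahead of time.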

observe in production
[chart annotation: release happened here]

what should we measure?

handy methods
RED: Rate, Errors, Duration
USE: Utilization, Saturation, Errors
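As a rough illustration of the RED method, rate, errors, and duration can all be derived from a list of request records for a time window. The `Request` type, its field names, and the choice of the median as the duration summary are assumptions made for this sketch.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Request:
    duration_ms: float  # how long the request took
    ok: bool            # False if the request ended in an error

def red_summary(requests, window_seconds):
    """Summarize one time window of requests as Rate, Errors, Duration."""
    total = len(requests)
    errors = sum(1 for r in requests if not r.ok)
    return {
        "rate_per_sec": total / window_seconds,
        "error_ratio": errors / total if total else 0.0,
        "median_duration_ms": median(r.duration_ms for r in requests) if total else 0.0,
    }

# Hypothetical two-second window containing four requests, one of which failed.
window = [Request(10, True), Request(20, True), Request(30, False), Request(40, True)]
print(red_summary(window, window_seconds=2))
```

In practice a metrics library would keep these numbers as counters and histograms rather than recomputing them from raw records.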

it’s not just about technical metrics

SLIs, SLOs, SLAs… oh my!
Service Level Indicator (SLI): a defined quantitative measure of some aspect of a service
example: request latency, error rate, availability
Service Level Objective (SLO): a target value or range for a service, measured by an SLI
example: 99th percentile for request latency of 100ms or less
Service Level Agreement (SLA): a contract with your users that includes consequences of meeting or missing the SLOs
example: public service: 10% discount for every hour below SLO; internal service: team gets paged
landing.google.com/sre/book/chapters/service-level-objectives.html
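The example SLO, a 99th percentile request latency of 100ms or less, can be checked against raw latency samples (the SLI). This is a minimal sketch: the function names are hypothetical, and the nearest-rank percentile is just one of several common definitions.

```python
def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    `pct` percent of all samples are at or below it (pct is an integer)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]

def meets_slo(latencies_ms, pct=99, target_ms=100):
    """True if the pct-th percentile latency is within the target."""
    return percentile(latencies_ms, pct) <= target_ms

# Hypothetical samples: one slow request in a hundred still meets the SLO,
# two in a hundred does not.
print(meets_slo([50] * 99 + [500]))
print(meets_slo([50] * 98 + [500, 500]))
```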

make it visible

create a culture of sharing

stigmergy
the trace left by an action stimulates the next action by the same or a different agent

stigmergic alerting

start small there isn’t a one-size-fits-all solution

error logging

structured logging
serilog.net
getseq.net
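Structured logging means emitting events as named fields rather than free-form strings, so they can be queried later. The deck points to Serilog and Seq, which are .NET tools; the sketch below is a stand-in using only Python's standard `logging` and `json` modules, with a hypothetical `fields` key and made-up field names.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""
    def format(self, record):
        event = {"level": record.levelname, "message": record.getMessage()}
        # "fields" is a name chosen for this sketch; it arrives via `extra=`.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event, sort_keys=True)

# Log to an in-memory stream so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("order placed", extra={"fields": {"order_id": 1234, "elapsed_ms": 87}})
print(stream.getvalue().strip())
```

Because `order_id` and `elapsed_ms` are separate fields, a log store can filter or aggregate on them without parsing message text.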

metrics

feature flags

feature flags
launchdarkly.com
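A feature flag can be as simple as a deterministic percentage rollout. This sketch is not the LaunchDarkly API; the function name, the flag key, and the hashing scheme are assumptions chosen to show the idea.

```python
import hashlib

def is_enabled(flag, user_id, rollout_percent):
    """Place each user in a stable bucket from 0 to 99 and enable the
    flag only for buckets below the rollout percentage."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# Hypothetical flag name; the same user always gets the same answer.
for user in ("alice", "bob", "carol"):
    print(user, is_enabled("new-checkout", user, rollout_percent=25))
```

Hashing the flag name together with the user id keeps each user's experience stable across requests while letting different flags roll out to different subsets.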

safely experiment in production
github.com/github/Scientist.net
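The idea behind Scientist-style experiments is to run the existing code path and a candidate side by side, record whether they agree, and keep returning the existing result. The sketch below imitates that pattern in Python; it is not the Scientist.net API, and `run_experiment` and its parameters are hypothetical.

```python
def run_experiment(name, control, candidate, publish):
    """Run both code paths, publish whether they agreed, and always
    return the control result so callers are unaffected."""
    control_result = control()
    try:
        candidate_result = candidate()
        matched = candidate_result == control_result
    except Exception as exc:
        # A failing candidate must never break the production path.
        candidate_result = exc
        matched = False
    publish({
        "experiment": name,
        "matched": matched,
        "control": control_result,
        "candidate": candidate_result,
    })
    return control_result

# Hypothetical usage: compare an old and a new way of computing a total.
observations = []
total = run_experiment(
    "order-total",
    control=lambda: sum([5, 10, 15]),
    candidate=lambda: 5 + 10 + 15,
    publish=observations.append,
)
print(total, observations[0]["matched"])
```

Mismatches show up in the published observations rather than in user-facing behavior, which is what makes the experiment safe to run in production.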

questions?
Greg Shackles
@gshackles
