Ops and Operability

@tastapod Dan North

Imagine a company that treated its customers like this…
- No consistency in its products…
- …or even within a product
- No clues when something breaks…
- …or even that something has broken!
- No warning that things are about to break
- Blaming the customer when things go wrong
- “Provided without warranty”

“Provided without warranty”
- What happened?
- How can I fix it?
- Can I fix it?
- How long will it be broken?
- Who can I escalate this to?
- Am I just stuck with this?
- Does anyone care?

This is how Dev treats Ops
- DevOps starts with Dev pushing on Ops
- Ops resists…
- …efforts to compromise their governance, assurance, audit, compliance, control processes and structures

Ops is still on the hook for…
- Runtime operations
- SLAs
- Diagnosis
- Recovery
- Restoration
- Business continuity

AUTOMATION & AUTONOMY

The downstream view of “autonomy”
“Let’s just push this to production”

Autonomy needs accountability
- How to resolve local autonomy and global consistency?
- “The Spotify problem”

Contextual Consistency: a pattern
“Given the same context, and the same constraints, we are likely to make similar decisions”
or
“What’s the smallest amount of advice you can give me so I’m unlikely to screw this up?”

SUPPORT & SUPPORTABILITY

Meet the Ops Team
- You build it, you run it!
- They have no understanding of monitoring
- Developers should be on the support rota*
*This isn’t always possible

How supportable is your application?
Three magic questions for incident management:
1. What happened?
2. Who is impacted?
3. How do we fix it?
The real question:
- How could we reduce the impact of this?
MTTR (mean time to recovery) trumps MTBF (mean time between failures)
Imagine being paged at 4am for your error message

Captain’s Log: a pattern
“Don’t tell me, let me figure it out”
A log message should contain:
- a timestamp, for humans and machines
- a unique correlation ID, “edge-to-edge”
- the cause, the whole cause, and nothing but the cause
- answers to the three questions, or at least pointers
A log is an append-only, read-only, user interface!
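One way to picture a log line with these four ingredients is the sketch below. It is an illustration only: the field names, the JSON layout, and the helpers `new_correlation_id` and `log_entry` are assumptions made for this example, not something the slides prescribe. The timestamp is ISO 8601 so that both humans and machines can read it, and the correlation ID is meant to be created once at the edge and passed along with every downstream call.

```python
import json
import uuid
from datetime import datetime, timezone


def new_correlation_id() -> str:
    """Create an ID once at the edge; every downstream component reuses it."""
    return uuid.uuid4().hex


def log_entry(correlation_id, cause, what_happened, who_is_impacted,
              how_to_fix, now=None):
    """Build one log line as JSON. Field names are illustrative assumptions."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "timestamp": now.isoformat(),        # readable and parseable
        "correlation_id": correlation_id,    # the same ID edge-to-edge
        "cause": cause,                      # the underlying cause, not a symptom
        "what_happened": what_happened,      # question 1
        "who_is_impacted": who_is_impacted,  # question 2
        "how_to_fix": how_to_fix,            # question 3, or a pointer to it
    }
    return json.dumps(entry)


if __name__ == "__main__":
    print(log_entry(
        new_correlation_id(),
        cause="connection refused by 10.0.0.135:1337",
        what_happened="product search request failed",
        who_is_impacted="shoppers using search",
        how_to_fix="check that product_search is running; see /status",
    ))
```

Whoever is paged at 4am can then answer the three questions from the log line itself, and can follow the correlation ID across components.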

PACKAGING

It is a truth universally acknowledged, that a developer in possession of a build must be in want of a server

Automating deployment is one thing.
Understanding the release process is another.
Having something worth deploying is something else again!

Phone Home: a pattern
Every component should heartbeat
There are lots of options for this:
- Broadcasting a UDP packet
- Writing to a service registry
- Sending a message
A single packet can carry 1500 bytes
- That’s a lot of information!

    {
      "name" : "product_search",
      "app" : "online_shop",
      "requires": ["other", "components"],
      "address": {
        "host": "10.0.0.135",
        "port": "1337"
      },
      "heartbeat": {
        "interval" : 500,
        "mia_interval": 5000
      },
      "config": {
        "git_revision" : "3ef82c",
        "deployed_from": "Dan's laptop",
        "deployed_by" : "Dan North",
        "deployed_on" : "2016-01-15 13:22:00"
      },
      "status": {
        "memory" : 80,
        "cpu_load": [4.92, 2.94, 2.14],
        "io_load" : 45,
        "disk" : 72
      },
      "rel": {
        "config": "/config",
        "status": "/status"
      }
    }
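The UDP broadcast option might look something like the sketch below. Several things here are assumptions rather than part of the slides: the broadcast address and port `9999`, the helper names, the reading of `interval` and `mia_interval` as milliseconds, and the 1472-byte limit (a 1500-byte Ethernet MTU minus 20 bytes of IPv4 header and 8 bytes of UDP header). Only a subset of the fields from the example message is included to keep it short.

```python
import json
import socket

# Assumption: 1500-byte Ethernet MTU minus IPv4 (20) and UDP (8) headers.
MAX_UDP_PAYLOAD = 1472


def build_heartbeat(name, app, host, port, interval=500, mia_interval=5000):
    """Encode a heartbeat message as UTF-8 JSON that fits in one packet."""
    message = {
        "name": name,
        "app": app,
        "address": {"host": host, "port": port},
        "heartbeat": {"interval": interval, "mia_interval": mia_interval},
    }
    payload = json.dumps(message).encode("utf-8")
    if len(payload) > MAX_UDP_PAYLOAD:
        raise ValueError("heartbeat does not fit in a single packet")
    return payload


def send_heartbeat(payload, target=("255.255.255.255", 9999)):
    """Broadcast one heartbeat. The target address and port are hypothetical."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, target)


def is_missing(last_seen, now, mia_interval=5000):
    """True if no heartbeat arrived within mia_interval.

    last_seen and now are in seconds; mia_interval is assumed to be
    in milliseconds.
    """
    return (now - last_seen) * 1000 > mia_interval
```

A component would call `send_heartbeat(build_heartbeat(...))` every `interval`, while a listener records the arrival time per component name and flags it with `is_missing` once `mia_interval` has passed without a heartbeat.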

OPS & OPERABILITY

USE & USABILITY

“User experience is the experience a user has” (Bill Buxton)

Developers:
- What does it feel like to build your software?
- What does it feel like to deploy your software?
- What does it feel like to test your software?
- What does it feel like to release your software?
- What does it feel like to monitor your software?
- What does it feel like to support your software?

Ops engineers:
- How can you help the developers help you?
- How can you help them help themselves?
- How can you “get out of their way”?
- Where should you start?

Happy Ops!
- Developers study how we work!
- They look beyond their own apps
- They are Devs thinking like Ops
- They learn about release engineering!
- They learn about security!
There should be a word for that…

THE END

Daniel Terhorst-North
