Slide 1

Slide 1 text

OPS AND OPERABILITY

Slide 2

Slide 2 text

@tastapod Dan North

Slide 3

Slide 3 text

Imagine a company that treated its customers like this…
- No consistency in its products…
- …or even within a product
- No clues when something breaks…
- …or even that something has broken!
- No warning that things are about to break
- Blaming the customer when things go wrong
- “Provided without warranty”

Slide 4

Slide 4 text

“Provided without warranty”
- How can I fix it?
- What happened?
- Can I fix it?
- How long will it be broken?
- Who can I escalate this to?
- Am I just stuck with this?
- Does anyone care?

Slide 5

Slide 5 text

This is how Dev treats Ops
DevOps starts with Dev pushing on Ops
Ops resists…
…efforts to compromise their governance, assurance, audit, compliance, control processes and structures

Slide 6

Slide 6 text

Ops is still on the hook for…
- Runtime operations
- SLAs
- Diagnosis
- Recovery
- Restoration
- Business continuity

Slide 7

Slide 7 text

AUTOMATION & AUTONOMY

Slide 8

Slide 8 text

The downstream view of “autonomy”
“Let’s just push this to production”

Slide 9

Slide 9 text

The downstream view of “autonomy”

Slide 10

Slide 10 text

The downstream view of “autonomy”

Slide 11

Slide 11 text

The downstream view of “autonomy”

Slide 12

Slide 12 text

Autonomy needs accountability
How to resolve local autonomy and global consistency?
“The Spotify problem”

Slide 13

Slide 13 text

Contextual Consistency: a pattern
“Given the same context, and the same constraints, we are likely to make similar decisions”
or
“What’s the smallest amount of advice you can give me so I’m unlikely to screw this up?”

Slide 14

Slide 14 text

SUPPORT & SUPPORTABILITY

Slide 15

Slide 15 text

Meet the Ops Team
- You build it, you run it!
- They have no understanding of monitoring
- Developers should be on the support rota*
*This isn’t always possible

Slide 16

Slide 16 text

How supportable is your application?
Three magic questions for incident management:
1. What happened?
2. Who is impacted?
3. How do we fix it?
The real question:
- How could we reduce the impact of this?
MTTR trumps MTBF
Imagine being paged at 4am for your error message
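One way to read “MTTR trumps MTBF” is through the usual steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is an illustration and not part of the talk; the function name and the hour figures are made up for the example.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: a failure every 100 hours, 2 hours to recover.
baseline = availability(100, 2)

# Halving recovery time and doubling time between failures
# give the same availability under this formula.
faster_recovery = availability(100, 1)
fewer_failures = availability(200, 2)
```

Under this formula, halving recovery time buys the same availability as doubling the time between failures. Presumably the slide's point is that recovery time is the lever a team can pull more directly, for example by writing error messages that make sense at 4am.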

Slide 17

Slide 17 text

Captain’s Log: a pattern
“Don’t tell me, let me figure it out”
A log message should contain:
- a timestamp, for humans and machines
- a unique correlation ID, “edge-to-edge”
- the cause, the whole cause, and nothing but the cause
- answers to the three questions, or at least pointers
A log is an append-only, read-only, user interface!
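A minimal sketch of what such a log message might look like as one JSON line. The talk does not prescribe a format, so every field name here (`ts`, `ts_epoch_ms`, `correlation_id` and so on) and both helper functions are assumptions made for illustration.

```python
import json
import uuid
from datetime import datetime, timezone


def new_correlation_id():
    """Create an ID at the edge; every downstream component reuses it."""
    return uuid.uuid4().hex


def format_log_entry(correlation_id, cause, what_happened,
                     who_is_impacted, how_to_fix, now=None):
    """Build one append-only log line answering the three questions."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "ts": now.isoformat(),                       # timestamp for humans
        "ts_epoch_ms": int(now.timestamp() * 1000),  # timestamp for machines
        "correlation_id": correlation_id,
        "cause": cause,
        "what_happened": what_happened,
        "who_is_impacted": who_is_impacted,
        "how_to_fix": how_to_fix,                    # or a pointer to a runbook
    }
    return json.dumps(entry)
```

A caller would generate the correlation ID once per incoming request and pass it along, so that one search pulls up the whole edge-to-edge story.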

Slide 18

Slide 18 text

PACKAGING

Slide 19

Slide 19 text

It is a truth universally acknowledged, that a developer in possession of a build must be in want of a server

Slide 20

Slide 20 text

Automating deployment is one thing.
Understanding the release process is another.
Having something worth deploying is something else again!

Slide 21

Slide 21 text

Phone Home: a pattern
Every component should heartbeat
There are lots of options for this:
- Broadcasting a UDP packet
- Writing to a service registry
- Sending a message
A single Ethernet packet can carry 1500 bytes (about 1472 bytes of UDP payload over IPv4)
- That’s a lot of information!

{
  "name": "product_search",
  "app": "online_shop",
  "requires": ["other", "components"],
  "address": {
    "host": "10.0.0.135",
    "port": "1337"
  },
  "heartbeat": {
    "interval": 500,
    "mia_interval": 5000
  },
  "config": {
    "git_revision": "3ef82c",
    "deployed_from": "Dan's laptop",
    "deployed_by": "Dan North",
    "deployed_on": "2016-01-15 13:22:00"
  },
  "status": {
    "memory": 80,
    "cpu_load": [4.92, 2.94, 2.14],
    "io_load": 45,
    "disk": 72
  },
  "rel": {
    "config": "/config",
    "status": "/status"
  }
}
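The UDP option could be sketched roughly as follows. This is an illustration under stated assumptions, not the talk's implementation: the function names, the broadcast address and port 9999 are invented, and the intervals are assumed to be milliseconds because the slide gives no unit.

```python
import json
import socket

# 1500-byte Ethernet MTU minus 20 bytes (IPv4 header) and 8 bytes (UDP header).
MAX_UDP_PAYLOAD = 1472


def build_heartbeat(name, app, host, port,
                    interval_ms=500, mia_interval_ms=5000, status=None):
    """Serialise one heartbeat as compact JSON, using a subset of the slide's fields."""
    message = {
        "name": name,
        "app": app,
        "address": {"host": host, "port": port},
        "heartbeat": {"interval": interval_ms, "mia_interval": mia_interval_ms},
        "status": status or {},
    }
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_UDP_PAYLOAD:
        raise ValueError("heartbeat does not fit in one unfragmented packet")
    return payload


def send_heartbeat(payload, target=("255.255.255.255", 9999)):
    """Broadcast one datagram; the target address and port are arbitrary choices."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, target)


def is_missing(last_seen_s, now_s, mia_interval_ms):
    """Listener side: a component is missing once mia_interval has passed in silence."""
    return (now_s - last_seen_s) * 1000 > mia_interval_ms
```

A component would call `send_heartbeat(build_heartbeat(...))` on a timer every `interval` milliseconds, while a listener records the arrival time per component name and applies `is_missing` to raise the alarm.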

Slide 22

Slide 22 text

OPS & OPERABILITY

Slide 23

Slide 23 text

USE & USABILITY

Slide 24

Slide 24 text

“User experience is the experience a user has” (Bill Buxton)

Slide 25

Slide 25 text

Developers:
- What does it feel like to build your software?
- What does it feel like to deploy your software?
- What does it feel like to test your software?
- What does it feel like to release your software?
- What does it feel like to monitor your software?
- What does it feel like to support your software?

Slide 26

Slide 26 text

Ops engineers:
- How can you help the developers help you?
- How can you help them help themselves?
- How can you “get out of their way”?
- Where should you start?

Slide 27

Slide 27 text

Happy Ops!
- Developers study how we work!
- They look beyond their own apps
- They are Devs thinking like Ops
- They learn about release engineering!
- They learn about security!
There should be a word for that…

Slide 28

Slide 28 text

THE END