Ops and Operability

Once Ada Lovelace invented programming, Jane Austen knew it wasn't going to end well unless she invented Operations. She proposed DevOps in the early 19th century in a series of coded stories with titles like "Support and Supportability".

DevOps is a synthesis of agile development practices (small releases, high automation, close collaboration) with the "keeping the lights on" rigour and discipline of the Operations Centre. For it to succeed, we need to learn to treat Ops as equals to Dev rather than as voiceless downstream consumers.

Daniel Terhorst-North

February 25, 2016

Transcript

  1. OPS AND OPERABILITY

  2. @tastapod
    Dan North

  3. Imagine a company that treated its customers like this…
    No consistency in its products…
    …or even within a product
    No clues when something breaks…
    …or even that something has broken!
    No warning that things are about to break
    Blaming the customer when things go wrong
    “Provided without warranty”

  4. “Provided without warranty”
    How can I fix it?
    What happened?
    Can I fix it?
    How long will it be broken?
    Who can I escalate this to?
    Am I just stuck with this?
    Does anyone care?

  5. This is how Dev treats Ops
    DevOps starts with Dev pushing on Ops
    Ops resists…
    …efforts to compromise their governance, assurance, audit, compliance, control processes and structures

  6. Ops is still on the hook for…
    Runtime operations
    SLAs
    Diagnosis
    Recovery
    Restoration
    Business continuity

  7. AUTOMATION & AUTONOMY

  8. The downstream view of “autonomy”
    Let’s just push this to production

  9. The downstream view of “autonomy”

  10. The downstream view of “autonomy”

  11. The downstream view of “autonomy”

  12. Autonomy needs accountability
    How to resolve local autonomy and global consistency?
    “The Spotify problem”

  13. Contextual Consistency: a pattern
    “Given the same context, and the same constraints, we are likely to make similar decisions”
    or
    “What’s the smallest amount of advice you can give me so I’m unlikely to screw this up?”

  14. SUPPORT & SUPPORTABILITY

  15. Meet the Ops Team
    You build it, you run it!
    They have no understanding of monitoring
    Developers should be on the support rota*
    *This isn’t always possible

  16. How supportable is your application?
    Three magic questions for incident management:
    1. What happened?
    2. Who is impacted?
    3. How do we fix it?
    The real question:
    - How could we reduce the impact of this?
    MTTR trumps MTBF
    Imagine being paged at 4am for your error message

  17. Captain’s Log: a pattern
    “Don’t tell me, let me figure it out”
    A log message should contain:
    - a timestamp, for humans and machines
    - a unique correlation ID, “edge-to-edge”
    - the cause, the whole cause, and nothing but the cause
    - answers to the three questions, or at least pointers
    A log is an append-only, read-only, user interface!
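
The log-message checklist above could be sketched in Python roughly as follows. This is an illustration only, not the speaker's implementation: the function names (`new_correlation_id`, `captains_log_entry`, `append_to_log`), the JSON layout, and the field names are assumptions made for the example.

```python
import json
import uuid
from datetime import datetime, timezone


def new_correlation_id():
    """Generate an ID once at the edge of the system; downstream calls reuse it."""
    return uuid.uuid4().hex


def captains_log_entry(correlation_id, cause, what_happened, who_is_impacted, how_to_fix):
    """Build one log line carrying the fields listed on the slide (names are hypothetical)."""
    now = datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),             # readable by humans
        "epoch_ms": int(now.timestamp() * 1000),  # sortable by machines
        "correlation_id": correlation_id,         # same value "edge-to-edge"
        "cause": cause,                           # the underlying cause, not a symptom
        "what_happened": what_happened,           # question 1
        "who_is_impacted": who_is_impacted,       # question 2
        "how_to_fix": how_to_fix,                 # question 3, or a pointer to a runbook
    })


def append_to_log(path, line):
    """Open in append mode so existing entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(line + "\n")
```

A caller would create the correlation ID when a request first enters the system, pass it to every component that handles the request, and write each entry with `append_to_log`, so that someone paged at 4am can follow a single ID across all the logs.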

  18. PACKAGING

  19. It is a truth universally acknowledged, that a developer in possession of a build must be in want of a server

  20. Automating deployment is one thing.
    Understanding the release process is another.
    Having something worth deploying is something else again!

  21. Phone Home: a pattern
    Every component should heartbeat
    There are lots of options for this:
    - Broadcasting a UDP packet
    - Writing to a service registry
    - Sending a message
    A single packet can carry 1500 bytes
    - That’s a lot of information!
    {
      "name": "product_search",
      "app": "online_shop",
      "requires": ["other", "components"],
      "address": {
        "host": "10.0.0.135",
        "port": "1337"
      },
      "heartbeat": {
        "interval": 500,
        "mia_interval": 5000
      },
      "config": {
        "git_revision": "3ef82c",
        "deployed_from": "Dan's laptop",
        "deployed_by": "Dan North",
        "deployed_on": "2016-01-15 13:22:00"
      },
      "status": {
        "memory": 80,
        "cpu_load": [4.92, 2.94, 2.14],
        "io_load": 45,
        "disk": 72
      },
      "rel": {
        "config": "/config",
        "status": "/status"
      }
    }
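
The UDP option from the list above could be sketched in Python roughly as follows. This is a minimal illustration, not the speaker's implementation: the function names (`build_heartbeat`, `encode_heartbeat`, `send_heartbeat`, `is_missing`) are hypothetical, only a subset of the slide's fields is included, and reading `mia_interval` as "milliseconds of silence after which a component counts as missing" is an assumption.

```python
import json
import socket

# Typical Ethernet MTU is 1500 bytes; an IPv4 header (20 bytes) and a UDP
# header (8 bytes) leave 1472 bytes of payload in one unfragmented packet.
MAX_PAYLOAD_BYTES = 1500 - 20 - 8


def build_heartbeat(name, app, host, port, interval_ms=500, mia_interval_ms=5000):
    """Assemble a heartbeat using a subset of the fields shown on the slide."""
    return {
        "name": name,
        "app": app,
        "address": {"host": host, "port": port},
        "heartbeat": {"interval": interval_ms, "mia_interval": mia_interval_ms},
    }


def encode_heartbeat(heartbeat):
    """Serialise to compact JSON and refuse anything too big for one packet."""
    payload = json.dumps(heartbeat, separators=(",", ":")).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"heartbeat is {len(payload)} bytes, limit is {MAX_PAYLOAD_BYTES}")
    return payload


def send_heartbeat(payload, target, broadcast=False):
    """Fire-and-forget one UDP datagram to target, a (host, port) tuple."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if broadcast:
            # Required by the OS before sending to a broadcast address.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, target)


def is_missing(last_seen_ms, now_ms, mia_interval_ms):
    """Assumed monitor-side rule: silent for longer than mia_interval means missing."""
    return now_ms - last_seen_ms > mia_interval_ms
```

A component would call `send_heartbeat(encode_heartbeat(...), target)` every `interval` milliseconds, while a monitor records the arrival time of each heartbeat and applies `is_missing` to decide when to raise an alert. UDP gives no delivery guarantee, which is why the monitor tolerates several missed intervals before declaring a component gone.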

  22. OPS & OPERABILITY

  23. USE & USABILITY

  24. “User experience is the experience a user has”
    Bill Buxton

  25. Developers:
    What does it feel like to build your software?
    What does it feel like to deploy your software?
    What does it feel like to test your software?
    What does it feel like to release your software?
    What does it feel like to monitor your software?
    What does it feel like to support your software?

  26. Ops engineers:
    How can you help the developers help you?
    How can you help them help themselves?
    How can you “get out of their way”?
    Where should you start?

  27. Happy Ops!
    Developers study how we work!
    They look beyond their own apps
    They are Devs thinking like Ops
    They learn about release engineering!
    They learn about security!
    There should be a word for that…

  28. THE END
