DevOps is People

Scott Robinson
December 01, 2012

"Development" and "operations," as independent organizations, have conflicting motivations. DevOps is the buzzword for the understanding that we've missed the forest (providing value) for the trees (features and stability).

I'll share three true stories about three real organizations.

N.B. If you work in an organisation with conflicts between development and operations, bring a story to share, and get free advice!

Transcript

  1. Hi
    Who has deployed to production?

  2. DevOps is People
    Scott Robinson
    ThoughtWorks
    4 countries, 8 cities. People everywhere are the same.

  3. Individuals and
    interactions over
    processes and tools
    Who feels like a "Dev?" / like an "Op?"

  4. Features vs. Uptime
    Dev: gets paid for features / Op: paid for uptime — Whose position changed?

  5. I’ll achieve my KPIs by
    making you fail to meet
    yours.
    Relevant in services organizations where "Development" and "Operations" are different
    groups— lessons apply elsewhere! (Medical machines)

  6. Change is risk

  7. Why DevOps?
    Why now?

  8. The Cloud
    Amazon Web Services, Azure, VMware, Heroku, GAE, iCloud, Salesforce

  9. Automation
    Chef, Puppet, Babushka, Juju, Jenkins, Travis, Go

  10. The bar has been raised
    •Getting features out faster
    •Faster recovery from failure

  11. It's Not a Secret

  12. Get Dev and Ops to
    work together.

  13. "DevOps" is not a role.
    At best: it's an initiative.

  14. Dramatic Structure
    Die Technik des Dramas
    Gustav Freytag (1863)
    Novelist and Playwright

  15. Freytag’s Pyramid
    • Exposition
    • Rising action
    • Climax
    • Falling action
    • Dénouement
    Think of your own story

  16. My first DevOps
    experience
    A "boutique" subsidiary of a major insurance company. The online sales portal had gone PCI
    compliant using a suspicious third-party vendor ("Can you send us your API?").
    I suspected the vendor was unreliable. (Test server off the network. Same network as production.)
    Spoke with the vendor...

  17. That's only the test
    service. It's a single
    under-provisioned box.
    It’s not a problem.

  18. We haven't been told to
    monitor the external
    service.
    It’s not our problem.

  19. We don't need to monitor
    our vendor.
    We have SLAs for that.
    It’s not your problem.

  20. Beer + Credit Card +
    Pingdom = ?
    I have an equation...

  21. Metrics
    In old times, this would have been the end of idle curiosity.
    Little did our man in operations and our program manager know, but they'd just gone on
    pager duty.
    By the next Wednesday, a meeting was held. The vendor, operations, development, and
    the business were all in the same room because we had a common goal.

  22. How do we stop losing
    sales?

  23. •Monitoring
    •Work when services fail
    •New SLAs
    Internal (Nagios) and external (Pingdom). Give Dev insight into failures.
    Process the sale. Ops can manage a queue of e-mails saying the card didn't run.
    Give Dev and Ops a club to beat up our common enemy.
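
One way to picture "work when services fail" is the rough sketch below: the sale is accepted even when the card cannot be charged, and the unpaid sale lands in a queue that Ops can work through. Everything named here (`SalesDesk`, `GatewayUnavailable`, the `charge` callable, the status strings) is hypothetical and stands in for the real portal and vendor API, which the talk does not describe.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class GatewayUnavailable(Exception):
    """Raised when the (hypothetical) payment vendor cannot be reached."""


@dataclass
class Sale:
    order_id: str
    amount_cents: int
    status: str = "new"


@dataclass
class SalesDesk:
    # Any callable that bills the card; it stands in for the vendor API.
    charge: Callable[[Sale], None]
    # Sales whose card could not be charged yet; Ops works through this queue.
    follow_up_queue: List[Sale] = field(default_factory=list)

    def process(self, sale: Sale) -> Sale:
        """Accept the sale even when the payment vendor is down."""
        try:
            self.charge(sale)
            sale.status = "paid"
        except GatewayUnavailable:
            # Keep the sale and flag it for follow-up instead of losing it.
            sale.status = "pending_payment"
            self.follow_up_queue.append(sale)
        return sale


def vendor_is_down(sale: Sale) -> None:
    """Simulated outage, used only to exercise the fallback path."""
    raise GatewayUnavailable("vendor timed out")


def vendor_is_up(sale: Sale) -> None:
    """Simulated successful charge."""
    return None
```

With `SalesDesk(charge=vendor_is_down)`, a processed sale comes back as `pending_payment` and appears in `follow_up_queue`; with `vendor_is_up` it is marked `paid` and the queue stays empty.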

  24. By the next Friday, we all had a drink together. Money was being made.

  25. Stopping Manmade
    Disasters
    A government regulatory agency. Not a tech shop. Outsourced IT. Ops and infra run by a
    third party.

  26. Did you raise a ticket?
    One ops person on-site to put a human face on ...

  27. You build applications
    Working on LOB applications. Had provisioned servers, control over the deployment process,
    things were smooth. It can be nice to work with hands-off operations guys.
    But changes were afoot. Infrastructure was being virtualized as a cost-saving measure: file
    servers, databases, web servers, domain controllers, e-mail, everything!

  28. Central Point of Failure
    One morning, everyone was standing, hands in their pockets, looking and gossiping like an
    accident had happened.
    CAS had gone down. Every service behind single sign-on, our applications included, couldn't
    be accessed.
    When the server finally did come up, after a full day, a retrospective was held:

  29. The Retrospective
    • "What happened?" "We tried migrating
    the authorization server to a VM."
    • "What went wrong?" "We don't know."
    • "Will this happen again?" "We can't
    make any promises."
    • “What do we do if this happens again?”

  30. Dev helping Ops
    Someone else’s problem is your problem
    • Stub Authentication Server

  31. •Build levers and knobs for
    the applications
    •Only a button: on and off
    Built a stub server
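
As a minimal sketch of a stub with "only a button: on and off", the wrapper below puts a single switch in front of the real sign-on check. The class name, the `real_check` callable and the fixed `stub-user` identity are all assumptions for illustration; the talk does not say what the stub let through, and that is a policy decision for each application.

```python
from typing import Callable, Optional


class SwitchableAuthenticator:
    """Wraps the real sign-on check with a single on/off switch for a stub."""

    def __init__(self, real_check: Callable[[str, str], bool],
                 stub_user: str = "stub-user"):
        self._real_check = real_check
        self._stub_user = stub_user
        # The "button": off by default, flipped on during a sign-on outage.
        self.use_stub = False

    def authenticate(self, username: str, password: str) -> Optional[str]:
        """Return the signed-in user name, or None if sign-on is refused."""
        if self.use_stub:
            # Stub mode: skip the real server and use one fixed identity
            # (an assumption of this sketch, not something the talk states).
            return self._stub_user
        return username if self._real_check(username, password) else None
```

In normal operation the wrapper defers to `real_check`; setting `use_stub = True` keeps the application reachable while the real server is being recovered.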

  32. Audience Participation
    •Exposition
    •Rising action

  33. MEGA COR
    TELE COM
    Telecoms have thousands of IT systems and established (entrenched) cultures to manage
    them.

  34. Change Management

  35. Release Schedule
    Lodge change management request weeks before. This is because you're an agile team.
    Ops and stakeholders are notified. Timelines, resources and billing codes.

  36. 5:00 pm
    17:00
    Finally, the big day. Everyone says "see you later," because around 9pm everyone is back in
    the office.

  37. 9:00 pm
    21:00
    Speaker phone with voices. Laptops, IM, idle Facebook. "App is down." "App is back up...
    seems to be stalling. We'll try restarting the web server." More time passes. It's been 30
    minutes. "OK, app is up! Let's go!"
    Action! Windows and tabs on their machines. Features are validated. “What's the test
    username and password again?"

  38. 1:00 am
    1:00
    Shaking hands, borrowing cigarettes and catching cabs. You did it. This was the culmination
    of three months.

  39. The Next Morning
    Rollback
    You're the second one in; your business analyst is already there because he had a meeting
    at 10. As you walk past him, he looks up with a dark look on his face. "We got rolled back this
    morning."

  40. Operations Has Done
    Their Job

  41. They don’t trust us.

  42. Let’s Earn Some Trust
    •Release regularly
    •Use feature toggles
    •Automated acceptance testing

  43. Release Regularly
    •Start monthly
    •Painful and honest
    conversation with our
    stakeholders
    •Stop Big Bang releases

  44. Use feature toggles
    •Completion of features has
    little relation to when
    they’re needed
    •Gets code out and live
    •These are levers and knobs

  45. “Sorry, we’re under
    maintenance.”
    Our first toggle
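
A feature toggle can be as small as a named boolean checked at request time. The sketch below is one possible shape, assuming an in-memory store and made-up toggle names (`maintenance_mode`, `new_quote_flow`) and paths; only the maintenance message comes from the slide.

```python
from typing import Dict


class FeatureToggles:
    """In-memory toggle store; a real system might read these from config."""

    def __init__(self, initial: Dict[str, bool]):
        self._flags = dict(initial)

    def is_on(self, name: str) -> bool:
        # Unknown toggles default to off, so unfinished code stays dark.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def handle_request(path: str, toggles: FeatureToggles) -> str:
    """Route a request, letting toggles decide what the user sees."""
    if toggles.is_on("maintenance_mode"):
        return "Sorry, we're under maintenance."
    if path == "/quotes/new" and toggles.is_on("new_quote_flow"):
        return "new quote flow"
    return "existing site"
```

The new quote flow can ship dark and be switched on when the business asks for it, and `maintenance_mode` gives whoever runs the release a lever that needs no redeploy.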

  46. Automated Acceptance
    Testing
    •Only one person from dev
    needs to come in for the
    release.
    •Give confidence things work
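
To make the idea concrete, here is a small sketch of an automated acceptance check written with Python's `unittest`. The `app` function, its `/health` page and the `get` helper are hypothetical; the check drives the application in-process through its WSGI interface, whereas a real acceptance suite would more likely run against the deployed system.

```python
import unittest
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical application under test: a tiny WSGI app."""
    if environ["PATH_INFO"] == "/health":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def get(path):
    """Call the app the way a web server would; return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a minimal WSGI environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


class ReleaseAcceptanceTest(unittest.TestCase):
    def test_health_page_answers(self):
        self.assertEqual(get("/health"), ("200 OK", b"ok"))

    def test_unknown_page_is_rejected(self):
        status, _ = get("/nope")
        self.assertTrue(status.startswith("404"))
```

Saved as a module, the checks can be run with `python -m unittest` after every release, so nobody has to click through the features by hand at 9 pm.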

  47. Builds Trust
    •Save people’s time
    •Make solid releases
    Delight Ops
    Delight the Business

  48. This won’t get us daily
    deployments
    The next emergency does
    There's always a next emergency. An out-of-schedule release. You knock it out of the park.
    Business thinks, "these guys are solid, and the feature I want out next week is scheduled for
    two weeks from now."
    You gave them all the ammunition they need, but no one to fight: Dev and Ops have been
    working together this whole time.

  49. DevOps is technology
    •Cloud Services
    •Software Testing
    •Continuous Delivery

  50. DevOps is people
    •Align motivations
    •Build trust
    •Succeed together
