DevOps is People

Scott Robinson
December 01, 2012

"Development" and "operations," as independent organizations, have conflicting motivations. DevOps is the buzzword for the understanding that we've missed the forest (providing value) for the trees (features and stability).

I'll share three true stories about three real organizations.

NB: If you work in an organization with conflicts between development and operations, bring a story to share and get free advice!

Transcript

  1. Hi
    Who has deployed to production?

  2. DevOps is People
    Scott Robinson, ThoughtWorks. 4 countries, 8 cities. People everywhere are the same.

  3. Individuals and interactions over processes and tools
    Who feels like a "Dev"? Like an "Op"?

  4. Features vs. Uptime
    Dev: gets paid for features. Op: gets paid for uptime. Whose position changed?

  5. I'll achieve my KPIs by making you fail to meet yours.
    Relevant in services organizations where "Development" and "Operations" are different groups, but the lessons apply elsewhere (e.g., medical machines).

  6. Change is risk

  7. Why DevOps? Why now?

  8. The Cloud
    Amazon Web Services, Azure, VMware, Heroku, Google App Engine (GAE), iCloud, Salesforce

  9. Automation
    Chef, Puppet, Babushka, Juju, Jenkins, Travis CI, Go (ThoughtWorks' continuous delivery server)

  10. The bar has been raised
    • Getting features out faster • Faster recovery from failure

  11. It's Not a Secret

  12. Get Dev and Ops to work together.

  13. "DevOps" is not a role. At best, it's an initiative.

  14. Dramatic Structure
    Die Technik des Dramas (Technique of the Drama), Gustav Freytag (1863). Novelist and playwright.

  15. Freytag's Pyramid
    • Exposition • Rising action • Climax • Falling action • Dénouement
    Think of your own story.

  16. My first DevOps experience
    A "boutique" subsidiary of a major insurance company. Its online sales portal had gone PCI compliant using a suspicious third-party vendor ("Can you send us your API?"). I suspected the vendor was unreliable. (Test server off the network. Same network as production.) I spoke with the vendor...

  17. That's only the test service. It's a single under-provisioned box. It's not a problem.

  18. We haven't been told to monitor the external service. It's not our problem.

  19. We don't need to monitor our vendor. We have SLAs for that. It's not your problem.

  20. Beer + Credit Card + Pingdom = ?
    I have an equation...

  21. Metrics
    In the old days, this would have been the end of idle curiosity. Little did our man in operations and our program manager know that they'd just gone on pager duty. By the next Wednesday, a meeting was held. The vendor, operations, development, and the business were all in the same room because we had a common goal.

  22. How do we stop losing sales?

  23. • Monitoring • Work when services fail • New SLAs
    Internal (Nagios) and external (Pingdom) monitoring. Give Dev insight into failures. Process the sale anyway; Ops can manage a queue of e-mails saying the card didn't run. Give Dev and Ops a club to beat up our common enemy.

  24. ☺
    By the next Friday, we all had a drink together. Money was being made.

  25. Stopping Manmade Disasters
    A government regulatory agency. Not a tech shop. Outsourced IT: Ops and infrastructure run by a third party.

  26. Did you raise a ticket?
    One ops person on-site to put a human face on ...

  27. You build applications
    Working on LOB (line-of-business) applications. We had provisioned servers and control over the deployment process; things were smooth. It can be nice to work with hands-off operations guys. But changes were afoot: infrastructure was being virtualized as a cost-saving measure. File servers, databases, web servers, domain controllers, e-mail: everything!

  28. Central Point of Failure
    One morning, everyone was standing around, hands in their pockets, looking and gossiping as if an accident had happened. CAS had gone down. Every service behind single sign-on, including our applications, couldn't be accessed. When the server finally came back up, after a full day, a retrospective was held:

  29. The Retrospective
    • "What happened?" "We tried migrating the authorization server to a VM." • "What went wrong?" "We don't know." • "Will this happen again?" "We can't make any promises." • "What do we do if this happens again?"

  30. Dev helping Ops
    • Someone else's problem is your problem • Stub authentication server

  31. • Build levers and knobs for the applications • Only a button: on and off
    Built a stub server.

  32. Audience Participation
    • Exposition • Rising action

  33. MEGA COR TELE COM
    Telecoms have thousands of IT systems and established (entrenched) cultures to manage them.

  34. Change Management

  35. Release Schedule
    Lodge a change management request weeks in advance. This is because you're an agile team. Ops and stakeholders are notified. Timelines, resources and billing codes.

  36. 5:00 pm (17:00)
    Finally, the big day. Everyone says "see you later," because around 9 pm everyone is back in the office.

  37. 9:00 pm (21:00)
    Speakerphone with voices. Laptops, IM, idle Facebook. "App is down." "App is back up... seems to be stalling. We'll try restarting the web server." More time passes. It's been 30 minutes. "OK, app is up! Let's go!" Action! Windows and tabs open on their machines. Features are validated. "What's the test username and password again?"

  38. 1:00 am (01:00)
    Shaking hands, borrowing cigarettes and catching cabs. You did it. This was the culmination of three months.

  39. The Next Morning: Rollback
    You're the second one in; your business analyst is already there because he had a meeting at 10. As you walk past him, he looks up with a dark expression. "We got rolled back this morning."

  40. Operations Has Done Its Job

  41. They don’t trust us.

  42. Let's Earn Some Trust
    • Release regularly • Use feature toggles • Automated acceptance testing

  43. Release Regularly
    • Start monthly • Have a painful and honest conversation with our stakeholders • Stop Big Bang releases

  44. Use feature toggles
    • Completion of features has little relation to when they're needed • Gets code out and live • These are levers and knobs

  45. "Sorry, we're under maintenance."
    Our first toggle

  46. Automated Acceptance Testing
    • Only one person from dev needs to come in for the release • Gives confidence that things work

  47. Builds Trust
    • Save people's time • Make solid releases
    Delight Ops. Delight the business.

  48. This won't get us daily deployments. The next emergency does.
    There's always a next emergency: an out-of-schedule release. You knock it out of the park. The business thinks, "These guys are solid, and the feature I want out next week is scheduled for two weeks from now." You've given them all the ammunition they need, but there's no one to fight: Dev and Ops have been working together this whole time.

  49. DevOps is technology
    • Cloud services • Software testing • Continuous delivery

  50. DevOps is people
    • Align motivations • Build trust • Succeed together

  51. fin