How to Build a Culture of Operational Excellence

I gave this talk at AgileConf 2014. It focuses on steps you can take to get people to care about operating their own software.

Arup Chakrabarti

July 31, 2014

Transcript

  1. 10/23/14 About Arup

    HOW TO BUILD A CULTURE OF OPERATIONAL EXCELLENCE
    OR WHY YOU SHOULD LISTEN TO ME
  2. Another Disclaimer

    THE TITLE OF THIS TALK IS NOT CORRECT
    Operational Excellence == Engineers Caring == Working Software from Start to Finish
  3. Defining Culture

    HOW TO BUILD A CULTURE OF GIVING A $*IT
    FOR THIS PRESENTATION
    • Explicit (or Implicit) set of shared values
    • Not always what is written down
    • Alignment across individuals
    • Hard to change
    • Manifests differently for:
      • Entire Company
      • Departments
      • Teams
  4. Why do Great Cultures Matter?

    CONSISTENTLY A PREDICTOR
    • Faster decision making
    • Alignment across teams
    • Successful companies have great cultures
  5. Examples of Culture

    SOME HIGHLIGHTS
    • Beth Israel Deaconess Medical Center, Boston MA
      • One Core Value: Community
  6. Examples of Culture

    SOME HIGHLIGHTS
    • Amazon
      • One Core Value: Customer Obsession
  7. Examples of Culture

    SOME HIGHLIGHTS
    • Netflix
      • Freedom and Responsibility
  8. The Opposite Happens

    NOT ALL ROSES, HARD TO CHANGE
  9. Developer vs. Operator Cultures

    HOW DID THIS BECOME A PROBLEM?
    • Different Values
    • How did this happen?
  10. Mainframe Operator

    FIRST DESIGNATION BETWEEN OPERATOR AND DEVELOPER
  11. Distributed Systems Theory Happens

    CAN NO LONGER MANAGE INDIVIDUAL MACHINES
  12. Developers can Operate, Operators can Develop

    • Problem
      • Developers care about Change
      • Operators care about Stability
  13. The Problem

    ANOTHER WAY TO THINK ABOUT THIS
    • The Magic Formula
      • Change ~ Downtime
      • More Money Change = More Problems
  14. The Problem Revised

    ANOTHER WAY TO THINK ABOUT THIS
    • The Magic Formula Revised
      • Change ~ Innovation ~ Downtime
  15. Why is this scary?

    • The Magic Formula Revised
      • Change ~ Innovation ~ Downtime
    • To maintain Stability is to not Innovate
    • To maintain Innovation is to reduce Stability
  16. How this manifests culturally

    WHY IT BECOMES OPS VS. DEV
    • Misalignment
    • People blame each other
    • Despite better understanding
  17. Scrappy Startup vs. BigCo

    ANOTHER PIECE OF THE CULTURE PUZZLE

                        Startup   BigCo
    Customers           Low       High
    Risk with Change    Low       High
  18. Telecoms

    REALLY HARD TO INNOVATE
    • Critical Service
    • Lots of Customers
    • Regulations
  19. Convince your Leadership

    YOU NEED TO GET BUY IN
    • Show impact on what they care about
      • Customer Trust
      • Actual financial loss
  20. Get the Metrics

    NUMBERS NUMBERS NUMBERS
    • Start with something
    • Easily measurable
    • Ideally implicit
    • Manually is ok
  21. Get the Metrics

    SOME EXAMPLES
    • Lagging Indicators
      • Outages
      • Support Tickets
      • Rollbacks
  22. Get the Metrics

    SOME EXAMPLES
    • Leading Indicators
      • Code Coverage
      • Code Churn
      • Performance Tests
      • Deployment Times
    • This is advanced
  23. Present the Metrics

    PRESENTATION IS IMPORTANT
    • Easily understood metrics
    • Order them by importance
    • No fancy stats
  24. Quick Stats Primer

    AVERAGES ARE OK FOR NORMALLY DISTRIBUTED DATASETS
    [Chart: # requests vs. latency (ms), with the average marked]
  25. Quick Stats Primer

    AVERAGES DO NOT WORK ON NON-NORMAL DISTRIBUTIONS
    [Chart: # requests vs. latency (ms), with the average marked]
  26. Quick Stats Primer

    AVERAGES DO NOT WORK ON NON-NORMAL DISTRIBUTIONS
    [Chart: # requests vs. latency (ms), with the 95% mark shown]
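The point of the three stats slides can be sketched in a few lines of Python. The latency numbers below are made up for illustration (they are not the data behind the slides), and the nearest-rank percentile helper is one common definition among several.

```python
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling of n * pct / 100
    return ordered[rank - 1]


# Hypothetical latencies (ms): 90 fast requests plus a slow tail of 10.
latencies_ms = [100] * 90 + [900] * 10

mean_ms = statistics.mean(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
print(f"mean={mean_ms} ms, p95={p95_ms} ms")
```

With this skewed sample the mean lands at 180 ms, a latency that no request actually experienced, while the 95th percentile (900 ms) exposes the slow tail.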
  27. Present the Metrics

    SAMPLE REPORT

    METRIC/TIME            JAN   FEB   MAR   APR   MAY   JUN
    #OUTAGES                23    67    89    10    13    12
    #BUGS                  121   172   188    30    24    29
    #TICKETS               543   768   929   183   123   371
    %ERROR RATE              1     3     8     1     2     1
    99th % LATENCY (ms)    200   230   300   350   300   250
  28. Review the Metrics

    MAKE SURE EVERYONE SEES THE METRICS
    • Regular weekly reviews
    • “Why did that number go up?”
    • Daily progress emails
    • Eventually, set goals
    • Rally everyone around the metrics
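A review like this can start from a very small script. The sketch below uses the numbers from the sample report on slide 27 and lists the metrics that rose since the previous month, which is the "Why did that number go up?" question in code form. The function name `went_up` and the dictionary layout are assumptions made for this example.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN"]

# Values copied from the sample report on slide 27 (lower is better for all of them).
REPORT = {
    "#OUTAGES": [23, 67, 89, 10, 13, 12],
    "#BUGS": [121, 172, 188, 30, 24, 29],
    "#TICKETS": [543, 768, 929, 183, 123, 371],
    "%ERROR RATE": [1, 3, 8, 1, 2, 1],
    "99th % LATENCY (ms)": [200, 230, 300, 350, 300, 250],
}


def went_up(report, months, month):
    """Return {metric: (previous, current)} for metrics that rose since the prior month."""
    i = months.index(month)
    if i == 0:
        return {}  # no earlier month to compare against
    return {
        name: (values[i - 1], values[i])
        for name, values in report.items()
        if values[i] > values[i - 1]
    }


for name, (before, after) in went_up(REPORT, MONTHS, "JUN").items():
    print(f"{name}: {before} -> {after}")
```

For June this flags #BUGS and #TICKETS, which gives the weekly review two concrete questions to ask.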
  29. Back to the Magic Formula

    • The Magic Formula Revised Again
      • Change ~ (k) Innovation ~ (h) Downtime
    • (k) - Increase k to amplify innovation per change
      • Test environments, A/B testing, orchestration, CI Servers, splitting up codebases
    • (h) - Decrease h to improve stability per change
      • Fast deploys, better alerting, splitting up codebases
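One way to read the revised formula, assuming the "~" means simple proportionality (the slide does not say so explicitly), is the following. The symbols C (change), I (innovation) and D (downtime) are introduced here for illustration only.

```latex
\begin{align*}
I &\approx k \cdot C, & D &\approx h \cdot C \\
\frac{I}{D} &\approx \frac{k}{h}
\end{align*}
```

Under that reading, the amount of innovation gained per unit of downtime depends only on the ratio k/h, so raising k or lowering h improves the trade-off without having to slow the rate of change.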
  30. Monitoring and Alerting

    PRODUCTION MONITORING AND ALERTING
    • Never let a customer tell you about the problem
    • Reinforce ownership via alerts
    • Everyone can see all metrics
    • Tools
      • StatsD, DataDog, SumoLogic, New Relic, etc
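As a rough illustration of how little code it takes to emit a metric, here is a minimal sketch that formats and sends values in the StatsD plain-text protocol using only the standard library. The metric names, host, and helper function names are hypothetical; a real setup would more likely use an existing client library.

```python
import socket


def statsd_line(name, value, metric_type):
    """Format one metric in the StatsD text protocol: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; 8125 is the conventional StatsD port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Calling `send_metric(statsd_line("checkout.errors", 1, "c"))` would increment a counter, and `statsd_line("checkout.latency", 320, "ms")` would produce a timer sample. Because the send is UDP, instrumented code does not block or fail if the metrics server is down.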
  31. Incident Management

    MANAGE OUTAGES MORE EFFECTIVELY
    • Have a plan to get everyone together
    • Everyone needs to know what they own
    • Tools
      • PagerDuty, HipChat, Slack, IRC, Hangouts, Conference Bridges
  32. Incident Management

    MANAGE OUTAGES MORE EFFECTIVELY
  33. Incident Management

    MANAGE OUTAGES MORE EFFECTIVELY
    • Post Mortem
      • Timeline
      • Impact
      • Root Cause
    • Tools
      • PagerDuty, HipChat, Slack, IRC, Monitoring tools
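The three post-mortem ingredients on this slide map naturally onto a small record type. The sketch below is one possible shape, not a format the talk prescribes; the class name, fields, and the example incident are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PostMortem:
    impact: str
    root_cause: str
    timeline: list = field(default_factory=list)  # (timestamp, description) pairs

    def add_event(self, when, what):
        """Record an event and keep the timeline in chronological order."""
        self.timeline.append((when, what))
        self.timeline.sort(key=lambda event: event[0])

    def duration(self):
        """Time between the first and last recorded events."""
        if len(self.timeline) < 2:
            return timedelta(0)
        return self.timeline[-1][0] - self.timeline[0][0]


# Hypothetical incident, entered out of order on purpose.
pm = PostMortem(
    impact="Hypothetical: checkout errors for some customers",
    root_cause="Hypothetical: bad configuration push",
)
pm.add_event(datetime(2014, 7, 31, 10, 45), "Rollback complete, errors stop")
pm.add_event(datetime(2014, 7, 31, 10, 0), "Alert fires")
print(pm.duration())
```

Keeping the timeline as structured data makes it easy to derive numbers such as incident duration, which can feed the lagging indicators discussed earlier.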
  34. Open Ticket/Bug Trackers

    BUG TRACKERS
    • Make it easy to file tickets/bugs
    • Make it easy to see tickets/bugs
    • Tools
      • Really? This is AgileConf! Go to the showroom!
  35. Distributed Version Control

    CODE VISIBILITY
    • Let all engineers see all code
    • Security is the exception
    • Git Pull Request model
    • Tools
      • GitHub, BitBucket
  36. Distributed Version Control

    CODE VISIBILITY
    [Figure labels: JIRA issue #, teammate, code snippet, build status]
  37. Continuous Integration

    AUTOMATE ALL THE THINGS
    • One Click Deploys
    • Zero Click Tests
    • Tools
      • Jenkins, Travis CI, ChatOps
  38. ChatOps Example

    AUTOMATE ALL THE THINGS
    [Figure labels: build passes; engineer deploys; bot starts deploy; engineer claims love for bot]
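The flow on this slide (build passes, engineer asks to deploy, bot starts the deploy) can be sketched as a single message handler. The `!deploy` command syntax, the function name, and the two callbacks are assumptions for illustration; they do not correspond to any particular chat bot framework.

```python
def handle_message(user, text, build_passed, deploy):
    """Respond to '!deploy <service>', deploying only when the build is green."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "!deploy":
        return None  # not a command for this bot
    service = parts[1]
    if not build_passed(service):
        return f"@{user}: build for {service} is failing, not deploying"
    deploy(service)
    return f"@{user}: deploying {service}"


# Stand-ins for a CI status check and a deploy script.
deployed = []
reply = handle_message("arup", "!deploy web", lambda s: True, deployed.append)
print(reply)
```

Because the command and the bot's reply both happen in the shared chat room, everyone sees who deployed what and when, which supports the visibility theme of the surrounding slides.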
  39. Failure Injection

    BREAK THINGS ON PURPOSE
    • Failure Friday
      • Weekly Failure Testing
      • Get Ops and Dev into a room
      • Break Stuff
      • http://blog.pagerduty.com/2013/11/failure-friday-at-pagerduty/
  40. Hiring is Hard

    VERY VERY VERY HARD RIGHT NOW
    • Be genuine
    • Past collaboration efforts
    • Customer focus
    • Want the same culture
  41. Summary

    • Building Culture is hard
    • Problem is not new
    • Dev and Ops Worlds are colliding
    • People have different priorities
    • The Magic Formula
    • Building Culture takes time and effort
    • Hire the Right People