How to Build a Culture of Operational Excellence

I gave this talk at AgileConf 2014. It covers steps you can take to get people to care about operating their own software.

Arup Chakrabarti

July 31, 2014

Transcript

  1. How to Build a Culture of Operational Excellence

    10/23/14, @arupchak, Arup Chakrabarti, Operations Engineering, PagerDuty
  2. About Arup

    OR WHY YOU SHOULD LISTEN TO ME
  3. Disclaimer

    I DID NOT COME UP WITH EVERYTHING
  4. Another Disclaimer

    THE TITLE OF THIS TALK IS NOT CORRECT
    Operational Excellence == Engineers Caring == Working Software from Start to Finish
  5. How to Build a Culture of Giving a $*IT

    @arupchak, Arup Chakrabarti, Operations Engineering, PagerDuty
  6. Terminology: Culture

    WHAT IS IT GOOD FOR?
  7. Defining Culture

    FOR THIS PRESENTATION
    • Explicit (or Implicit) set of shared values
    • Not always what is written down
    • Alignment across individuals
    • Hard to change
    • Manifests differently for:
      • Entire Company
      • Departments
      • Teams
  8. Why do Great Cultures Matter?

    CONSISTENTLY A PREDICTOR
    • Faster decision making
    • Alignment across teams
    • Successful companies have great cultures
  9. Examples of Culture

    SOME HIGHLIGHTS
    • Beth Israel Deaconess Medical Center, Boston MA
    • One Core Value: Community
  10. Examples of Culture

    SOME HIGHLIGHTS
    • Amazon
    • One Core Value: Customer Obsession
  11. Examples of Culture

    SOME HIGHLIGHTS
    • Netflix
    • Freedom and Responsibility
  12. The Opposite Happens

    NOT ALL ROSES, HARD TO CHANGE
  13. Developer vs. Operator Cultures

    HOW DID THIS BECOME A PROBLEM?
    • Different Values
    • How did this happen?
  14. Mainframe Operator

    FIRST DESIGNATION BETWEEN OPERATOR AND DEVELOPER
  15. Desktop Operator

    STILL NEED THE OPERATOR
  16. Systems Operator

    STILL NEED THE OPERATOR
  17. Infrastructure Automation Happens

    OPERATOR JOB CHANGES
  18. Distributed Systems Theory Happens

    CAN NO LONGER MANAGE INDIVIDUAL MACHINES
  19. Developers can Operate, Operators can Develop

    • Problem
      • Developers care about Change
      • Operators care about Stability
  20. The Problem

    ANOTHER WAY TO THINK ABOUT THIS
    • The Magic Formula
      • Change ~ Downtime
      • More Money Change = More Problems
  21. The Problem Revised

    ANOTHER WAY TO THINK ABOUT THIS
    • The Magic Formula Revised
      • Change ~ Innovation ~ Downtime
  22. Why is this scary?

    • The Magic Formula Revised
      • Change ~ Innovation ~ Downtime
    • To maintain Stability is to not Innovate
    • To maintain Innovation is to reduce Stability
  23. How this manifests culturally

    WHY IT BECOMES OPS VS. DEV
    • Misalignment
    • People blame each other
    • Despite better understanding
  24. Scrappy Startup vs. BigCo

    ANOTHER PIECE OF THE CULTURE PUZZLE

    |                  | Startup | BigCo |
    |------------------|---------|-------|
    | Customers        | Low     | High  |
    | Risk with Change | Low     | High  |
  25. Facebook 2012

    ONCE UPON A TIME
  26. Facebook 2014

    EVEN THEY HAD TO CHANGE
  27. Telecoms

    REALLY HARD TO INNOVATE
    • Critical Service
    • Lots of Customers
    • Regulations
  28. Are we screwed?
  29. Are we screwed?

    No!
  30. Building the Culture
  31. Convince your Leadership

    YOU NEED TO GET BUY IN
    • Show impact on what they care about
    • Customer Trust
    • Actual financial loss
  32. Get the Metrics

    NUMBERS NUMBERS NUMBERS
    • Start with something
    • Easily measurable
    • Ideally implicit
    • Manually is ok
  33. Get the Metrics

    SOME EXAMPLES
    • Lagging Indicators
      • Outages
      • Support Tickets
      • Rollbacks
  34. Get the Metrics

    SOME EXAMPLES
    • Leading Indicators
      • Code Coverage
      • Code Churn
      • Performance Tests
      • Deployment Times
    • This is advanced
  35. Present the Metrics

    PRESENTATION IS IMPORTANT
    • Easily understood metrics
    • Order them by importance
    • No fancy stats
  36. Quick Stats Primer

    AVERAGES ARE OK FOR NORMALLY DISTRIBUTED DATASETS
    [Chart: # REQUESTS by LATENCY (MS), with the AVERAGE marked]
  37. Quick Stats Primer

    AVERAGES DO NOT WORK ON NON-NORMAL DISTRIBUTIONS
    [Chart: # REQUESTS by LATENCY (MS), with the AVERAGE marked]
  38. Quick Stats Primer

    AVERAGES DO NOT WORK ON NON-NORMAL DISTRIBUTIONS
    [Chart: # REQUESTS by LATENCY (MS), with the 95% point marked]
  39. Present the Metrics

    SAMPLE REPORT

    | METRIC/TIME         | JAN | FEB | MAR | APR | MAY | JUN |
    |---------------------|-----|-----|-----|-----|-----|-----|
    | #OUTAGES            | 23  | 67  | 89  | 10  | 13  | 12  |
    | #BUGS               | 121 | 172 | 188 | 30  | 24  | 29  |
    | #TICKETS            | 543 | 768 | 929 | 183 | 123 | 371 |
    | %ERROR RATE         | 1   | 3   | 8   | 1   | 2   | 1   |
    | 99th % LATENCY (ms) | 200 | 230 | 300 | 350 | 300 | 250 |
  40. Review the Metrics

    MAKE SURE EVERYONE SEES THE METRICS
    • Regular weekly reviews
    • “Why did that number go up?”
    • Daily progress emails
    • Eventually, set goals
    • Rally everyone around the metrics
  41. Back to the Magic Formula

    • The Magic Formula Revised Again
      • Change ~ (k) Innovation ~ (h) Downtime
    • (k) - Increase k to amplify innovation per change
      • Test environments, A/B testing, orchestration, CI Servers, splitting up codebases
    • (h) - Decrease h to improve stability per change
      • Fast deploys, better alerting, splitting up codebases
  42. Influencing The Metrics
  43. Monitoring and Alerting

    PRODUCTION MONITORING AND ALERTING
    • Never let a customer tell you about the problem
    • Reinforce ownership via alerts
    • Everyone can see all metrics
    • Tools
      • StatsD, DataDog, SumoLogic, New Relic, etc.
  44. Incident Management

    MANAGE OUTAGES MORE EFFECTIVELY
    • Have a plan to get everyone together
    • Everyone needs to know what they own
    • Tools
      • PagerDuty, HipChat, Slack, IRC, Hangouts, Conference Bridges
  45. Incident Management

    MANAGE OUTAGES MORE EFFECTIVELY
  46. Incident Management

    MANAGE OUTAGES MORE EFFECTIVELY
    • Post Mortem
      • Timeline
      • Impact
      • Root Cause
    • Tools
      • PagerDuty, HipChat, Slack, IRC, Monitoring tools
  47. Open Ticket/Bug Trackers

    BUG TRACKERS
    • Make it easy to file tickets/bugs
    • Make it easy to see tickets/bugs
    • Tools
      • Really? This is AgileConf! Go to the showroom!
  48. Distributed Version Control

    CODE VISIBILITY
    • Let all engineers see all code
    • Security is the exception
    • Git Pull Request model
    • Tools
      • GitHub, BitBucket
  49. Distributed Version Control

    CODE VISIBILITY
    JIRA ISSUE # TEAMMATE CODE SNIPPET BUILD STATUS
  50. Continuous Integration

    AUTOMATE ALL THE THINGS
    • One Click Deploys
    • Zero Click Tests
    • Tools
      • Jenkins, Travis CI, ChatOps
  51. ChatOps Example

    AUTOMATE ALL THE THINGS
    • BUILD PASSES
    • ENGINEER DEPLOYS
    • BOT STARTS DEPLOY
    • ENGINEER CLAIMS LOVE FOR BOT
  52. Failure Injection

    BREAK THINGS ON PURPOSE
    • Failure Friday
    • Weekly Failure Testing
    • Get Ops and Dev into a room
    • Break Stuff
    • http://blog.pagerduty.com/2013/11/failure-friday-at-pagerduty/
  53. Failure Injection

    BREAK THINGS ON PURPOSE
  54. Get The Right People
  55. Hiring is Hard

    VERY VERY VERY HARD RIGHT NOW
    • Be genuine
    • Past collaboration efforts
    • Customer focus
    • Want the same culture
  56. Get Rid of The Wrong People
  57. Summary

    • Building Culture is hard
    • Problem is not new
    • Dev and Ops Worlds are colliding
    • People have different priorities
    • The Magic Formula
    • Building Culture takes time and effort
    • Hire the Right People
  58. Thank you. We are Hiring!

    pagerduty.com/jobs
    @arupchak
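
The Quick Stats Primer slides (36 to 38) say that averages work for normally distributed data but not for skewed data. A minimal Python sketch of that point follows. The latencies are made up for illustration (90 fast requests and 10 slow ones, not data from the talk), and `percentile` is a hypothetical nearest-rank helper rather than anything shown on the slides.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct% of all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


# Made-up, skewed latencies in ms: most requests are fast, a few are very slow.
latencies_ms = [100] * 90 + [2000] * 10

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p95_ms = percentile(latencies_ms, 95)

print(f"mean={mean_ms} ms, median={median_ms} ms, p95={p95_ms} ms")
```

With this sample the mean is 290 ms, a latency that no request actually had. The median and the 95th percentile describe the fast majority and the slow tail separately, which is probably why the sample report on slide 39 tracks a percentile instead of an average.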
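
Slide 40 suggests a weekly review that asks "Why did that number go up?". One way to prepare for that question is sketched below in Python, using the numbers from the sample report on slide 39. The `increases` function and the dictionary layout are assumptions made for this example. The talk does not prescribe a tool, and it notes that collecting metrics manually is fine.

```python
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN"]

# Values copied from the sample report on slide 39.
REPORT = {
    "#OUTAGES": [23, 67, 89, 10, 13, 12],
    "#BUGS": [121, 172, 188, 30, 24, 29],
    "#TICKETS": [543, 768, 929, 183, 123, 371],
    "%ERROR RATE": [1, 3, 8, 1, 2, 1],
    "99th % LATENCY (ms)": [200, 230, 300, 350, 300, 250],
}


def increases(report, months, month):
    """Return {metric: (previous, current)} for every metric that rose
    in `month` compared with the month before it."""
    i = months.index(month)
    if i == 0:
        return {}  # nothing to compare the first month against
    return {
        name: (values[i - 1], values[i])
        for name, values in report.items()
        if values[i] > values[i - 1]
    }


for name, (before, after) in increases(REPORT, MONTHS, "JUN").items():
    print(f"{name}: {before} -> {after}")
```

For JUN this flags #BUGS (24 to 29) and #TICKETS (123 to 371), the two rows a review would ask about. All metrics in this report are "lower is better", so a rise is always the interesting direction.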
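
The Magic Formula on slide 41 can be written out as follows. This is one possible reading, not the speaker's own notation: it assumes "~" means "roughly proportional to", and the symbols C (amount of change), I (innovation), and D (downtime) are introduced here for illustration.

```latex
I \approx k \, C, \qquad D \approx h \, C
\quad\Longrightarrow\quad
\frac{I}{D} \approx \frac{k}{h}
```

Under that reading, increasing k or decreasing h raises the innovation gained per unit of downtime without reducing the amount of change, which matches the slide's advice to increase k and decrease h.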