The Unrealized Role of Monitoring & Alerting (All Day DevOps edition)

j.hand
November 15, 2016

When prediction and prevention are focused on more than learning and innovation, organizations are not realizing the full value of monitoring and alerting.

Transcript

  1. The Unrealized Role of: Monitoring & Alerting @jasonhand | VictorOps | #AllDayDevOps

  2. @jasonhand | VictorOps | #AllDayDevOps

  3. @jasonhand | VictorOps | #AllDayDevOps

  4. THE UNREALIZED ROLE OF: Monitoring & Alerting @jasonhand | VictorOps | #AllDayDevOps

  5. JASON HAND DevOps Evangelist VictorOps @jasonhand | VictorOps | #AllDayDevOps

  6. @jasonhand | VictorOps | #AllDayDevOps

  7. 2015 MONITORING SURVEY @jasonhand | VictorOps | #AllDayDevOps

  8. @jasonhand | VictorOps | #AllDayDevOps

  9. WHY ARE YOU COLLECTING THIS DATA? NOTE: You may choose more than one ▸ Performance analysis and trending ▸ Fault and Anomaly detection ▸ Capacity Planning ▸ A/B Testing @jasonhand | VictorOps | #AllDayDevOps

  10. THE RESULTS NOTE: Respondents may have chosen more than one ▸ Performance analysis and trending - 63% ▸ Fault and Anomaly detection - 53% ▸ Capacity Planning - 45% ▸ A/B Testing - 11% @jasonhand | VictorOps | #AllDayDevOps

  11. Tyranny of the S.L.A. (Service Level Agreement) @jasonhand | VictorOps | #AllDayDevOps

  12. HIGH AVAILABILITY Prediction & Prevention @jasonhand | VictorOps | #AllDayDevOps

  13. @jasonhand | VictorOps | #AllDayDevOps

  14. @jasonhand | VictorOps | #AllDayDevOps

  15. THAT'S IMPORTANT ... BUT ... @jasonhand | VictorOps | #AllDayDevOps

  16. @jasonhand | VictorOps | #AllDayDevOps

  17. BUSINESS OBJECTIVES? @jasonhand | VictorOps | #AllDayDevOps

  18. HAPPY CAMPER @jasonhand | VictorOps | #AllDayDevOps

  19. CUSTOMERS want more than just 99.999% UPTIME @jasonhand | VictorOps | #AllDayDevOps

  20. @jasonhand | VictorOps | #AllDayDevOps

  21. WHERE'S THE INNOVATION? @jasonhand | VictorOps | #AllDayDevOps

  22. HOW IMPORTANT IS Learning & Innovation? @jasonhand | VictorOps | #AllDayDevOps

  23. @jasonhand | VictorOps | #AllDayDevOps

  24. @jasonhand | VictorOps | #AllDayDevOps

  25. @jasonhand | VictorOps | #AllDayDevOps

  26. @jasonhand | VictorOps | #AllDayDevOps

  27. The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to... LEARN, IMPROVE, OR INNOVATE. @jasonhand | VictorOps | #AllDayDevOps

  28. CONTINUALLY UNDERSTANDING & RESPONDING TO THE FEEDBACK from monitoring, logging, & alerting allows you to use information about events in the past to drive future actions. @jasonhand | VictorOps | #AllDayDevOps

  29. @jasonhand | VictorOps | #AllDayDevOps

  30. @jasonhand | VictorOps | #AllDayDevOps

  31. It's not just about PREDICTION & PREVENTION @jasonhand | VictorOps | #AllDayDevOps

  32. RESPOND & REPAIR ...QUICKLY @jasonhand | VictorOps | #AllDayDevOps

  33. NOPE @jasonhand | VictorOps | #AllDayDevOps

  34. MTTR Rather Than MTBF @jasonhand | VictorOps | #AllDayDevOps

  35. FAILURE IS INEVITABLE @jasonhand | VictorOps | #AllDayDevOps

  36. US·ER /ˈYOOZƏR/ DISTRIBUTED FAULT INJECTION TEST SUITE FOR PRODUCTION. credit: Leon Fayer (@papa_fire) @jasonhand | VictorOps | #AllDayDevOps

  37. SUCCESS is a result of FAILURE @jasonhand | VictorOps | #AllDayDevOps

  38. UNDERSTAND LEARN INNOVATE @jasonhand | VictorOps | #AllDayDevOps

  39. RE·SIL·IENT /RƏˈZILYƏNT/ The ability to resist, absorb, recover from or successfully adapt to adversity or a change in conditions @jasonhand | VictorOps | #AllDayDevOps

  40. CHANGE can cause failure but innovation requires CHANGE @jasonhand | VictorOps | #AllDayDevOps

  41. CONFLICT @jasonhand | VictorOps | #AllDayDevOps

  42. CHANGE REQUIRED @jasonhand | VictorOps | #AllDayDevOps

  43. Without deviation from the norm, progress is not possible - Frank Zappa @jasonhand | VictorOps | #AllDayDevOps

  44. What Did You LEARN From the Recovery Efforts? (including monitoring & alerting) @jasonhand | VictorOps | #AllDayDevOps

  45. POSTMORTEMS / LEARNING REVIEWS: Stories of: WHAT TOOK PLACE leading up to & during the disruption & recovery efforts @jasonhand | VictorOps | #AllDayDevOps

  46. WHO WAS INVOLVED? @jasonhand | VictorOps | #AllDayDevOps

  47. WHAT DID THEY SEE? @jasonhand | VictorOps | #AllDayDevOps

  48. WHAT WAS SAID? @jasonhand | VictorOps | #AllDayDevOps

  49. WHAT ACTIONS WERE TAKEN? jhand.co/chatopsbook @jasonhand | VictorOps | #AllDayDevOps

  50. HOW DO events & actions CORRELATE OVER TIME? @jasonhand | VictorOps | #AllDayDevOps

  51. 5 Whys @jasonhand | VictorOps | #AllDayDevOps

  52. 5 Whys @jasonhand | VictorOps | #AllDayDevOps

  53. WHAT IS THE "cause" OF THE PROBLEM? Root Cause is ... @jasonhand | VictorOps | #AllDayDevOps

  54. OUR ... obsession with "Root Cause" @jasonhand | VictorOps | #AllDayDevOps

  55. ASKING "WHY" .. leads to .. BLAME @jasonhand | VictorOps | #AllDayDevOps

  56. BLAMING LEADS TO.. operators hiding relevant & important information @jasonhand | VictorOps | #AllDayDevOps

  57. We must BELIEVE that our operators are doing their best given the constraints of the "system" @jasonhand | VictorOps | #AllDayDevOps

  58. "We are here to" LEARN From Failure (and success) @jasonhand | VictorOps | #AllDayDevOps

  59. RATHER THAN .. @jasonhand | VictorOps | #AllDayDevOps

  60. AVOID FAILURE @jasonhand | VictorOps | #AllDayDevOps

  61. WHAT'S THE STORY? @jasonhand | VictorOps | #AllDayDevOps

  62. INNOVATE Learning from both success & failure to develop & implement small incremental improvements is critical. @jasonhand | VictorOps | #AllDayDevOps

  63. MONITORING & ALERTING Helps us understand the story in greater detail @jasonhand | VictorOps | #AllDayDevOps

  64. LEARNING ORGANIZATION @jasonhand | VictorOps | #AllDayDevOps

  65. Learning does NOT come from READING & LISTENING @jasonhand | VictorOps | #AllDayDevOps

  66. Learning comes from DOING @jasonhand | VictorOps | #AllDayDevOps

  67. Real Learning comes from: OBSERVING ORIENTING DECIDING ACTING John Boyd's OODA Loop @jasonhand | VictorOps | #AllDayDevOps

  68. Example: LEARNING TO PLAY THE DOBRO GUITAR @jasonhand | VictorOps | #AllDayDevOps

  69. @jasonhand | VictorOps | #AllDayDevOps

  70. LEARNING @jasonhand | VictorOps | #AllDayDevOps

  71. WHY? Go from knowing... to understanding... to learning NOTE: (Requires making mistakes) @jasonhand | VictorOps | #AllDayDevOps

  72. We will trade some uptime in exchange for innovation -Dave Hahn (Netflix) DevOpsDays Boise 2016 @jasonhand | VictorOps | #AllDayDevOps

  73. SHIFT OUR GAZE from: MAINTAINING & PROTECTING @jasonhand | VictorOps | #AllDayDevOps

  74. LEARNING Which leads to... IMPROVING & INNOVATING @jasonhand | VictorOps | #AllDayDevOps

  75. WE INCREASE VALUE OF: - Monitoring & Alerting - IT teams - Products & Services - Organization @jasonhand | VictorOps | #AllDayDevOps

  76. HYPOTHESIZE EXPLORE STRETCH EXPERIMENT FAIL LEARN Try Again @jasonhand | VictorOps | #AllDayDevOps

  77. @jasonhand | VictorOps | #AllDayDevOps

  78. LEARNING & INNOVATING leads to uncovering new ways of BUILDING, DEPLOYING, AND MAINTAINING SOFTWARE & INFRASTRUCTURE Which leads to... @jasonhand | VictorOps | #AllDayDevOps

  79. RESILIENT SYSTEMS @jasonhand | VictorOps | #AllDayDevOps

  80. The By-product of a highly RESILIENT system is ... @jasonhand | VictorOps | #AllDayDevOps

  81. @jasonhand | VictorOps | #AllDayDevOps

  82. HIGHLY AVAILABLE SYSTEM @jasonhand | VictorOps | #AllDayDevOps

  83. THE UNREALIZED ROLE OF: Monitoring & Alerting is .... @jasonhand | VictorOps | #AllDayDevOps

  84. LEARNING & INNOVATION @jasonhand | VictorOps | #AllDayDevOps

  85. THANK YOU Be Victorious! @jasonhand | VictorOps | #AllDayDevOps

  86. @jasonhand | VictorOps | #AllDayDevOps

  87. Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/ Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/ @jasonhand | VictorOps | #AllDayDevOps

  88. References: Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg @jasonhand | VictorOps | #AllDayDevOps

  89. Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203 Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002 Bulb: https://smhttp-ssl-37293.nexcesscdn.net/media/catalog/ @jasonhand | VictorOps | #AllDayDevOps

  90. scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif change: http://i.imgur.com/EQyC6N3.gif Hard drive: https://i.imgur.com/pWsKSEf.gif Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif @jasonhand | VictorOps | #AllDayDevOps
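
Slides 19 and 34 mention 99.999% uptime and "MTTR Rather Than MTBF" without showing the arithmetic behind those terms. The sketch below is one way to work it out, not something taken from the talk: the incident log, the function names `summarize` and `allowed_downtime`, and the one-year reporting period are all made up for illustration. It assumes MTTR is total downtime divided by the number of incidents, MTBF is total uptime divided by the number of incidents, and availability is MTBF / (MTBF + MTTR).

```python
from datetime import datetime, timedelta

# Hypothetical incident log (outage start, outage end). Not data from the talk.
INCIDENTS = [
    (datetime(2016, 1, 10, 3, 0), datetime(2016, 1, 10, 3, 45)),
    (datetime(2016, 4, 2, 14, 0), datetime(2016, 4, 2, 14, 15)),
    (datetime(2016, 9, 20, 22, 0), datetime(2016, 9, 20, 23, 0)),
]
# Assumed reporting period: calendar year 2016.
PERIOD_START = datetime(2016, 1, 1)
PERIOD_END = datetime(2017, 1, 1)


def summarize(incidents, period_start, period_end):
    """Return (MTTR, MTBF, availability) for a list of outages."""
    if not incidents:
        raise ValueError("need at least one incident to compute MTTR/MTBF")
    total = period_end - period_start
    # Sum the length of every outage, starting from a zero timedelta.
    downtime = sum((end - start for start, end in incidents), timedelta())
    uptime = total - downtime
    count = len(incidents)
    mttr = downtime / count   # mean time to repair
    mtbf = uptime / count     # mean time between failures
    availability = mtbf / (mtbf + mttr)  # equals uptime / total
    return mttr, mtbf, availability


def allowed_downtime(availability_target, period):
    """Downtime budget implied by an availability target over a period."""
    return period * (1 - availability_target)


if __name__ == "__main__":
    mttr, mtbf, availability = summarize(INCIDENTS, PERIOD_START, PERIOD_END)
    print("MTTR:", mttr)
    print("MTBF:", mtbf)
    print("Availability: {:.5%}".format(availability))
    print("Five-nines budget per 365 days:",
          allowed_downtime(0.99999, timedelta(days=365)))
```

Under these definitions, a 99.999% target leaves roughly five minutes of downtime per year. Availability can be raised either by making failures rarer (a larger MTBF) or by recovering faster (a smaller MTTR), which is the trade-off slide 34 points at.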