The Unrealized Role of Monitoring & Alerting

j.hand
April 27, 2016

There's more value to be had from our monitoring, logging, and alerting tools. When we focus only on maintaining and firefighting, and give no consideration to learning and innovating, we fail to capture the full value of those tools, or of our services, teams, and company.

Transcript

  1. The Unrealized Role of: Monitoring & Alerting @jasonhand | VictorOps | #DevOpsDays

  2. Jason Hand, DevOps Evangelist, VictorOps

  3. SCaLE 14x, Southern California Linux Expo

  5. 2015 Monitoring Survey

  7. Why Are You Collecting This Data? NOTE: You may choose more than one
    » Performance analysis and trending
    » Fault and Anomaly detection
    » Capacity Planning
    » A/B Testing
    » We don’t do anything with collected metrics

  8. The Results. NOTE: Respondents may have chosen more than one
    » Performance analysis and trending - 63%
    » Fault and Anomaly detection - 53%
    » Capacity Planning - 45%
    » A/B Testing - 11%
    » We don’t do anything with collected metrics - 3%

  9. Tyranny of the S.L.A. (Service Level Agreement)

  10. High Availability Prediction & Prevention

  13. That's Important ... but ...

  16. Business Objectives?

  17. Happy Camper

  18. Customers want more than just 99.999% Uptime

  20. Where's the Innovation?

  21. ? =

  22. ? = Continuous Improvement

  23. How Important is Learning & Innovation?

  28. The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to... learn, improve, or innovate.

  29. Continually understanding & responding to the feedback from monitoring, logging, & alerting allows you to use information about events in the past to drive future actions.

  30. Switching Gears

  33. It's not just about Prediction & Prevention

  34. Respond & Repair ...Quickly

  35. Nope

  36. MTTR Rather Than MTBF

  37. Failure Is Inevitable

  38. us·er /ˈyoozər/ Distributed fault injection test suite for production. credit: Leon Fayer (@papa_fire)

  39. Success is a result of Failure

  40. Understand, Learn, Innovate

  41. re·sil·ient /rəˈzilyənt/ The ability to resist, absorb, recover from or successfully adapt to adversity or a change in conditions

  42. Change can cause failure but innovation requires Change

  43. Conflict

  44. Change Required

  45. “Without deviation from the norm, progress is not possible” Frank Zappa

  46. What Did You Learn From the Recovery Efforts? (including monitoring & alerting)

  47. Postmortems / Learning Reviews: Stories of: What took place leading up to & during the disruption & recovery efforts

  48. Who was involved?

  49. What did they see?

  50. What was said?

  51. What actions were taken? jhand.co/chatopsbook

  52. How do events & actions correlate over time?

  53. 5 Why's

  54. What is the "cause" of the Problem? Root Cause is ...

  55. Our ... obsession with "Root Cause"

  56. Asking "why" .. leads to .. Blame

  57. Blaming leads to.. operators hiding relevant & important information

  58. We must believe that our operators are doing their best given the constraints of the "system"

  59. "We are here to" Learn From Failure (and success)

  60. Rather than ..

  61. Avoid Failure

  62. What's the Story?

  63. Innovate: Learning from both success & failure to develop & implement small incremental improvements is critical.

  64. Learning Organization

  65. Learning does NOT come from Reading & Listening

  66. Learning comes from Doing

  67. Real Learning comes from: Observing, Orienting, Deciding, Acting (John Boyd's OODA Loop)

  68. Example: Learning to play the Dobro Guitar

  70. Learning

  71. Why? Go from knowing... to understanding... to learning NOTE: (Requires making mistakes)

  73. “We will trade some uptime in exchange for innovation” -Dave Hahn (Netflix), DevOpsDays Boise 2016 (today)

  74. Are We Doing it Right?

  75. What do your Postmortems look like? Are they setting you up to learn?

  76. "The Story"
    - Timeline
    - Who Was Involved
    - Context (Seeing, Saying, Executing)
    - Action Items (Small Incremental Improvements)

  77. Shift our gaze from: maintaining & protecting

  78. Learning, which leads to... Improving & Innovating

  79. We increase the value of monitoring & alerting, of the IT teams, of Products & Services, & of the Organization.

  80. Hypothesize, Explore, Stretch, Experiment, Fail, Learn, Try Again

  82. Learning & Innovating leads to uncovering new ways of building, deploying, and maintaining software & infrastructure. Which leads to...

  83. Resilient Systems

  84. The By-product of a highly resilient system is ...

  86. Highly Available system

  87. The Unrealized Role of: Monitoring & Alerting is ....

  88. Learning & Innovation

  89. Thank You. Be Victorious!

  90. References:
    Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
    Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
    Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
    Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
    NOC: https://upload.wikimedia.org/wikipedia/commons/

  91. References:
    Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
    VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
    Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
    Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg

  92. References:
    Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203
    Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002

  93. References:
    Accident Free: http://www.compliancesigns.com/media/digital-scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
    Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
    change: http://i.imgur.com/EQyC6N3.gif
    Hard drive: https://i.imgur.com/pWsKSEf.gif
    Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
    Value: https://d13yacurqjgara.cloudfront.net/users/
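
A rough illustration of two numbers the slides lean on: the downtime that "99.999% Uptime" actually allows, and why the talk favors MTTR over MTBF. This is only a sketch under stated assumptions. The incident log, the reporting period, and every function name below are hypothetical and do not come from the talk, and availability is modeled with the common steady-state approximation MTBF / (MTBF + MTTR).

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (outage start, service restored).
# Illustrative values only, not data from the talk.
INCIDENTS = [
    (datetime(2016, 1, 4, 2, 0), datetime(2016, 1, 4, 2, 30)),
    (datetime(2016, 2, 10, 14, 0), datetime(2016, 2, 10, 14, 10)),
    (datetime(2016, 3, 21, 9, 0), datetime(2016, 3, 21, 9, 20)),
]
PERIOD_START = datetime(2016, 1, 1)
PERIOD_END = datetime(2016, 4, 1)


def downtime_budget(target: float, period: timedelta) -> timedelta:
    """Downtime allowed within `period` at a target availability (0..1)."""
    return period * (1.0 - target)


def total_downtime(incidents) -> timedelta:
    """Sum of all outage durations."""
    return sum((end - start for start, end in incidents), timedelta())


def mttr(incidents) -> timedelta:
    """Mean time to repair: average outage duration."""
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents, period_start, period_end) -> timedelta:
    """Mean time between failures: total uptime divided by failure count."""
    uptime = (period_end - period_start) - total_downtime(incidents)
    return uptime / len(incidents)


def availability(mtbf_value: timedelta, mttr_value: timedelta) -> float:
    """Steady-state approximation: MTBF / (MTBF + MTTR)."""
    return mtbf_value / (mtbf_value + mttr_value)


if __name__ == "__main__":
    repair = mttr(INCIDENTS)
    between = mtbf(INCIDENTS, PERIOD_START, PERIOD_END)
    print("Five-nines budget per year:", downtime_budget(0.99999, timedelta(days=365)))
    print("MTTR:", repair)
    print("MTBF:", between)
    print("Availability:", availability(between, repair))
    # Same failure frequency, but each repair takes half as long.
    print("Availability with half the MTTR:", availability(between, repair / 2))
```

In this sketch, failures happen just as often in both scenarios; only the speed of response and repair changes, and availability still goes up.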