The Unrealized Role of Monitoring & Alerting (All Day DevOps edition)

j.hand
November 15, 2016

When prediction and prevention are focused on more than learning and innovation, organizations are not realizing the full value of monitoring and alerting.

Transcript

  1. The Unrealized Role of: Monitoring & Alerting @jasonhand | VictorOps | #AllDayDevOps
  4. THE UNREALIZED ROLE OF: Monitoring & Alerting
  5. JASON HAND, DevOps Evangelist, VictorOps
  7. 2015 MONITORING SURVEY
  9. WHY ARE YOU COLLECTING THIS DATA? NOTE: You may choose more than one
    ▸ Performance analysis and trending
    ▸ Fault and Anomaly detection
    ▸ Capacity Planning
    ▸ A/B Testing
  10. THE RESULTS NOTE: Respondents may have chosen more than one
    ▸ Performance analysis and trending - 63%
    ▸ Fault and Anomaly detection - 53%
    ▸ Capacity Planning - 45%
    ▸ A/B Testing - 11%
  11. Tyranny of the S.L.A. (Service Level Agreement)
  12. HIGH AVAILABILITY: Prediction & Prevention
  15. THAT'S IMPORTANT ... BUT ...
  17. BUSINESS OBJECTIVES?
  18. HAPPY CAMPER
  19. CUSTOMERS want more than just 99.999% UPTIME
  21. WHERE'S THE INNOVATION?
  22. HOW IMPORTANT IS Learning & Innovation?
  27. The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to... LEARN, IMPROVE, OR INNOVATE.
  28. CONTINUALLY UNDERSTANDING & RESPONDING TO THE FEEDBACK from monitoring, logging, & alerting allows you to use information about events in the past to drive future actions.
  31. It's not just about PREDICTION & PREVENTION
  32. RESPOND & REPAIR ...QUICKLY
  33. NOPE
  34. MTTR Rather Than MTBF
  35. FAILURE IS INEVITABLE
  36. US·ER /ˈYOOZƏR/ DISTRIBUTED FAULT INJECTION TEST SUITE FOR PRODUCTION. credit: Leon Fayer (@papa_fire)
  37. SUCCESS is a result of FAILURE
  38. UNDERSTAND, LEARN, INNOVATE
  39. RE·SIL·IENT /RƏˈZILYƏNT/ The ability to resist, absorb, recover from or successfully adapt to adversity or a change in conditions
  40. CHANGE can cause failure but innovation requires CHANGE
  41. CONFLICT
  42. CHANGE REQUIRED
  43. Without deviation from the norm, progress is not possible - Frank Zappa
  44. What Did You LEARN From the Recovery Efforts? (including monitoring & alerting)
  45. POSTMORTEMS / LEARNING REVIEWS: Stories of: WHAT TOOK PLACE leading up to & during the disruption & recovery efforts
  46. WHO WAS INVOLVED?
  47. WHAT DID THEY SEE?
  48. WHAT WAS SAID?
  49. WHAT ACTIONS WERE TAKEN? jhand.co/chatopsbook
  50. HOW DO events & actions CORRELATE OVER TIME?
  51. 5 Why's
  52. 5 Why's
  53. WHAT IS THE "cause" OF THE PROBLEM? Root Cause is ...
  54. OUR ... obsession with "Root Cause"
  55. ASKING "WHY" .. leads to .. BLAME
  56. BLAMING LEADS TO.. operators hiding relevant & important information
  57. We must BELIEVE that our operators are doing their best given the constraints of the "system"
  58. "We are here to" LEARN From Failure (and success)
  59. RATHER THAN ..
  60. AVOID FAILURE
  61. WHAT'S THE STORY?
  62. INNOVATE: Learning from both success & failure to develop & implement small incremental improvements is critical.
  63. MONITORING & ALERTING Helps us understand the story in greater detail
  64. LEARNING ORGANIZATION
  65. Learning does NOT come from READING & LISTENING
  66. Learning comes from DOING
  67. Real Learning comes from: OBSERVING, ORIENTING, DECIDING, ACTING (John Boyd's OODA Loop)
  68. Example: LEARNING TO PLAY THE DOBRO GUITAR
  70. LEARNING
  71. WHY? Go from knowing... to understanding... to learning NOTE: (Requires making mistakes)
  72. We will trade some uptime in exchange for innovation - Dave Hahn (Netflix), DevOpsDays Boise 2016
  73. SHIFT OUR GAZE from: MAINTAINING & PROTECTING
  74. LEARNING Which leads to... IMPROVING & INNOVATING
  75. WE INCREASE VALUE OF: - Monitoring & Alerting - IT teams - Products & Services - Organization
  76. HYPOTHESIZE, EXPLORE, STRETCH, EXPERIMENT, FAIL, LEARN, Try Again
  78. LEARNING & INNOVATING leads to uncovering new ways of BUILDING, DEPLOYING, AND MAINTAINING SOFTWARE & INFRASTRUCTURE Which leads to...
  79. RESILIENT SYSTEMS
  80. The By-product of a highly RESILIENT system is ...
  82. HIGHLY AVAILABLE SYSTEM
  83. THE UNREALIZED ROLE OF: Monitoring & Alerting is ....
  84. LEARNING & INNOVATION
  85. THANK YOU Be Victorious!
  87. Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
    Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
    Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
    Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
    NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/
  88. References:
    Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
    VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
    Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
    Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg
  89. Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203
    Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002
    Bulb: https://smhttp-ssl-37293.nexcesscdn.net/media/catalog/
  90. scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
    Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
    change: http://i.imgur.com/EQyC6N3.gif
    Hard drive: https://i.imgur.com/pWsKSEf.gif
    Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
    Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif
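
The deck names "MTTR Rather Than MTBF" (slide 34) and "99.999% UPTIME" (slide 19) without showing the arithmetic behind them. The sketch below is one way to compute these figures, not something taken from the talk. It uses the common definitions (MTTR as average time to restore service, MTBF as average uptime per failure), and the incident log, the function names, and the observation window are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (outage start, service restored). Not from the talk.
INCIDENTS = [
    (datetime(2016, 1, 10, 3, 0), datetime(2016, 1, 10, 3, 30)),
    (datetime(2016, 4, 2, 14, 0), datetime(2016, 4, 2, 14, 10)),
    (datetime(2016, 9, 21, 22, 0), datetime(2016, 9, 21, 22, 20)),
]
# Assumed observation window: all of 2016, a leap year.
PERIOD = timedelta(days=366)


def total_downtime(incidents):
    """Sum the duration of every outage in the log."""
    return sum((end - start for start, end in incidents), timedelta())


def mttr(incidents):
    """Mean time to repair: average outage duration."""
    return total_downtime(incidents) / len(incidents)


def mtbf(incidents, period):
    """Mean time between failures: average uptime per failure."""
    return (period - total_downtime(incidents)) / len(incidents)


def availability(incidents, period):
    """Fraction of the period during which the service was up."""
    return 1 - total_downtime(incidents) / period


def downtime_budget(target, period):
    """Downtime allowed in `period` by an availability target such as 0.99999."""
    return period * (1 - target)


if __name__ == "__main__":
    print("MTTR:", mttr(INCIDENTS))
    print("MTBF:", mtbf(INCIDENTS, PERIOD))
    print("Availability: {:.5%}".format(availability(INCIDENTS, PERIOD)))
    # Five nines leaves roughly five and a quarter minutes per 365-day year.
    print("99.999% budget:", downtime_budget(0.99999, timedelta(days=365)))
```

With these made-up numbers, three short outages totalling one hour already miss a five-nines target by a wide margin, while shortening each repair improves availability directly. That is one way to read the deck's preference for MTTR over MTBF.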