The Unrealized Role of Monitoring & Alerting (All Day DevOps edition)

j.hand
November 15, 2016

When organizations focus on prediction and prevention more than on learning and innovation, they fail to realize the full value of monitoring and alerting.

Transcript

  1. The Unrealized Role of:
    Monitoring & Alerting
    @jasonhand | VictorOps | #AllDayDevOps

  4. THE UNREALIZED
    ROLE OF:
    Monitoring
    & Alerting

  5. JASON
    HAND
    DevOps Evangelist
    VictorOps

  7. 2015
    MONITORING
    SURVEY

  9. WHY ARE YOU COLLECTING THIS DATA?
    NOTE: You may choose more than one
    ▸ Performance analysis and trending
    ▸ Fault and Anomaly detection
    ▸ Capacity Planning
    ▸ A/B Testing

  10. THE RESULTS
    NOTE: Respondents may have chosen more than one
    ▸ Performance analysis and trending - 63%
    ▸ Fault and Anomaly detection - 53%
    ▸ Capacity Planning - 45%
    ▸ A/B Testing - 11%

  11. Tyranny of the
    S.L.A.
    (Service Level Agreement)

  12. HIGH
    AVAILABILITY
    Prediction & Prevention

  15. THAT'S IMPORTANT
    ... BUT ...

  17. BUSINESS
    OBJECTIVES?

  18. HAPPY CAMPER

  19. CUSTOMERS
    want more than just
    99.999% UPTIME
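
To put the "five nines" figure in perspective, here is a minimal sketch of the downtime budget that an uptime target implies. The helper name and the 365-day year are assumptions made for illustration; they do not come from the talk.

```python
# Hypothetical helper for illustration; assumes a 365-day (non-leap) year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    budget = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {budget:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999% leaves a little over five minutes of downtime per year, which shows how narrow a goal uptime alone is.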

  21. WHERE'S THE
    INNOVATION?

  22. HOW IMPORTANT
    IS
    Learning & Innovation?

  27. The result of underutilizing monitoring & alerting
    is that the IT department and the organization have
    no chance to...
    LEARN,
    IMPROVE, OR
    INNOVATE.

  28. CONTINUALLY UNDERSTANDING & RESPONDING
    TO THE FEEDBACK
    from
    monitoring, logging, & alerting
    allows you to use information about events in the past to drive future
    actions.

  31. It's not just about
    PREDICTION
    & PREVENTION

  32. RESPOND &
    REPAIR
    ...QUICKLY

  33. NOPE

  34. MTTR
    Rather Than
    MTBF
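
The two metrics can be computed from the same incident log. The sketch below is only an illustration: the incident timestamps are invented, and MTBF is taken here as the average uptime between the end of one incident and the start of the next, which is one common definition among several.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (outage started, service restored).
incidents = [
    (datetime(2016, 11, 1, 9, 0), datetime(2016, 11, 1, 9, 30)),
    (datetime(2016, 11, 8, 14, 0), datetime(2016, 11, 8, 14, 10)),
    (datetime(2016, 11, 15, 22, 0), datetime(2016, 11, 15, 22, 20)),
]


def mean_time_to_repair(incidents):
    """Average time from the start of an outage to its recovery."""
    total = sum((end - start for start, end in incidents), timedelta())
    return total / len(incidents)


def mean_time_between_failures(incidents):
    """Average uptime between one recovery and the next outage."""
    ordered = sorted(incidents)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


print("MTTR:", mean_time_to_repair(incidents))
print("MTBF:", mean_time_between_failures(incidents))
```

Tracking MTTR over time shows whether response and repair are getting faster, which is the shift in emphasis this slide argues for.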

  35. FAILURE IS
    INEVITABLE

  36. US·ER
    /ˈYOOZƏR/
    DISTRIBUTED FAULT INJECTION TEST SUITE FOR
    PRODUCTION.
    credit: Leon Fayer (@papa_fire)

  37. SUCCESS
    is a result of
    FAILURE

  38. UNDERSTAND
    LEARN
    INNOVATE

  39. RE·SIL·IENT
    /RƏˈZILYƏNT/
    The ability to resist, absorb, recover from or successfully adapt to
    adversity or a change in conditions

  40. CHANGE
    can cause failure
    but innovation requires
    CHANGE

  41. CONFLICT

  42. CHANGE
    REQUIRED

  43. Without deviation from the norm,
    progress is not possible
    — Frank Zappa

  44. What Did You
    LEARN
    From the Recovery Efforts?
    (including monitoring & alerting)

  45. POSTMORTEMS / LEARNING REVIEWS:
    Stories of:
    WHAT TOOK PLACE
    leading up to & during
    the disruption & recovery efforts

  46. WHO WAS
    INVOLVED?

  47. WHAT DID THEY
    SEE?

  48. WHAT WAS
    SAID?

  49. WHAT
    ACTIONS
    WERE TAKEN?
    jhand.co/chatopsbook

  50. HOW DO
    events & actions
    CORRELATE
    OVER TIME?
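
One way to explore that correlation is to merge monitoring events and operator actions into a single chronological timeline for the learning review. This is a rough sketch with invented records; the `(timestamp, who, what)` layout and the names are assumptions, not the format of any particular tool.

```python
from datetime import datetime

# Hypothetical records, each a (timestamp, who, what) tuple.
alerts = [
    (datetime(2016, 11, 15, 22, 0), "monitoring", "ALERT: API latency above threshold"),
    (datetime(2016, 11, 15, 22, 18), "monitoring", "RECOVERY: API latency back to normal"),
]
actions = [
    (datetime(2016, 11, 15, 22, 3), "alice", "acknowledged the page"),
    (datetime(2016, 11, 15, 22, 12), "bob", "restarted the api service"),
]


def build_timeline(*sources):
    """Merge any number of event lists into one list ordered by timestamp."""
    merged = [entry for source in sources for entry in source]
    return sorted(merged, key=lambda entry: entry[0])


for when, who, what in build_timeline(alerts, actions):
    print(f"{when:%H:%M}  {who:<10}  {what}")
```

Reading alerts and human actions side by side makes it easier to tell the story of what people saw and did, without asking who is to blame.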

  51. 5 Whys

  52. 5 Whys

  53. WHAT IS THE "cause"
    OF THE PROBLEM?
    Root Cause is ...

  54. OUR
    ...
    obsession with
    "Root Cause"

  55. ASKING "WHY"
    .. leads to ..
    BLAME

  56. BLAMING
    LEADS TO..
    operators hiding relevant & important
    information

  57. We must
    BELIEVE
    that our operators are doing their best given the
    constraints of the "system"

  58. "We are here to"
    LEARN
    From Failure
    (and success)

  59. RATHER THAN ..

  60. AVOID
    FAILURE

  61. WHAT'S THE
    STORY?

  62. INNOVATE
    Learning from both success & failure
    to develop & implement
    small incremental improvements
    is critical.

  63. MONITORING &
    ALERTING
    Helps us understand the story in greater detail

  64. LEARNING
    ORGANIZATION

  65. Learning does NOT come from
    READING
    &
    LISTENING

  66. Learning comes from
    DOING

  67. Real Learning comes from:
    OBSERVING
    ORIENTING
    DECIDING
    ACTING
    John Boyd's OODA Loop

  68. Example:
    LEARNING TO PLAY THE
    DOBRO GUITAR

  70. LEARNING

  71. WHY?
    Go from knowing...
    to understanding...
    to learning
    NOTE:
    (Requires making mistakes)

  72. We will trade some uptime in exchange for innovation
    -Dave Hahn (Netflix)
    DevOpsDays Boise 2016

  73. SHIFT OUR GAZE
    from:
    MAINTAINING
    & PROTECTING

  74. LEARNING
    Which leads to...
    IMPROVING
    & INNOVATING

  75. WE INCREASE VALUE OF:
    - Monitoring & Alerting
    - IT teams
    - Products & Services
    - Organization

  76. HYPOTHESIZE
    EXPLORE
    STRETCH
    EXPERIMENT
    FAIL
    LEARN
    Try Again

  78. LEARNING & INNOVATING
    leads to uncovering new ways of
    BUILDING, DEPLOYING, AND MAINTAINING
    SOFTWARE & INFRASTRUCTURE
    Which leads to...

  79. RESILIENT
    SYSTEMS

  80. The
    By-product
    of a highly
    RESILIENT
    system is ...

  82. HIGHLY
    AVAILABLE
    SYSTEM

  83. THE UNREALIZED
    ROLE OF:
    Monitoring
    & Alerting is ....

  84. LEARNING
    &
    INNOVATION

  85. THANK
    YOU
    Be Victorious!

  87. Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
    Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
    Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
    Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
    NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/

  88. References:
    Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
    VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
    Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
    Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg

  89. Chained Hands: https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&ved=0ahUKEwjgrNCDh5TMAhXJs4MKHaoZDssQjBwIBA&url=http%3A%2F%2Fwww.publicdomainpictures.net%2Fdownload-picture.php%3Fadresar%3D50000%26soubor%3Dhands-in-chains.jpg%26id%3D40426&bvm=bv.119745492,d.amc&psig=AFQjCNFIdnDPzSqiLA-znIW5SCTCUHhqEw&ust=1460926880336203
    Inevitable: http://vignette4.wikia.nocookie.net/matrix/images/5/51/SMITH.png/revision/latest?cb=20110214092002
    Bulb: https://smhttp-ssl-37293.nexcesscdn.net/media/catalog/

  90. scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
    Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
    change: http://i.imgur.com/EQyC6N3.gif
    Hard drive: https://i.imgur.com/pWsKSEf.gif
    Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
    Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif