The Unrealized Role of Monitoring & Alerting ( All Day DevOps edition)

j.hand
November 15, 2016

When prediction and prevention are focused on more than learning and innovation, organizations are not realizing the full value of monitoring and alerting.


Transcript

  1. WHY ARE YOU COLLECTING THIS DATA? NOTE: You may choose more than one ▸ Performance analysis and trending ▸ Fault and Anomaly detection ▸ Capacity Planning ▸ A/B Testing @jasonhand | VictorOps | #AllDayDevOps
  2. THE RESULTS NOTE: Respondents may have chosen more than one ▸ Performance analysis and trending - 63% ▸ Fault and Anomaly detection - 53% ▸ Capacity Planning - 45% ▸ A/B Testing - 11%
  3. The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to... LEARN, IMPROVE, OR INNOVATE.
  4. CONTINUALLY UNDERSTANDING & RESPONDING TO THE FEEDBACK from monitoring, logging, & alerting allows you to use information about events in the past to drive future actions.
  5. US·ER /ˈYOOZƏR/ DISTRIBUTED FAULT INJECTION TEST SUITE FOR PRODUCTION. credit: Leon Fayer (@papa_fire)
  6. RE·SIL·IENT /RƏˈZILYƏNT/ The ability to resist, absorb, recover from, or successfully adapt to adversity or a change in conditions
  7. Without deviation from the norm, progress is not possible. -Frank Zappa
  8. What Did You LEARN From the Recovery Efforts? (including monitoring & alerting)
  9. POSTMORTEMS / LEARNING REVIEWS: Stories of: WHAT TOOK PLACE leading up to & during the disruption & recovery efforts
  10. WHAT IS THE "cause" OF THE PROBLEM? Root Cause is ...
  11. We must BELIEVE that our operators are doing their best given the constraints of the "system"
  12. INNOVATE Learning from both success & failure to develop & implement small incremental improvements is critical.
  13. MONITORING & ALERTING Helps us understand the story in greater detail
  14. Real Learning comes from: OBSERVING, ORIENTING, DECIDING, ACTING (John Boyd's OODA Loop)
  15. WHY? Go from knowing... to understanding... to learning NOTE: (Requires making mistakes)
  16. We will trade some uptime in exchange for innovation -Dave Hahn (Netflix), DevOpsDays Boise 2016
  17. WE INCREASE VALUE OF: - Monitoring & Alerting - IT teams - Products & Services - Organization
  18. LEARNING & INNOVATING leads to uncovering new ways of BUILDING, DEPLOYING, AND MAINTAINING SOFTWARE & INFRASTRUCTURE Which leads to...
  19. Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/ Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg NOC: https://upload.wikimedia.org/wikipedia/commons/0/03/
  20. References: Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg
  21. scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif change: http://i.imgur.com/EQyC6N3.gif Hard drive: https://i.imgur.com/pWsKSEf.gif Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg Value: https://d13yacurqjgara.cloudfront.net/users/6437/screenshots/1405551/value-cropped.gif