The Unrealized Role of Monitoring & Alerting

j.hand
April 27, 2016

There's more value to be had from our monitoring, logging, and alerting tools. When we focus only on maintaining and firefighting, and give no consideration to learning and innovating, we aren't capturing the full value of those tools, services, teams, or company.

Transcript

  1. Why Are You Collecting This Data? NOTE: You may choose more than one
    » Performance analysis and trending
    » Fault and Anomaly detection
    » Capacity Planning
    » A/B Testing
    » We don’t do anything with collected metrics
    @jasonhand | VictorOps | #DevOpsDays
  2. The Results NOTE: Respondents may have chosen more than one
    » Performance analysis and trending - 63%
    » Fault and Anomaly detection - 53%
    » Capacity Planning - 45%
    » A/B Testing - 11%
    » We don’t do anything with collected metrics - 3%
  3. The result of underutilizing monitoring & alerting is that the IT department and the organization have no chance to... learn, improve, or innovate.
  4. Continually understanding & responding to the feedback from monitoring, logging, & alerting allows you to use information about events in the past to drive future actions.
  5. us·er /ˈyoozər/ Distributed fault injection test suite for production. Credit: Leon Fayer (@papa_fire)
  6. re·sil·ient /rəˈzilyənt/ The ability to resist, absorb, recover from, or successfully adapt to adversity or a change in conditions.
  7. “Without deviation from the norm, progress is not possible.” -Frank Zappa
  8. What Did You Learn From the Recovery Efforts? (including monitoring & alerting)
  9. Postmortems / Learning Reviews: Stories of what took place leading up to & during the disruption & recovery efforts
  10. What is the "cause" of the Problem? Root Cause is ...
  11. We must believe that our operators are doing their best given the constraints of the "system".
  12. Innovate: Learning from both success & failure to develop & implement small incremental improvements is critical.
  13. Real Learning comes from: Observing, Orienting, Deciding, Acting (John Boyd's OODA Loop)
  14. Why? Go from knowing... to understanding... to learning. NOTE: Requires making mistakes
  15. “We will trade some uptime in exchange for innovation” -Dave Hahn (Netflix), DevOpsDays Boise 2016 (today)
  16. What do your Postmortems look like? Are they setting you up to learn?
  17. "The Story"
    - Timeline
    - Who Was Involved
    - Context (Seeing, Saying, Executing)
    - Action Items (Small Incremental Improvements)
  18. we increase value of monitoring & alerting, of the IT teams, of Products & Services, & of the Organization.
  19. Learning & Innovating leads to uncovering new ways of building, deploying, and maintaining software & infrastructure. Which leads to...
  20. References:
    Monitoring Survey: https://kartar.net/2015/08/monitoring-survey-2015---metrics/
    Firefighter: https://www.learyfirefighters.org/wp-content/uploads/2013/09/cover-slide-1.jpg
    Mechanic: https://upload.wikimedia.org/wikipedia/commons/4/4b/Flickr_-_Israel_Defense_Forces_-_Airplane_Technician,_March_2010.jpg
    Gnome Plan: http://www.nerdfitness.com/wp-content/uploads/2012/04/Screen-Shot-2012-03-30-at-3.15.38-AM-1024x7591.jpg
    NOC: https://upload.wikimedia.org/wikipedia/commons/
  21. References:
    Kodak: http://file.answcdn.com/answ-cld/image/upload/v1/tk/brand_image/b59911fc/91d6e71d30a0878dfe3cb30a22751cb874a3ea8c.jpeg
    VW Camper: https://upload.wikimedia.org/wikipedia/commons/d/d7/VW_Camper.jpg
    Blockbuster: https://jordanandeddie.files.wordpress.com/2013/11/blockbuster-feature.jpg
    Borders: http://smashingtops.com/wp-content/uploads/2012/06/borders_logo1.jpg
  22. References:
    Accident Free: http://www.compliancesigns.com/media/digital-scoreboard/1000/Safety-Awareness-Sign-DSE-195271000.gif
    Stewie: http://chroniclesofredmark.com/wp-content/uploads/2014/01/Stewie.gif
    change: http://i.imgur.com/EQyC6N3.gif
    Hard drive: https://i.imgur.com/pWsKSEf.gif
    Change: https://farm6.staticflickr.com/5208/5270199049df99b234e9od.jpg
    Value: https://d13yacurqjgara.cloudfront.net/users/
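
Editor's sketch (not part of the original deck): the postmortem outline on slide 17 ("The Story": timeline, who was involved, context as seeing / saying / executing, and action items) could be captured as a small record type. Everything below is an assumption made for illustration: the class names `TimelineEntry` and `LearningReview`, the field names, and the sample incident are hypothetical and do not come from the talk or from any VictorOps tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TimelineEntry:
    """One moment in the story of a disruption (hypothetical shape)."""
    at: datetime
    who: str        # person involved at this point
    seeing: str     # what they observed (alerts, graphs, logs)
    saying: str     # what they communicated
    executing: str  # what they did


@dataclass
class LearningReview:
    """Hypothetical record mirroring the slide 17 outline."""
    title: str
    timeline: List[TimelineEntry] = field(default_factory=list)
    action_items: List[str] = field(default_factory=list)

    def story(self) -> List[TimelineEntry]:
        """Timeline entries in chronological order."""
        return sorted(self.timeline, key=lambda entry: entry.at)

    def who_was_involved(self) -> List[str]:
        """Unique participants, ordered by first appearance in the story."""
        return list(dict.fromkeys(entry.who for entry in self.story()))


# Made-up sample data, entered out of order on purpose.
review = LearningReview(title="Checkout latency spike (sample)")
review.timeline.append(TimelineEntry(
    datetime(2016, 4, 27, 10, 12), "bob",
    "error rate climbing on the dashboard",
    "confirmed the alert in chat",
    "rolled back the last deploy"))
review.timeline.append(TimelineEntry(
    datetime(2016, 4, 27, 10, 5), "alice",
    "latency alert fired",
    "paged the on-call engineer",
    "opened the service dashboard"))
review.timeline.append(TimelineEntry(
    datetime(2016, 4, 27, 10, 20), "alice",
    "latency back to normal",
    "announced recovery",
    "closed the incident"))
review.action_items.append("Add a latency graph to the deploy checklist")

for entry in review.story():
    print(entry.at.time(), entry.who, "|", entry.seeing, "|", entry.executing)
```

Keeping seeing, saying, and executing as separate fields, rather than a single free-text note, is one way to keep the review focused on what people knew and did at each moment instead of on a single "root cause".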