Flaming Poo & The Human Response

Jason Hand, DevOps Evangelist, VictorOps (@jasonhand)

Systems WILL have outages

SIKE! U MAD?

Have You Tried ... turning it off and on again?

It's the FUTURE

Poo WILL BREAK

... Halp ...

I don't think I liked that. Nope!

Postmortem

Complicated (Knowable) "known unknowns"

- Indianapolis Raceway Park (1997)

Complex (Unknown) "unknown unknowns"

Cynefin Framework

Obvious: The relationship between cause & effect is obvious. Sense - categorize - respond. "Best Practice"

Complicated: The relationship between cause & effect requires analysis, investigation, triage, and/or "expert" knowledge. Sense - analyze - respond. "Good Practice"

Complex: The relationship between cause & effect can only be perceived in retrospect. Probe - sense - respond. "Emergent Practice"

Chaotic: There is no relationship between cause & effect at the systems level. Act - sense - respond. "Novel Practice"

Say Root Cause One more time ...

Remember when?

Ops didn't like Devs messing with infrastructure

Make Ops Great Again!

From No No No To Go Go Go

Full Stack

"We know that engineers build better systems when they support those systems."⁵
⁵ Pete Cheslock (ThreatStack) - Velocity New York (10/14/15)

Devs on-call

Ops

Devs

80% of outages will be caused by people and process issues¹
¹ Gartner (https://www.gartner.com/doc/334197/nsm-weakest-link-business-availability)

Humans may not (always) be a contributing factor, but...
they are (likely) part of the resolution or improvement process.

And as fallible humans, we are susceptible to bias.

Cognitive bias: Deviation in judgement due to choosing timeliness over accuracy (ETTO: Efficiency-Thoroughness Trade-Off)

Normalcy bias: We believe it won't happen to us, because it hasn't happened previously.

Hindsight bias: We believe it was predictable despite all evidence to the contrary.

Confirmation bias: We seek information to back up our position.

Our minds look for shortcuts.

We are WIRED... to blame
"Blame is a way to discharge pain and discomfort."⁶
⁶ Brene Brown

How? Instead of Who? Focus on removing blame & the many forms of bias that prevent us from identifying areas for improvement.

The point of a postmortem is to accurately describe the "story" of what took place so that we can... learn & improve.

Are you a Paid Pro?

Maxim: We are here to Learn AND Improve

In the beginning

Establish the Timeline: What did we notice first and when?
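
A postmortem timeline is, at its core, an ordered list of timestamped observations, conversations, and actions pulled from several places (alerts, chat, on-call notes). The Python below is a minimal sketch of merging those sources into one chronological view so the first thing noticed falls out naturally. `TimelineEvent`, `build_timeline`, and `first_noticed` are hypothetical names, and the sample events are invented for illustration; it assumes every timestamp uses the same timezone convention.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEvent:
    # Hypothetical record shape; real postmortem tooling will differ.
    at: datetime       # when it happened (keep all sources in one timezone)
    source: str        # e.g. "alert", "chat", "deploy"
    description: str   # what was noticed, said, or done


def build_timeline(*sources: list[TimelineEvent]) -> list[TimelineEvent]:
    """Merge events from several sources into one chronological timeline."""
    merged = [event for events in sources for event in events]
    # sorted() is stable: events with identical timestamps keep their input order.
    return sorted(merged, key=lambda e: e.at)


def first_noticed(timeline: list[TimelineEvent]) -> TimelineEvent:
    """Answer 'What did we notice first and when?'"""
    if not timeline:
        raise ValueError("timeline is empty")
    return timeline[0]


# Invented sample data: a human noticed the problem before the alert fired.
alerts = [TimelineEvent(datetime(2016, 1, 4, 9, 5), "alert", "p95 latency over threshold")]
chat = [
    TimelineEvent(datetime(2016, 1, 4, 9, 2), "chat", "Support reports slow checkout"),
    TimelineEvent(datetime(2016, 1, 4, 9, 20), "chat", "On-call rolls back deploy"),
]
timeline = build_timeline(alerts, chat)
print(first_noticed(timeline).description)  # Support reports slow checkout
```

Keeping the events as plain descriptions, rather than conclusions, fits the "describe rather than explain" approach: the timeline records what happened and leaves interpretation to the discussion.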

Describe rather than explain: Give an accurate account of what took place.

Context

Conversations & Actions

Contributing factor (definition): Something that is partly responsible for a development or anomaly.

Everybody gets a voice

(S)pecific (M)easurable (A)ctionable (R)ealistic (T)imely
... Action Items aimed at (small) incremental improvements

Is it working?

MTTA (mean time to acknowledge) & MTTR (mean time to resolve) improvements over time
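One way to see whether postmortems are paying off is to track MTTA and MTTR per period and watch the trend. The Python below is a minimal sketch, assuming each incident record carries trigger, acknowledge, and resolve timestamps, and that both metrics are measured from the trigger time (teams define MTTR differently, so adjust the start point to match yours). `Incident` and `mtta_mttr_by_month` are hypothetical names, and the sample incidents are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    # Hypothetical fields; map them to whatever your alerting tool exports.
    triggered: datetime
    acknowledged: datetime
    resolved: datetime


def _mean_minutes(deltas: list[timedelta]) -> float:
    return mean(d.total_seconds() / 60 for d in deltas)


def mtta_mttr_by_month(incidents: list[Incident]) -> dict[str, tuple[float, float]]:
    """Return {"YYYY-MM": (MTTA minutes, MTTR minutes)}, both measured from trigger time."""
    buckets: dict[str, list[Incident]] = defaultdict(list)
    for inc in incidents:
        buckets[inc.triggered.strftime("%Y-%m")].append(inc)
    return {
        month: (
            _mean_minutes([i.acknowledged - i.triggered for i in group]),
            _mean_minutes([i.resolved - i.triggered for i in group]),
        )
        for month, group in sorted(buckets.items())
    }


# Invented sample data: two incidents in January, one in February.
jan = datetime(2016, 1, 4, 10, 0)
feb = jan.replace(month=2)
incidents = [
    Incident(jan, jan + timedelta(minutes=10), jan + timedelta(minutes=90)),
    Incident(jan, jan + timedelta(minutes=20), jan + timedelta(minutes=30)),
    Incident(feb, feb + timedelta(minutes=4), feb + timedelta(minutes=40)),
]
print(mtta_mttr_by_month(incidents))
# {'2016-01': (15.0, 60.0), '2016-02': (4.0, 40.0)}
```

Means are sensitive to a single long outage, so a median (`statistics.median`) per period may be a more stable signal when incident counts are small.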

Improvements in... Volume of Actionable Alerts
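A rising share of actionable alerts (and a shrinking pile of noise) is one reading of "improvements in volume of actionable alerts." The sketch below assumes the team labels each alert as actionable or not during review; `Alert` and `actionable_by_month` are hypothetical names, and the sample alerts are invented.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Alert:
    # Hypothetical record; "actionable" is a label the team assigns during review.
    day: date
    actionable: bool


def actionable_by_month(alerts: list[Alert]) -> dict[str, tuple[int, int, float]]:
    """Return {"YYYY-MM": (total alerts, actionable alerts, actionable ratio)}."""
    totals: Counter = Counter()
    actionable: Counter = Counter()
    for alert in alerts:
        month = alert.day.strftime("%Y-%m")
        totals[month] += 1
        if alert.actionable:
            actionable[month] += 1
    return {
        month: (totals[month], actionable[month], actionable[month] / totals[month])
        for month in sorted(totals)
    }


# Invented sample data: fewer, more meaningful alerts in February.
alerts = [
    Alert(date(2016, 1, 5), False),
    Alert(date(2016, 1, 6), False),
    Alert(date(2016, 1, 9), True),
    Alert(date(2016, 1, 12), False),
    Alert(date(2016, 2, 3), True),
    Alert(date(2016, 2, 8), False),
]
print(actionable_by_month(alerts))
# {'2016-01': (4, 1, 0.25), '2016-02': (2, 1, 0.5)}
```

Reporting the raw counts alongside the ratio matters: a better ratio with a flat total is a different story from the same ratio on half the alert volume.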

"It's not about the outcome! It's about the response."⁷
⁷ J. Paul Reed (@jpaulreed) & Kevina Finn-Braun (@kfinnbraun)

Continuous Incremental Improvements (i.e., baby steps)

Teeny, Tiny Action Items

Ongoing

Barriers & Friction: Knock 'em Down (walls, silos, bottlenecks, bad process)

Never Finished with Continuous Improvements

Never Finished with Transforming the way we deliver software

Continuous

Thank You

Abstract
Even the best-designed systems can and will have outages. No matter how well you've hardened your infrastructure and put in place failover or self-healing automation, something you didn't see coming will wreak havoc in your special snowflake of a system. In many cases, a human is likely to be a contributing factor. In fact, Gartner has predicted that in 2015, 80% of outages will be caused by people and process issues. Are you considering the human element when revisiting incidents and outages with your infrastructure? If so, are you approaching it with a blameless mindset focused on removing the many forms of bias and searching for absolute truth? Do you believe that there is always a root cause to outages, or is it more accurate to seek out additional aspects that may have contributed to the incident, especially with regard to the people and processes?

Regardless of your approach, the point of a postmortem is to accurately describe the "story" of what took place in as much detail as possible: the good, the bad, those involved, conversations had, actions taken, related timestamps, who was on-call, etc. You want to know absolutely everything that took place that was related to some degree so that you can review the data and learn from it. How do we ensure that we are asking the right questions and seeking out relevant and important information that will help us understand what took place and, ultimately, how to become a better team, company, and product as a result?

The blameless culture (specifically blameless postmortems) is a topic of interest to many in the middle of a DevOps transformation within their organization. I'll outline important best practices for conducting effective postmortems and demonstrate methods to measure the benefits of adopting postmortems, especially those of a "blameless" nature.

Resources:
• http://werve.net/articles/running-effective-retrospectives/
• http://blog.hut8labs.com/dan-talks-about-post-mortems.html
• http://product.hubspot.com/blog/bid/64771/Post-Mortems-at-HubSpot-What-I-Learned-From-250-Whys
• https://medium.com/towards-a-remarkable-career/how-to-run-a-simple-postmortem-9c3eff094b5f
• http://www.kitchensoap.com/2014/11/14/the-infinite-hows-or-the-dangers-of-the-five-whys/
• https://www.gartner.com/doc/334197/nsm-weakest-link-business-availability
• https://www.victorops.com/blog
• https://www.jasonhand.com