Staying Alive: Patterns for Failure Management From the Bottom of the Ocean

Ronnie Chen

March 25, 2017


Transcript

  1. STAYING ALIVE: PATTERNS FOR FAILURE MANAGEMENT FROM THE BOTTOM OF THE OCEAN
    RONNIE CHEN
    SLACK
    Ronnie Chen @rondoftw
  2. TECHNICAL DIVING
    ▸ longer dive times
    ▸ deeper dives
    ▸ overhead ceiling
    ▸ decompression obligations
    ▸ more gear. a lot more.
    ▸ higher pressure
    ▸ more risks
  3. RISKS MAY INCLUDE...
    1. hypoxia
    2. hyperoxia
    3. nitrogen narcosis
    4. carbon dioxide buildup
    5. oxygen sensor failure
    6. deep tissue isobaric counterdiffusion (ICD)
    7. high pressure nervous syndrome (HPNS)
    8. software failure
    9. exhausting your carbon dioxide scrubber
    10. carbon dioxide channeling from a poorly packed scrubber
    11. carbon buildup causing a spark leading to an oxygen fire. underwater.
    12. flooding of breathing loop or circuitry
    13. water mixing with the scrubbing agent to produce a toxic caustic soda that will give you chemical burns on your mouth, airway, and lungs
    14. plain old decompression sickness
  4. "If you own a rebreather for five years, two percent of you are going to die on it."
    - Jill Heinerth, underwater explorer
  5. BUT YOU'RE GETTING A MEANDERING MEDITATION ON BEST PRACTICES* WHEN DEALING WITH COMPLEX SYSTEMS INSTEAD
    * These guidelines have only been shown to work for life or death situations under the ocean. They have not been proven to work for tech.
  6. CATASTROPHES ARE CAUSED BY A FAILURE CASCADE
    ▸ you have a rebreather malfunction
    ▸ which you would have caught if you were testing your equipment on a regular basis
    ▸ your backup tank had a leak and is running low and that wasn't caught either
    ▸ and your buddy is too far away and isn't checking in with you
    ▸ and your dive light that you use to communicate at a distance is out of power
    ▸ and in the excitement you kick up silt and the visibility drops
    ▸ and in your panic your air consumption goes up and then you breathe through the last of the air in your tank
    ▸ so you swim for the surface even though you have a decompression obligation
  7. A post-mortem that blames this incident on a simple mechanical malfunction would only cover 12.5% of the issues that led up to this accident.
  8. Complex system failures don't happen because a single part of the system fails. They happen because all the safety procedures that are supposed to protect them from the simple system failure didn't work.
  9. CORE RULES OF SAFETY SYSTEMS
    1. An unused safety system doesn't exist.
  10. NORMALIZATION OF DEVIANCE
    "That natural human tendency, particularly in pressure circumstances, to take a safety shortcut. To accept a lower standard of performance."
    - Colonel Mike Mullane, astronaut
  11. FALSE FEEDBACK: the absence of something bad happening means that it was safe
    ADAPTATION: experience is no longer a suitable gauge of risk
    SOCIAL PRESSURE: this is just how we do things
  12. CORE RULES OF SAFETY SYSTEMS
    2. An untested safety system doesn't exist either!
  13. CORE RULES OF SAFETY SYSTEMS
    3. Unused or untested safety systems are more dangerous than not having one at all.
    Therefore, safety systems must be tested at regular intervals. The length of this interval should be determined not only by how likely it is for this system to fail but also by how great the impact will be if it does.
  14. A QUICK SIDENOTE ON ASSESSING RISK
    ▸ Make assessments based on likelihood of occurrence.
    ▸ Make assessments based on magnitude of regret.
    If you are only evaluating risk based on the chance of it happening, you must be prepared to experience the corresponding level of regret if it does.
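
The two axes on this slide can be sketched in a few lines of Python. This is only an illustration, not something from the talk: the `Risk` class, the `needs_plan` helper, both thresholds, and every number below are hypothetical values chosen to show how a rare but high-regret failure gets flagged even when a likelihood-only filter would miss it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: float  # assumed chance per dive, 0.0 to 1.0 (made-up figures)
    regret: float      # assumed magnitude of regret, arbitrary 0 to 10 scale


def needs_plan(risk: Risk,
               likelihood_threshold: float = 0.1,
               regret_threshold: float = 8.0) -> bool:
    """Flag a risk if it is likely OR if the regret would be severe."""
    return (risk.likelihood >= likelihood_threshold
            or risk.regret >= regret_threshold)


# Illustrative entries only; the numbers are not measured data.
risks = [
    Risk("dive light battery dies", likelihood=0.20, regret=3.0),
    Risk("oxygen sensor failure", likelihood=0.01, regret=10.0),
    Risk("fin strap breaks", likelihood=0.02, regret=2.0),
]

flagged = [r.name for r in risks if needs_plan(r)]
likelihood_only = [r.name for r in risks if r.likelihood >= 0.1]

print(flagged)
print(likelihood_only)
```

With these assumed numbers, the likelihood-only list contains just the dive light, while the combined check also picks up the oxygen sensor.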
  15. FAILURE MANAGEMENT
    ▸ A framework for redundancy
    ▸ The training and judgment to use it
  16. FAILURE MANAGEMENT FOR SYSTEMS
    ▸ Have redundancy for systems that you cannot survive without.
    ▸ Have a redundant pathway to success: a procedure for graceful degradation for systems that are important but not critical.
    ▸ Have a process for changing over from primary to redundant systems.
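
For a software reading of these three points, here is a minimal sketch under stated assumptions. The function names (`read_with_failover`, `read_or_degrade`) and the gauge stand-ins are hypothetical, not from the talk. The first function models a critical system with a backup and an explicit changeover step; the second models an important-but-not-critical system that degrades gracefully instead of stopping the operation.

```python
from typing import Callable, Optional, Tuple


def read_with_failover(primary: Callable[[], float],
                       backup: Callable[[], float]) -> Tuple[float, str]:
    """Critical reading: try the primary, then change over to the backup.

    If the backup also fails, its exception propagates, because there is
    nothing left to fall back on.
    """
    try:
        return primary(), "primary"
    except Exception:
        # Changeover procedure: switch to the redundant system.
        return backup(), "backup"


def read_or_degrade(source: Callable[[], float],
                    fallback: Optional[float] = None
                    ) -> Tuple[Optional[float], bool]:
    """Non-critical reading: on failure, carry on with a degraded value."""
    try:
        return source(), False
    except Exception:
        return fallback, True


# Hypothetical stand-ins for real sensors.
def broken_gauge() -> float:
    raise RuntimeError("sensor offline")


def working_gauge() -> float:
    return 30.0


depth, source_used = read_with_failover(broken_gauge, working_gauge)
temperature, degraded = read_or_degrade(broken_gauge)

print(depth, source_used)
print(temperature, degraded)
```

The changeover lives in one place, so it can be exercised in a drill rather than discovered during a real failure.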
  17. FAILURE MANAGEMENT FOR SYSTEMS (CONT)
    ▸ Keep failures contained so that they don't bring down other systems
    ▸ Make it easy to do the right thing and hard to do the dangerous things
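
One way these two points might look in code, as a rough sketch only: `run_isolated` and `drop_records` are hypothetical helpers invented for illustration. The first runs independent checks so that one failure is recorded without stopping the others; the second makes a destructive action require an explicit keyword argument, so the safe path is the default.

```python
from typing import Callable, Dict, List, Tuple


def run_isolated(tasks: Dict[str, Callable[[], object]]
                 ) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Run every task; record a failure instead of letting it propagate."""
    results: Dict[str, object] = {}
    failures: Dict[str, str] = {}
    for name, task in tasks.items():
        try:
            results[name] = task()
        except Exception as exc:
            failures[name] = str(exc)
    return results, failures


def drop_records(records: List[object], *, confirm: bool = False) -> int:
    """Destructive action: refuses to run unless confirm=True is passed."""
    if not confirm:
        raise PermissionError("pass confirm=True to drop records")
    count = len(records)
    records.clear()
    return count


# Hypothetical failing check used for the demonstration.
def flaky_check() -> str:
    raise ValueError("leak detected")


results, failures = run_isolated({
    "lights": lambda: "ok",
    "backup_tank": flaky_check,
    "comms": lambda: "ok",
})

print(results)
print(failures)
```

Catching every exception is a deliberate simplification here; a real system would likely narrow the exception types and log the failures somewhere people actually look.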
  18. TRAINING: INEXPERIENCED PEOPLE TO THE FRONT
    ▸ Most inexperienced person leads
    ▸ Experienced person advises and intervenes only when necessary
    ▸ Team is invested in personal success to ensure mission success
  19. TRAINING: INEXPERIENCED PEOPLE TO THE FRONT (CONT)
    ▸ Frees up more experienced people from micromanaging
    ▸ Opportunity to revise and improve problematic systems
    ▸ One of the best ways to equalize a gap in experience
  20. GOOD JUDGMENT
    Good judgment enables the reshaping of rules and frameworks to adapt to a changing environment.
  21. REFINING JUDGMENT
    ▸ Post-Mortems
    ▸ Pre-Mortems
    ▸ Fire Drills
    ▸ Revisit Past Decisions
  22. POST-MORTEMS
    ▸ Look at the safety procedures that failed to stop the cascade
    ▸ Look for opportunities to create new safety systems at critical points
  23. PRE-MORTEMS
    ▸ Don't wait for failures to build safety frameworks
    ▸ Identify potential avenues of failure and make plans for them
    ▸ Include both likely failures and high regret failures
  24. FIRE DRILLS
    ▸ Vet your plans and safety systems
    ▸ Perform targeted training
    ▸ Evaluate effectiveness of tools and documentation
  25. REVISIT PAST DECISIONS
    ▸ Examine successful operations to see what key insights were helpful
    ▸ Identify any dependency on luck in previous projects
    ▸ Share rationale for decisions
  26. I WANT TO LEARN MORE!
    1. Diane Vaughan - The Challenger Launch Decision
    2. Richard I. Cook - How Complex Systems Fail
    3. Mike Mullane - https://www.youtube.com/watch?v=Ljzj9Msli5o
    4. Steve Lewis aka decodoppler - Staying Alive
    5. Sidney Dekker - Drift into Failure