Failure and the Third Way - Aaron Blythe, DevOpsDays Chicago 2019

The Phoenix Project describes the Three Ways. This short talk is about the all-important Third Way, which focuses on continual improvement and taking risks.

The Third Way is about creating a culture that fosters two things: continual experimentation, which means taking risks and learning from failure; and an understanding that repetition and practice are the prerequisites to mastery. In operations we often focus on “keeping the lights on”.

We all have technical debt, and technical debt has to be paid down. But which projects should we focus on? We have to start by identifying our constraints. By focusing on our constraints we can prioritize not only the work that we release, but also the technical debt projects that should be worked on. These projects are a great place to introduce the idea of a safe place to fail and to iterate based on whether the desired outcome is met.

DevOpsDays Chicago

August 28, 2019

Transcript

  1. @ablythe 1. 2. 3. Flow Feedback

  2. @ablythe

  3. None
  4. @ablythe Learn Blame Choose One

  5. @ablythe Retributive Culture. Which rule was broken? How bad was the outcome? What should the consequences be?
  6. @ablythe Restorative Culture. Who is hurt? What are their needs? Whose obligation is it to meet those needs?
  7. @ablythe Sidney Dekker - Just Culture

  8. @ablythe Retributive Culture: You pay or settle account. Backward-looking accountability. Who is responsible? Restorative Culture: You tell account. Forward-looking accountability. What is responsible?
  9. @ablythe Make it safe to fail

  10. @ablythe Dev Ops (Business) (Customers) First Way: Flow (System Thinking)

  11. @ablythe Dev Ops (Business) (Customers) Second Way: Feedback

  12. @ablythe Dev Ops (Business) (Customers) Third Way: Culture of Continuous Learning
  13. @ablythe

  14. @ablythe “… massive outage… It was caused by, quite frankly, a dumb mistake. In fact by an engineer who had taken down Netflix twice in the last 18 months…”
  15. @ablythe “… in the same 18 months that engineer moved … <Netflix>… forward not by miles but by light years.”
  16. @ablythe What happens when it is not safe to fail? Hiding, Secrecy, Evasion, Self-protecting, Finger-pointing, REPETITION of ERRORS
  17. @ablythe

  18. @ablythe Sharp End, Blunt End, Management/Policy, Individual Failure

  19. @ablythe “… building a high trust culture is likely the largest management challenge of this decade.” –Gene Kim
  20. None