Resilience as a Continuous Delivery Enabler

The traditional IT reliability strategy is robustness: optimise for MTBF (mean time between failures) by maintaining a failure-free production environment. This assumes failures are preventable, and depends on risk management theatre such as end-to-end testing. It leaves organisations trapped in discontinuous delivery, and in such circumstances a continuous delivery programme focused on throughput is unlikely to succeed.

A more effective reliability strategy in most scenarios is resilience: optimise for MTTR (mean time to repair) by rapidly responding to production failures. This assumes failures are inevitable, and depends on the adaptive capacity of teams and services being increased with operability practices.
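The trade-off between the two strategies can be sketched with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). This formula and every number below are illustrative assumptions rather than figures from the talk, and the function name `availability` is hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical baseline: a failure every 500 hours, 5 hours to repair.
baseline = availability(500, 5)

# Robustness-style gain: failures half as frequent, repair time unchanged.
robust = availability(1000, 5)

# Resilience-style gain: same failure frequency, repairs ten times faster.
resilient = availability(500, 0.5)

print(f"baseline:  {baseline:.4f}")
print(f"robust:    {robust:.4f}")
print(f"resilient: {resilient:.4f}")
```

Under these assumed numbers, the order-of-magnitude MTTR improvement yields roughly 99.9% availability, against roughly 99.5% for doubling MTBF. The model is deliberately simple and ignores the cost of achieving either improvement.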

Resilience is also a great enabler of continuous delivery. A continuous delivery programme focused on resilience will increase stakeholder confidence, and lay the groundwork for challenging robustness risk management theatre to increase throughput.

During this talk, Steve Smith will explain why discontinuous delivery is part of the tradition of optimising for MTBF, and how optimising for MTTR can power continuous delivery adoption. This is an overview of a new approach to continuous delivery, backed by examples from private and public sector organisations.

The key learnings for participants are:

1. optimising for MTBF is an antiquated, flawed approach to IT reliability that results in long-term discontinuous delivery
2. if an organisation has optimised for MTBF, a continuous delivery programme focused on throughput is likely to fail
3. optimising for MTTR is a superior reliability strategy that advocates graceful extensibility to limit the impact of failures
4. resilience as a continuous delivery enabler is a heuristic that advocates resilience as the focus of a continuous delivery programme
5. improving the resilience of services by an order of magnitude makes it easier to offer practical alternatives to robustness risk management theatre


Steve Smith

May 31, 2018


  1. Resilience as a Continuous Delivery Enabler Steve Smith @SteveSmithCD

  2. Steve Smith
    Continuous Delivery consultant and trainer
    Author of “Measuring Continuous Delivery”
    b: e: t: @SteveSmithCD
    Associated with
  3. The Challenge
    Achieve a delivery speed fast enough to satisfy business demand
    Achieve sufficient reliability to protect daily business operations
  4. What is Reliability?
    “Reliability is the probability that a system will perform a required function without failure under stated conditions for a stated period of time” Patrick O’Connor and Andre Kleyner - Practical Reliability Engineering
    Reliable IT services keep an organisation running
    Without reliability, Continuous Delivery is worthless
  5. What is Reliability?

  6-18. Optimise For Robustness

  19. Optimise For Robustness
    “The complexity of these systems makes it impossible for them to run without multiple flaws being present” Richard Cook - How Complex Systems Fail
    A production environment is a complex system
    A production environment is always near failure
  20-21. Optimise For Robustness

  22. Optimise For Resilience

  23. Optimise For Resilience
    “Graceful extensibility is the ability of a system to extend its capacity to adapt when surprise events challenge its boundaries” David Woods - Four Concepts for Resilience
    Graceful extensibility comes from adaptive capacity
    Sources of adaptive capacity must be created
    Graceful extensibility leads to sustained adaptability
  24-39. Optimise For Resilience

  40. Summary
    “Resilience and the ability to innovate... are essential” Dr Nicole Forsgren, Jez Humble, and Gene Kim - Accelerate
    Optimising For Robustness is a flawed strategy
    Optimising For Resilience is a superior approach, and is a great foundation for Continuous Delivery
  41. Thanks