How Not to Go Boom: Lessons for SREs from Oil Refineries

Bad software doesn’t explode. You can describe it as exploding when it throws an exception, corrupts some data, or makes your computer unusable, but it doesn’t literally explode. When code doesn’t work, the solution is to figure out where the logic is incorrect and fix it. While SREs may be called engineers, we rarely face the kind of consequences that engineers in other industries do.

In contrast, when a chemical engineer makes a mistake designing a refinery, the consequences can be catastrophic. We’ve all seen videos of the repercussions online: big, loud explosions reducing massive facilities to chunks of twisted metal. The reality is that working with unstable chemicals is a lot harder than keeping track of pointers in C.

Yet despite the differences, industrial process plants can be surprisingly similar to complex software systems. Where refineries use pressure relief valves, web services degrade gracefully. Whether you’re protecting against thermal runaway in a plant or a cascading failure in a data center, the fundamental ideas apply to both domains.

In this talk, I’ll explore the techniques and ideas used to build and operate refineries and how we can use them to make our software systems more resilient and reliable.

Emil Stolarsky

March 29, 2018

Transcript

  1. How Not to Go Boom
    Lessons for SREs from Oil Refineries
    Emil Stolarsky | @EmilStolarsky

  3. Resiliency

  5. “If you get there and the Waffle House is closed? That’s really bad.”
    - Craig Fugate, Director of FEMA (2009–2017)

  6. Oil Refineries

  7. Design for failure

  8. Explosion Isolation Systems

  9. Pressure Relief Systems

  10. Safe and Rapid Isolation of Piping Systems

  11. “If you think safety is expensive, try having an accident.”
    - Trevor Kletz, Chemical Process Safety Expert

  12. Fault Tree Analysis

  14-21. [Fault tree diagram, repeated across eight slides: event A at the top; events B, C, D below it; event E below D]

  22. [Same fault tree diagram, now with eight probability labels of 2% each]

  23. p(B) = 2% + 2%
    p(C) = 2%·2%·2%
    p(E) = 2%·2%

  24. p(B) = 2% + 2%
    p(C) = 2%·2%·2%
    p(D) = 2% + p(E)
    p(E) = 2%·2%
    p(A) = p(B) + p(C) + p(D)

  25. p(B) = 4%
    p(C) = 0.0008%
    p(D) = 2% + p(E)
    p(E) = 0.04%
    p(A) = p(B) + p(C) + p(D)

  26. p(B) = 4%
    p(C) = 0.0008%
    p(D) = 2.04%
    p(E) = 0.04%
    p(A) = p(B) + p(C) + p(D)

  27. p(B) = 4%
    p(C) = 0.0008%
    p(D) = 2.04%
    p(E) = 0.04%
    p(A) = 6.0408%

  28. p(B) = 4%
    p(C) = 0.0008%
    p(D) = ??%
    p(E) = 0.04%
    p(A) = ??%

  29. p(B) = 4%
    p(C) = 0.0008%
    p(D) = 2%·0.04%
    p(E) = 0.04%
    p(A) = ??%

  30. p(B) = 4%
    p(C) = 0.0008%
    p(D) = 0.0008%
    p(E) = 0.04%
    p(A) = ??%

  31. p(B) = 4%
    p(C) = 0.0008%
    p(D) = 0.0008%
    p(E) = 0.04%
    p(A) = 4.0016%

  32. p(B) = 4%
    p(C) = 0.0008%
    p(D) = 0.0008%
    p(E) = 0.04%
    p(A) = 4.0016%
    p(A) = 6.0408%
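
The arithmetic on slides 22 to 32 can be reproduced with a short script. This is a minimal sketch, not the speaker's code: the gate types are inferred from the formulas on the slides (a product read as an AND gate, a sum read as an OR gate), the events are assumed to be independent, and the names `and_gate`, `or_gate`, `or_gate_exact`, `top_event`, and `LEAF` are hypothetical.

```python
from math import prod

LEAF = 0.02  # every labeled input on the slides is 2%


def and_gate(*probs):
    """All inputs must occur: multiply (assumes independent events)."""
    return prod(probs)


def or_gate(*probs):
    """Any input is enough: the slides simply add the probabilities,
    which is a good approximation only when the inputs are rare."""
    return sum(probs)


def or_gate_exact(*probs):
    """Exact OR for independent events, shown for comparison only."""
    return 1 - prod(1 - p for p in probs)


def top_event(d_gate):
    """Compute p(A) for the tree on the slides, with gate D swappable."""
    b = or_gate(LEAF, LEAF)         # p(B) = 2% + 2%
    c = and_gate(LEAF, LEAF, LEAF)  # p(C) = 2%·2%·2%
    e = and_gate(LEAF, LEAF)        # p(E) = 2%·2%
    d = d_gate(LEAF, e)             # p(D) = 2% + p(E)  or  2%·p(E)
    return or_gate(b, c, d)         # p(A) = p(B) + p(C) + p(D)


p_a_or = top_event(or_gate)    # D combines its inputs with a sum
p_a_and = top_event(and_gate)  # D combines its inputs with a product

print(f"D as OR gate:  p(A) = {p_a_or:.4%}")   # D as OR gate:  p(A) = 6.0408%
print(f"D as AND gate: p(A) = {p_a_and:.4%}")  # D as AND gate: p(A) = 4.0016%
```

Changing D from a sum to a product takes p(A) from 6.0408% to 4.0016%, the two figures compared on slide 32. Under the independence assumption, `or_gate_exact` gives slightly lower values than the plain sums used on the slides (for example 3.96% rather than 4% for B).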

  33. Learning from Failure

  35. Center for Chemical Process Safety

  36. U.S. Chemical Safety and Hazard Investigation Board

  38. Steam Boilers

  39. Thank you.
