Failing Well

It's a fact of life: software breaks.

But all is not doom and gloom. How we detect and handle errors drastically impacts the quality of both our systems and our lives. Knowing what to track, when to page, and how to find system weaknesses is critical.

You’ll leave this talk with tactics for coping with failures on multiple levels. We'll see how error handling and alerting ground a reliable system. Then we'll automate testing and finally induce problems in live, running code to see where our expectations and reality diverge.

Failure is inevitable, but that doesn't mean you can't fail well!

Jason R Clark

June 23, 2017

Transcript

  1. Slide 17:

         begin
           raise StandardError.new("Oops")
         rescue
           # DO SOMETHING
         ensure
           # Every line will
           # Absolutely, positively
           # Get executed!
         end
  5. Slide 74:

         Rails.configuration.elastic_url = "http://127.0.0.1:22220"

         # Boot toxiproxy if it isn't running already
         mac = RbConfig::CONFIG["host_os"] =~ /darwin/
         bin = mac ? "bin/toxiproxy-darwin-amd64" : "bin/toxiproxy-linux-amd64"
         pid = spawn(bin)
         Process.detach(pid)
  7. Slide 76:

         it "returns a clean error when down" do
           Toxiproxy[:elastic_search].down do
             post action, args
             assert_response 500
           end
         end
  8. Slide 77:

         it "returns a clean error on timeout" do
           Toxiproxy[:elastic_search].downstream(:latency, latency: 900).apply do
             post action, args
             assert_response 500
           end
         end
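
The slides above show two ideas: `rescue` and `ensure` for handling errors, and tests that expect a clean 500 when a dependency is down or slow. The following is a minimal sketch of how those might fit together, using only Ruby's standard library rather than Toxiproxy or Rails. The helper name `fetch_downstream`, the "search unavailable" message, the 0.5 second timeout, and the throwaway local servers are all assumptions made for illustration, not code from the talk. It also assumes Ruby 2.5 or later for `max_retries=`.

```ruby
require "net/http"
require "socket"

# Hypothetical helper, not from the talk: fetch "/" from a downstream service
# and convert connection failures and timeouts into a clean 500 result.
def fetch_downstream(port, log, timeout: 0.5)
  log << :started
  http = Net::HTTP.new("127.0.0.1", port, nil) # nil: ignore any proxy settings
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.max_retries = 0 # fail fast instead of retrying the GET
  response = http.get("/")
  [response.code.to_i, response.body]
rescue SystemCallError, Net::OpenTimeout, Net::ReadTimeout => e
  log << e.class
  [500, "search unavailable"]
ensure
  # Runs on success and on failure alike.
  log << :finished
end

# Stand-in for a healthy dependency: answers one request with 200 "ok".
healthy = TCPServer.new("127.0.0.1", 0)
Thread.new do
  client = healthy.accept
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n" # end of request headers
  end
  client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
  client.close
end

# Stand-in for a slow dependency: accepts the connection but does not answer
# before the client's read timeout expires.
slow = TCPServer.new("127.0.0.1", 0)
Thread.new do
  conn = slow.accept
  sleep 2
  conn.close
end

# Stand-in for a dependency that is down: a port with nothing listening.
probe = TCPServer.new("127.0.0.1", 0)
down_port = probe.addr[1]
probe.close

ok_log, slow_log, down_log = [], [], []
ok_result   = fetch_downstream(healthy.addr[1], ok_log)
slow_result = fetch_downstream(slow.addr[1], slow_log)
down_result = fetch_downstream(down_port, down_log)
```

In this sketch each log ends with `:finished` whether the request succeeded, timed out, or was refused, which is the `ensure` behavior from the first slide. The slow and down cases both come back as a 500 pair instead of an unhandled exception, which is roughly what the Toxiproxy tests assert against the real application.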