Failing Well

@jasonrclark Failing Well http://bit.ly/failing-well

Exceptions

⚠ Ruby Ahead ⚠

begin
  raise StandardError.new("Oops")
rescue
  # DO SOMETHING
ensure
  # Cleaning up
end

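To make the control flow concrete, here is a small runnable sketch that records which clauses fire. The `run` helper and its `fail_first` flag are our own illustration, not from the talk: `rescue` only runs when the body raised, while `ensure` runs on both paths.

```ruby
# Records which clauses of begin/rescue/ensure actually run.
def run(fail_first)
  trace = []
  begin
    trace << :body
    raise StandardError.new("Oops") if fail_first
    trace << :body_finished
  rescue
    # DO SOMETHING: only reached when the body raised
    trace << :rescue
  ensure
    # Cleaning up: reached on both paths
    trace << :ensure
  end
  trace
end

p run(true)   # => [:body, :rescue, :ensure]
p run(false)  # => [:body, :body_finished, :ensure]
```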

def methods_are_awesome!(really = true)
  raise StandardError.new("Oops")
rescue
  # DO SOMETHING
ensure
  # Cleaning up
end

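Because a method body acts as an implicit `begin`, `rescue` and `ensure` can hang directly off the `def`. A sketch, assuming a hypothetical `parse_port` method that falls back to port 80: the method returns the value of the `rescue` branch, while the value of `ensure` is discarded.

```ruby
# `parse_port` is a hypothetical example; the def body is the implicit begin.
def parse_port(text, cleanup_log)
  Integer(text)
rescue ArgumentError
  # DO SOMETHING: fall back to a default port (an assumed policy)
  80
ensure
  # Cleaning up: runs on both paths, and its value is discarded
  cleanup_log << text
end

log = []
p parse_port("8080", log)  # => 8080
p parse_port("oops", log)  # => 80
p log                      # => ["8080", "oops"]
```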

begin
  raise "Heck" # == RuntimeError.new("Heck")
rescue
  # DO SOMETHING
ensure
  # Cleaning up
end

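A quick check of the comment above, using nothing beyond core Ruby: a string passed to `raise` becomes a `RuntimeError`, which is a `StandardError`, so a bare `rescue` (shorthand for `rescue StandardError`) catches it. The method name is ours.

```ruby
# `raise "Heck"` builds a RuntimeError; `rescue => e` is `rescue StandardError => e`.
def raised_from_string
  raise "Heck"
rescue => e
  e
end

err = raised_from_string
puts err.class                 # prints RuntimeError
puts err.message               # prints Heck
puts err.is_a?(StandardError)  # prints true
```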

Let's Get More Specific

begin
  raise StandardError.new("Oops")
rescue StandardError => e
  puts e.message
ensure
  # Cleaning up
end

begin
  raise StandardError.new("Oops")
rescue Exception => e
  puts e.message
ensure
  # Cleaning up
end

Ctrl+C => raise Interrupt.new

Interrupt < SignalException < Exception (neither is a StandardError)
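The hierarchy matters because `rescue Exception` also catches `Interrupt`. This sketch raises `Interrupt` by hand as a stand-in for Ctrl+C; the `swallow` helper is hypothetical.

```ruby
# Raises Interrupt by hand as a stand-in for Ctrl+C.
# Returns :swallowed if the inner rescue caught it, :escaped if it got through.
def swallow(rescued_class)
  begin
    raise Interrupt
  rescue rescued_class
    :swallowed
  end
rescue Interrupt
  :escaped
end

p swallow(StandardError)  # => :escaped   (Ctrl+C still stops the program)
p swallow(Exception)      # => :swallowed (Ctrl+C is silently ignored)
p Interrupt.ancestors.include?(StandardError)  # => false
```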

Careful What You Catch!

# Every line will
# Absolutely, positively
# Get executed!
end

Nope!


Thread#raise

Timeout.timeout

Rack::Timeout
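`Thread#raise` lets one thread inject an exception into another at whatever line it happens to be running, and the standard library's `Timeout.timeout` is built on that mechanism in current CRuby. The sketch below (our own illustration, with arbitrary durations) shows a timeout firing in the middle of an `ensure` block, so its last cleanup line never runs.

```ruby
require "timeout"

# A timeout fires while an ensure block is still running,
# so the ensure block's last line never executes.
def cleanup_steps
  steps = []
  begin
    Timeout.timeout(0.1) do
      begin
        # the main work finishes instantly
      ensure
        steps << :cleanup_started
        sleep 1                     # slow cleanup, e.g. a hung network call
        steps << :cleanup_finished  # skipped: the timeout fires during the sleep
      end
    end
  rescue Timeout::Error
    steps << :timed_out
  end
  steps
end

p cleanup_steps  # => [:cleanup_started, :timed_out]
```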

Know What's Assured!

begin
  raise StandardError.new("Oops")
rescue
  # DO SOMETHING
ensure
  $! # => Exception in flight
end
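A sketch of what `$!` holds inside `ensure`, based on CRuby's behaviour as we understand it: it is the exception still propagating out of the block, and `nil` once a `rescue` clause has handled it. The `errinfo_seen_in_ensure` helper is hypothetical.

```ruby
# Returns whatever $! held inside the ensure clause.
# handled: true  -> a rescue clause dealt with the error first
# handled: false -> no rescue, so the error is still in flight during ensure
def errinfo_seen_in_ensure(handled:)
  seen = :not_set
  begin
    if handled
      begin
        raise StandardError.new("Oops")
      rescue StandardError
        # DO SOMETHING
      ensure
        seen = $!
      end
    else
      begin
        raise StandardError.new("Oops")
      ensure
        seen = $!
      end
    end
  rescue StandardError
    # Swallow the propagating error so we can return what ensure saw.
  end
  seen
end

p errinfo_seen_in_ensure(handled: true)            # => nil
p errinfo_seen_in_ensure(handled: false)&.message  # => "Oops"
```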

begin
  raise StandardError.new("Oops")
rescue
  # DO SOMETHING <= What goes here?
ensure
  # Cleaning up
end

Record It!
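One answer to "What goes here?" is to record the failure before degrading gracefully. A minimal sketch using the standard `Logger`; the `StringIO` target, the `search_or_empty` method, and its empty-array fallback are assumptions standing in for a real log or error-tracking service.

```ruby
require "logger"
require "stringio"

# A StringIO stands in for a real log file or error-tracking service.
ERROR_LOG = StringIO.new
LOGGER = Logger.new(ERROR_LOG)

# Hypothetical call that fails; the rescue records the error, then degrades.
def search_or_empty
  raise StandardError.new("Oops")
rescue StandardError => e
  # Record it: class, message, and the line that raised
  LOGGER.error("#{e.class}: #{e.message} (#{Array(e.backtrace).first})")
  []  # assumed fallback: an empty result instead of a crash
end

p search_or_empty                                   # => []
p ERROR_LOG.string.include?("StandardError: Oops")  # => true
```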

Alerting

What To Alert On?

Errors, Availability, Latency
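A sketch of checking those three signals over one window of traffic. The `Sample` struct, the health-check input, and every threshold here are hypothetical; pick values that match your own service.

```ruby
# Hypothetical thresholds; tune them for your own service.
THRESHOLDS = { error_rate: 0.05, availability: 0.99, p99_latency_ms: 500 }.freeze

Sample = Struct.new(:status, :duration_ms)

# Nearest-rank 99th percentile, using integer math to avoid float rounding.
def p99(values)
  sorted = values.sort
  sorted[(sorted.size * 99 + 99) / 100 - 1]
end

# samples: requests seen in the window; health_checks: true/false ping results.
def alerts_for(samples, health_checks)
  alerts = []
  error_rate = samples.count { |s| s.status >= 500 }.fdiv(samples.size)
  alerts << :errors if error_rate > THRESHOLDS[:error_rate]
  availability = health_checks.count(true).fdiv(health_checks.size)
  alerts << :availability if availability < THRESHOLDS[:availability]
  alerts << :latency if p99(samples.map(&:duration_ms)) > THRESHOLDS[:p99_latency_ms]
  alerts
end

fast = Array.new(98) { Sample.new(200, 120) }
slow = Array.new(2) { Sample.new(200, 900) }
p alerts_for(fast + slow, [true] * 10)  # => [:latency]
```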

Elasticsearch Cluster O_o

Avoid Duplication

"Oh, that just happens. Ignore it."

Alert Fatigue
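One way to cut duplicate alerts, sketched under our own assumptions: key alerts by service and condition rather than by node, so a sick Elasticsearch cluster pages once instead of once per node. `AlertDeduper` and the 300-second window are hypothetical.

```ruby
# Sends at most one notification per (service, condition) per window.
# The node that reported the alert is deliberately not part of the key.
class AlertDeduper
  def initialize(window_seconds)
    @window = window_seconds
    @last_sent = {}
  end

  # at: event time in seconds. Returns true if a human should be notified.
  def notify?(service:, condition:, node:, at:)
    key = [service, condition]  # `node` ignored on purpose
    last = @last_sent[key]
    return false if last && at - last < @window
    @last_sent[key] = at
    true
  end
end

dedup = AlertDeduper.new(300)
p dedup.notify?(service: "elasticsearch", condition: :cluster_red, node: "es-1", at: 0)    # => true
p dedup.notify?(service: "elasticsearch", condition: :cluster_red, node: "es-2", at: 5)    # => false
p dedup.notify?(service: "elasticsearch", condition: :cluster_red, node: "es-3", at: 400)  # => true
```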

1. Problems you can do something about => ALERT!

2. Upstream problems you can't impact => WARN

3. Other odd stuff? => TRACK
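The three tiers map naturally onto a small classification function. A sketch, assuming each problem has already been tagged as actionable and/or upstream; the `Problem` struct and the example problems are hypothetical.

```ruby
# Hypothetical shape for a detected problem.
Problem = Struct.new(:name, :actionable, :upstream, keyword_init: true)

# Route each problem to one of the three tiers.
def severity(problem)
  if problem.actionable
    :alert  # 1. we can do something about it: wake someone up
  elsif problem.upstream
    :warn   # 2. upstream and out of our hands: make it visible
  else
    :track  # 3. other odd stuff: record it and watch for trends
  end
end

p severity(Problem.new(name: "disk full", actionable: true, upstream: false))         # => :alert
p severity(Problem.new(name: "payment API 503s", actionable: false, upstream: true))  # => :warn
p severity(Problem.new(name: "odd GC pauses", actionable: false, upstream: false))    # => :track
```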

TRUST!

LEARN!

Gamedays

Practice Failing

... In Production

1. Identify Your Resources

Application instances, Other services, Datastores, Caches, Files, CDN

2. What Can Go Wrong?

Missing, Slow, Errors, Corrupt Data

Risk Matrix
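The resources and failure modes above can be crossed into a risk matrix, one row per gameday scenario. Scoring rows by likelihood times impact is a common convention but an assumption here, as are the scores in the usage lines.

```ruby
# Resources and failure modes as listed in the talk.
RESOURCES = ["Application instances", "Other services", "Datastores",
             "Caches", "Files", "CDN"].freeze
FAILURE_MODES = ["Missing", "Slow", "Errors", "Corrupt Data"].freeze

# One row per (resource, failure mode) pair; the team fills in the scores.
def risk_matrix
  RESOURCES.product(FAILURE_MODES).map do |resource, failure|
    { resource: resource, failure: failure, likelihood: nil, impact: nil }
  end
end

# Highest likelihood * impact first; unscored rows (nil) count as 0.
def prioritized(rows)
  rows.sort_by { |row| -(row[:likelihood].to_i * row[:impact].to_i) }
end

rows = risk_matrix
puts rows.size                            # prints 24
rows[5].merge!(likelihood: 3, impact: 3)  # hypothetical scores for Other services / Slow
top = prioritized(rows).first
puts "#{top[:resource]} / #{top[:failure]}"  # prints Other services / Slow
```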

Running the Gameday

Reality's best

Generate load

Break stuff!

kill <pid>
docker stop <id>

kill -STOP <pid>

kill -CONT <pid>
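A sketch of the `kill -STOP` / `kill -CONT` trick driven from Ruby via `Process.kill`, assuming a POSIX system. A throwaway `sleep` process stands in for the dependency you would really freeze.

```ruby
# Freeze a process like `kill -STOP <pid>`, then resume it like `kill -CONT <pid>`.
# POSIX only; `sleep 30` is a stand-in for the service you would really pause.
def stop_and_continue_demo
  pid = spawn("sleep", "30")

  Process.kill("STOP", pid)
  # WUNTRACED makes waitpid report the stop instead of waiting for exit.
  _, status = Process.waitpid2(pid, Process::WUNTRACED)
  was_stopped = status.stopped?  # alive but frozen: to clients it looks hung

  Process.kill("CONT", pid)      # thaw it, like a dependency recovering
  Process.kill("TERM", pid)      # end the demo process
  _, status = Process.waitpid2(pid)
  [was_stopped, status.termsig]
end

p stop_and_continue_demo  # => [true, 15] on Linux and macOS
```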

iptables or tc
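A dry-run sketch of the two kinds of network fault, assuming Linux and root for a real run. The commands are standard `tc netem` and `iptables` invocations; the interface name `eth0`, port 9200 (Elasticsearch's default), and the helper names are our own choices.

```ruby
# Dry-run helpers for two classic network faults (Linux, needs root to actually run).
# "eth0" and port 9200 are placeholders; use your own interface and port.
def add_latency(dev: "eth0", delay_ms: 500)
  ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", "#{delay_ms}ms"]
end

def remove_latency(dev: "eth0")
  ["tc", "qdisc", "del", "dev", dev, "root"]
end

def block_port(port: 9200)
  ["iptables", "-A", "OUTPUT", "-p", "tcp", "--dport", port.to_s, "-j", "DROP"]
end

def unblock_port(port: 9200)
  ["iptables", "-D", "OUTPUT", "-p", "tcp", "--dport", port.to_s, "-j", "DROP"]
end

# Prints the command by default; pass dry_run: false to really execute it.
def run_fault(cmd, dry_run: true)
  if dry_run
    puts cmd.join(" ")
  else
    system(*cmd) || raise("command failed: #{cmd.join(' ')}")
  end
end

run_fault(add_latency(delay_ms: 900))  # prints tc qdisc add dev eth0 root netem delay 900ms
run_fault(remove_latency)              # prints tc qdisc del dev eth0 root
```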

https://github.com/tylertreat/Comcast

Test Recovery Too!
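Recovery deserves its own assertion: once the fault is removed, the system should become healthy again within some bound. A sketch of a polling helper; the `healthy` callable, the timeout, and the interval are assumptions.

```ruby
# Poll a health check until it passes or the deadline expires.
# `healthy` is any callable returning true/false (an assumption; e.g. an HTTP ping).
def wait_for_recovery(healthy:, timeout:, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if healthy.call
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Usage: a fake service that becomes healthy on the third check.
checks = 0
recovered = wait_for_recovery(healthy: -> { (checks += 1) >= 3 }, timeout: 2, interval: 0.01)
puts recovered  # prints true
puts checks     # prints 3
```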

Automating Failure

Toxiproxy http://toxiproxy.io/

Your App => Toxiproxy => Valuable Resource (Toxiproxy is driven by HTTP commands!)

class MyApplication
  # ...
  config.elastic_url = ENV["ES_URL"] || "http://localhost:9200"
end

Toxiproxy.populate([
  {
    name: "elastic_search",
    listen: "127.0.0.1:22220",
    upstream: "127.0.0.1:9200"
  }
])


Rails.configuration.elastic_url = "http://127.0.0.1:22220"

# Boot toxiproxy if it isn't running already
mac = RbConfig::CONFIG["host_os"] =~ /darwin/
bin = mac ? "bin/toxiproxy-darwin-amd64" : "bin/toxiproxy-linux-amd64"
pid = spawn(bin)
Process.detach(pid)


it "returns a clean error when down" do
  Toxiproxy[:elastic_search].down do
    post action, args
    assert_response 500
  end
end

it "returns a clean error on timeout" do
  Toxiproxy[:elastic_search].downstream(:latency, latency: 900).apply do
    post action, args
    assert_response 500
  end
end
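The Toxiproxy tests expect the app to turn a dead or slow Elasticsearch into a clean error. A sketch of that app-side handling with `Net::HTTP`, assuming the listed exception classes are the ones you want to translate; the `search` method and the 0.5-second timeouts are hypothetical (a 0.5 s read timeout would trip on the 900 ms latency toxic).

```ruby
require "net/http"
require "uri"

# Failures we translate into a clean "search unavailable" result (assumed list).
SEARCH_UNAVAILABLE = [Errno::ECONNREFUSED, Net::OpenTimeout, Net::ReadTimeout].freeze

# Hypothetical search call against the configured Elasticsearch URL.
def search(base_url, path = "/_search")
  uri = URI.join(base_url, path)
  Net::HTTP.start(uri.host, uri.port, open_timeout: 0.5, read_timeout: 0.5) do |http|
    response = http.get(uri.request_uri)
    { ok: true, status: response.code.to_i, body: response.body }
  end
rescue *SEARCH_UNAVAILABLE => e
  { ok: false, error: "search unavailable (#{e.class})" }
end
```

In a controller, the `ok: false` branch is where you would render the clean 500 the tests assert on, rather than letting the raw exception escape.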

Exceptions, Alerting, Gamedays, Automating Failure, ???


More Decks by Jason R Clark
