Slide 1

Slide 1 text

Building a Fault Tolerant API with Hystrix
Ryan Dearing
Engineering Manager, Bodybuilding.com

Slide 2

Slide 2 text

100 million API requests every day

Slide 3

Slide 3 text

hundreds of servers

Slide 4

Slide 4 text

hundreds of dependencies

Slide 5

Slide 5 text

Failures Will Happen!

Slide 6

Slide 6 text

Resiliency
• Adapt automatically to unexpected failures
• Maintain performance during failures
• Prevent cascading failures
• Fail fast, fail gracefully
• Visibility
(Tardigrade)

Slide 7

Slide 7 text

Bulkheading isolates failures and latency

“A ship’s hull is divided into different watertight bulkheads so that if the hull is compromised, the failure is limited to that bulkhead as opposed to taking the entire ship down. By partitioning your system, you can confine errors to one area as opposed to taking the entire system down.” - John Ragan
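One way to picture a bulkhead in code is a fixed pool of permits per dependency. The sketch below is a minimal illustration using only the JDK; the `Bulkhead` class and its method names are hypothetical and are not part of Hystrix, which offers thread-pool and semaphore isolation of its own.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not a Hystrix class): caps concurrent calls to one dependency.
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the call if a permit is free; otherwise returns the fallback immediately.
    <T> T call(Supplier<T> dependencyCall, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get(); // compartment is full: reject instead of queueing
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}

class BulkheadDemo {
    public static void main(String[] args) {
        Bulkhead reviews = new Bulkhead(1);
        // The outer call holds the only permit, so the nested call is rejected.
        String result = reviews.call(() -> {
            String inner = reviews.call(() -> "inner ran", () -> "inner rejected");
            return inner;
        }, () -> "outer rejected");
        System.out.println(result); // prints "inner rejected"
    }
}
```

Giving each dependency its own `Bulkhead` means a slow dependency can exhaust only its own permits, not the threads that serve every other request.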

Slide 8

Slide 8 text

Prevent failures from cascading “Domino Effect”

Slide 9

Slide 9 text

Circuit Breaker Pattern
Detects failures and prevents the system from executing actions that are likely to fail. Retries periodically to check whether the dependency has recovered.
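The pattern can be sketched in a few lines of plain Java. This is an illustrative simplification, not Hystrix's implementation: `SimpleCircuitBreaker`, its consecutive-failure threshold, and the injected clock are all assumptions made for the example, whereas Hystrix trips its breaker on rolling error-percentage statistics.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical minimal breaker for illustration only.
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long retryAfterMillis;
    private final LongSupplier clock; // injected so the example is deterministic
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private boolean open = false;

    SimpleCircuitBreaker(int failureThreshold, long retryAfterMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
        this.clock = clock;
    }

    // Synchronized for simplicity; this serializes calls, which a real breaker would avoid.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (open && clock.getAsLong() - openedAt < retryAfterMillis) {
            return fallback.get(); // open: skip the call entirely
        }
        try {
            T result = action.get(); // closed, or a trial call after the wait
            consecutiveFailures = 0;
            open = false;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                open = true; // trip, or re-open after a failed trial call
                openedAt = clock.getAsLong();
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return open;
    }
}

class CircuitBreakerDemo {
    public static void main(String[] args) {
        long[] now = {0};
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(2, 1000, () -> now[0]);
        Supplier<String> failing = () -> { throw new RuntimeException("down"); };

        breaker.call(failing, () -> "fallback"); // failure 1
        breaker.call(failing, () -> "fallback"); // failure 2 trips the breaker
        System.out.println(breaker.call(() -> "live", () -> "fallback")); // prints "fallback"

        now[0] = 1000; // the retry window has elapsed
        System.out.println(breaker.call(() -> "live", () -> "fallback")); // prints "live"
    }
}
```

While the breaker is open the dependency is not called at all, which is what gives a struggling service room to recover.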

Slide 10

Slide 10 text

Fail Fast
If a dependency is failing or slow, reduce the rate of requests to it. This prevents overloading the system.
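For a slow dependency, failing fast largely comes down to bounding how long a caller is willing to wait. The following is a rough sketch using only `java.util.concurrent`; `TimeLimiter` and `callWithTimeout` are hypothetical names chosen for this example, and the timeout values are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a call on a worker thread and gives up after a deadline.
final class TimeLimiter implements AutoCloseable {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    <T> T callWithTimeout(Callable<T> call, long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call so it stops holding a thread
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        }
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}

class FailFastDemo {
    public static void main(String[] args) {
        try (TimeLimiter limiter = new TimeLimiter()) {
            String slow = limiter.callWithTimeout(() -> {
                Thread.sleep(5_000); // simulated slow dependency
                return "fresh data";
            }, 100, "cached data");
            System.out.println(slow); // prints "cached data" after roughly 100 ms
        }
    }
}
```

The caller gets an answer within the deadline either way, so request threads are not tied up waiting on a dependency that may never respond.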

Slide 11

Slide 11 text

Hystrix
• Time out slow commands
• Rate limit the number of concurrent commands
• Circuit breaker to back off from failing dependencies
• Attempt a fallback for failures
• Real-time dashboard for visibility

Slide 12

Slide 12 text

Hystrix Flowchart

Slide 13

Slide 13 text

Hystrix Flowchart

Slide 14

Slide 14 text

Hystrix Flowchart

Slide 15

Slide 15 text

Hystrix Flowchart

Slide 16

Slide 16 text

Hystrix Flowchart

Slide 17

Slide 17 text

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class CommandHelloWorld extends HystrixCommand<String> {

    private final String name;

    public CommandHelloWorld(String name) {
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        // a real example would do work like a network call here
        return "Hello " + name + "!";
    }
}

Slide 18

Slide 18 text

// this blocks
String s = new CommandHelloWorld("World").execute();
// this doesn't
Future<String> fs = new CommandHelloWorld("World").queue();
// this doesn't either
Observable<String> os = new CommandHelloWorld("World").observe();

Slide 19

Slide 19 text

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class CommandHelloFailure extends HystrixCommand<String> {

    private final String name;

    public CommandHelloFailure(String name) {
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        throw new RuntimeException("this command always fails");
    }

    // called when run() throws, times out, or is rejected
    @Override
    protected String getFallback() {
        return "Hello Failure " + name + "!";
    }
}

Slide 20

Slide 20 text

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class CommandThatFailsSilently extends HystrixCommand<String> {

    private final boolean throwException;

    public CommandThatFailsSilently(boolean throwException) {
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.throwException = throwException;
    }

    @Override
    protected String run() {
        if (throwException) {
            throw new RuntimeException("failure from CommandThatFailsFast");
        } else {
            return "success";
        }
    }

    // failing silently: the caller receives null instead of an exception
    @Override
    protected String getFallback() {
        return null;
    }
}

Slide 21

Slide 21 text

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

public class CommandWithStubbedFallback extends HystrixCommand<UserAccount> {

    private final int customerId;
    private final String countryCodeFromGeoLookup;

    protected CommandWithStubbedFallback(int customerId, String countryCodeFromGeoLookup) {
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.customerId = customerId;
        this.countryCodeFromGeoLookup = countryCodeFromGeoLookup;
    }

    @Override
    protected UserAccount run() {
        // fetch UserAccount from remote service
        // return UserAccountClient.getAccount(customerId);
        throw new RuntimeException("forcing failure for example");
    }

    @Override
    protected UserAccount getFallback() {
        /**
         * Return stubbed fallback with some static defaults, placeholders,
         * and an injected value 'countryCodeFromGeoLookup' that we'll use
         * instead of what we would have retrieved from the remote service.
         */
        return new UserAccount(customerId, "Unknown Name",
                countryCodeFromGeoLookup, true, true, false);
    }
}

Slide 22

Slide 22 text

Request Collapsing
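Request collapsing batches several individual requests that arrive close together into a single call to the backend. The sketch below shows the idea in plain Java; the `Collapser` class is hypothetical and flushes on demand, whereas Hystrix's `HystrixCollapser` flushes on a short timer window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical collapser: queues single-key requests and resolves them with one batch call.
final class Collapser<K, V> {
    private final Function<List<K>, Map<K, V>> batchLoader;
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    Collapser(Function<List<K>, Map<K, V>> batchLoader) {
        this.batchLoader = batchLoader;
    }

    // Duplicate keys share one future, so each key is fetched once per batch.
    synchronized CompletableFuture<V> submit(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Triggered explicitly here; a timer would normally call this.
    void flush() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        if (batch.isEmpty()) {
            return;
        }
        Map<K, V> results = batchLoader.apply(new ArrayList<>(batch.keySet()));
        batch.forEach((key, future) -> future.complete(results.get(key)));
    }
}

class CollapserDemo {
    public static void main(String[] args) {
        int[] batchCalls = {0};
        Collapser<Integer, String> users = new Collapser<Integer, String>(ids -> {
            batchCalls[0]++; // one backend round trip per batch
            Map<Integer, String> out = new HashMap<>();
            for (Integer id : ids) {
                out.put(id, "user-" + id);
            }
            return out;
        });

        CompletableFuture<String> a = users.submit(1);
        CompletableFuture<String> b = users.submit(2);
        users.flush();
        System.out.println(a.join() + ", " + b.join() + ", batch calls: " + batchCalls[0]);
        // prints "user-1, user-2, batch calls: 1"
    }
}
```

The trade-off is a small added latency per request while the batch fills, in exchange for fewer round trips to the dependency.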

Slide 23

Slide 23 text

Hystrix Dashboard
Especially useful if we have real-time insight into dependencies and failures.

Slide 24

Slide 24 text

Hystrix Dashboard

Slide 25

Slide 25 text

Hystrix Dashboard
Especially useful if we have real-time insight into dependencies and failures.

Slide 26

Slide 26 text

Hystrix Dashboard
The system is backing off and adapting dynamically. Some calls are still succeeding; others are timing out instead of blocking and cascading.

Slide 27

Slide 27 text

Lessons Learned
• Organize code for Hystrix from the start
• Use Hystrix in API client libraries
• Easier to configure if you have historical data about response times, request rate, etc.
• Command instances should not be reused
• Fault vs. non-fault exceptions
• Use Hystrix Dashboard
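The fault vs. non-fault distinction can be illustrated with a small sketch: a caller's mistake (such as invalid input) should propagate without triggering a fallback or counting against the dependency's health, while a genuine fault should. In Hystrix this role is played by `HystrixBadRequestException`; the `NonFaultException` and `FaultTracker` classes below are hypothetical stand-ins written with the JDK only.

```java
import java.util.function.Supplier;

// Hypothetical marker for caller errors (bad arguments, validation failures).
class NonFaultException extends RuntimeException {
    NonFaultException(String message) {
        super(message);
    }
}

// Hypothetical wrapper that counts only genuine faults.
final class FaultTracker {
    private int faults = 0;

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        try {
            return action.get();
        } catch (NonFaultException e) {
            throw e; // caller's mistake: propagate, no fallback, not counted
        } catch (RuntimeException e) {
            faults++; // dependency fault: count it and degrade gracefully
            return fallback.get();
        }
    }

    int faults() {
        return faults;
    }
}

class FaultTrackerDemo {
    public static void main(String[] args) {
        FaultTracker tracker = new FaultTracker();

        String degraded = tracker.<String>call(
                () -> { throw new IllegalStateException("connection refused"); },
                () -> "fallback");
        System.out.println(degraded + ", faults: " + tracker.faults()); // prints "fallback, faults: 1"

        try {
            tracker.<String>call(
                    () -> { throw new NonFaultException("customerId must be positive"); },
                    () -> "fallback");
        } catch (NonFaultException e) {
            System.out.println("rejected, faults: " + tracker.faults()); // prints "rejected, faults: 1"
        }
    }
}
```

Without this separation, a burst of invalid requests could trip a circuit breaker for a dependency that is perfectly healthy.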

Slide 28

Slide 28 text

Thanks
• Hystrix: https://github.com/Netflix/Hystrix
• Hystrix Dashboard: https://github.com/Netflix/Hystrix/tree/master/hystrix-dashboard
• Slides: https://speakerdeck.com/ryandearing/building-a-fault-tolerant-api-with-hystrix