Building a Fault Tolerant API with Hystrix

The Bodybuilding.com API serves more than a hundred million calls every day across hundreds of servers. Learn how we use Hystrix to build a distributed system that tolerates both faults and latency. We will discuss the bulkhead and circuit breaker patterns that Hystrix uses to keep the API resilient and fast.

Ryan Dearing

October 15, 2014

Transcript

  1. Resiliency
     • Adapt automatically to unexpected failures
     • Maintain performance during failures
     • Prevent cascading failures
     • Fail fast, fail gracefully
     • Visibility
     [Image: Tardigrade]
  2. Bulkheading isolates failures and latency. “A ship’s hull is divided into different watertight bulkheads so that if the hull is compromised, the failure is limited to that bulkhead as opposed to taking the entire ship down. By partitioning your system, you can confine errors to one area as opposed to taking the entire system down.” - John Ragan
  3. Circuit Breaker Pattern: Detects failures and prevents the system from executing actions that are certain to fail. Retries periodically.
  4. Fail Fast: If a dependency is failing or slow, reduce the rate of requests to it. This prevents overloading the system.
  5. Hystrix
     • Timeout slow commands
     • Rate limit number of concurrent commands
     • Circuit breaker
     • Back off of failing dependencies
     • Attempt a fallback for failures
     • Real-time dashboard for visibility
  6. import com.netflix.hystrix.HystrixCommand;
     import com.netflix.hystrix.HystrixCommandGroupKey;

     public class CommandHelloWorld extends HystrixCommand<String> {

         private final String name;

         public CommandHelloWorld(String name) {
             super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
             this.name = name;
         }

         @Override
         protected String run() {
             // a real example would do work like a network call here
             return "Hello " + name + "!";
         }
     }
  7. import java.util.concurrent.Future;
     import rx.Observable;

     // this blocks
     String s = new CommandHelloWorld("World").execute();
     // this doesn't
     Future<String> fs = new CommandHelloWorld("World").queue();
     // this doesn't either
     Observable<String> os = new CommandHelloWorld("World").observe();
  8. import com.netflix.hystrix.HystrixCommand;
     import com.netflix.hystrix.HystrixCommandGroupKey;

     public class CommandHelloFailure extends HystrixCommand<String> {

         private final String name;

         public CommandHelloFailure(String name) {
             super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
             this.name = name;
         }

         @Override
         protected String run() {
             throw new RuntimeException("this command always fails");
         }

         // called when run() throws, times out, or is rejected
         @Override
         protected String getFallback() {
             return "Hello Failure " + name + "!";
         }
     }
  9. import com.netflix.hystrix.HystrixCommand;
     import com.netflix.hystrix.HystrixCommandGroupKey;

     public class CommandThatFailsSilently extends HystrixCommand<String> {

         private final boolean throwException;

         public CommandThatFailsSilently(boolean throwException) {
             super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
             this.throwException = throwException;
         }

         @Override
         protected String run() {
             if (throwException) {
                 throw new RuntimeException("failure from CommandThatFailsFast");
             } else {
                 return "success";
             }
         }

         // fail silently: the caller receives null instead of an exception
         @Override
         protected String getFallback() {
             return null;
         }
     }
  10. import com.netflix.hystrix.HystrixCommand;
      import com.netflix.hystrix.HystrixCommandGroupKey;

      // UserAccount is an application class that is not shown on the slide
      public class CommandWithStubbedFallback extends HystrixCommand<UserAccount> {

          private final int customerId;
          private final String countryCodeFromGeoLookup;

          protected CommandWithStubbedFallback(int customerId, String countryCodeFromGeoLookup) {
              super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
              this.customerId = customerId;
              this.countryCodeFromGeoLookup = countryCodeFromGeoLookup;
          }

          @Override
          protected UserAccount run() {
              // fetch UserAccount from remote service
              // return UserAccountClient.getAccount(customerId);
              throw new RuntimeException("forcing failure for example");
          }

          @Override
          protected UserAccount getFallback() {
              /**
               * Return stubbed fallback with some static defaults, placeholders,
               * and an injected value 'countryCodeFromGeoLookup' that we'll use
               * instead of what we would have retrieved from the remote service.
               */
              return new UserAccount(customerId, "Unknown Name",
                      countryCodeFromGeoLookup, true, true, false);
          }
      }
  11. Hystrix Dashboard: The system is backing off and adapting dynamically. Some calls are still working; others are timing out without blocking or cascading.
  12. Lessons Learned
      • Organize code for Hystrix from the start
      • Use Hystrix in API client libraries
      • Easier to configure if you have historical data about response times, request rate, etc.
      • Command instances should not be reused
      • Fault vs. non-fault exceptions
      • Use Hystrix Dashboard
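
The bulkheading idea from slide 2 can be illustrated without any library. What follows is a minimal sketch, not Hystrix's implementation: it assumes a hypothetical `Bulkhead` class that uses a standard `java.util.concurrent.Semaphore` to cap the number of concurrent calls to one dependency, and it returns a fallback immediately when the compartment is full.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper (not part of Hystrix): caps concurrent calls to one dependency. */
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    /** Runs the call if a permit is free; otherwise returns the fallback right away. */
    <T> T call(Supplier<T> dependencyCall, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            // Compartment is full: reject instead of queueing behind a slow dependency.
            return fallback.get();
        }
        try {
            return dependencyCall.get();
        } finally {
            permits.release();
        }
    }
}
```

Giving each dependency its own `Bulkhead` instance means a slow dependency can exhaust only its own permits, so callers of other dependencies are unaffected.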
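
The circuit breaker pattern from slide 3 (detect failures, stop calling, retry periodically) can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not how Hystrix works internally: the hypothetical `SimpleCircuitBreaker` trips after a fixed number of consecutive failures, whereas Hystrix bases the decision on an error percentage over a rolling window. The clock is injected so the behaviour can be exercised without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical, simplified breaker: opens after N consecutive failures. */
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long sleepWindowMillis;
    private final LongSupplier clock;      // supplies the current time in milliseconds
    private int consecutiveFailures = 0;
    private long openedAt = -1;            // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    /** Closed circuits always allow calls; open ones allow a trial after the sleep window. */
    synchronized boolean allowRequest() {
        if (openedAt < 0) {
            return true;
        }
        return clock.getAsLong() - openedAt >= sleepWindowMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;                     // close the circuit again
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong();  // open, or re-open after a failed trial
        }
    }

    /** Short-circuits to the fallback while open; otherwise runs the action. */
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get();
        }
        try {
            T result = action.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }
}
```

While the circuit is open, the action is never invoked, which is what keeps a failing dependency from being hammered with requests that are unlikely to succeed.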
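
Slide 5 lists "timeout slow commands" with a fallback for failures. The sketch below shows that idea with only the JDK, assuming a hypothetical `TimeLimiter` helper; it is not Hystrix code. The task runs on a caller-supplied thread pool, and the caller waits at most `timeoutMillis` before giving up and returning the fallback value.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Hystrix): bounds how long a caller waits. */
final class TimeLimiter {
    private TimeLimiter() {}

    static <T> T callWithTimeout(ExecutorService pool, Callable<T> task,
                                 long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                 // interrupt the slow task
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
            return fallback;
        } catch (ExecutionException e) {
            return fallback;                     // the task itself threw
        }
    }
}
```

Running the task on a separate, bounded pool is what lets the caller walk away from a slow dependency; the pool size then acts as the limit on concurrent commands.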