Hystrix: Building blocks for Distributed Systems

Overview of Netflix's Hystrix as a basic tutorial session, presented as a FullContact Tech Talk.

Michael Rose

August 08, 2013

Transcript

  1. Hystrix: Building blocks for Distributed Systems. August 8th, 2013. Thursday, August 8, 13
  2. What does it do? It’s from Netflix, does it watch movies?
  3. Not exactly. It’s a library for building resilient SOA services.
  4. Hystrix Goals

  5. Latency and Fault Tolerance: Stop cascading failures. Fallbacks and graceful degradation. Fail fast and rapid recovery. Thread and semaphore isolation with circuit breakers.
  6. Real-time Operations: Realtime monitoring and configuration changes. Watch service and property changes take effect immediately as they spread across a fleet. Be alerted, make decisions, effect change and see results in seconds.
  7. Concurrency: Parallel execution. Concurrency-aware request caching. Automated batching through request collapsing.
  8. Commands are simple

  9. Running is easy: synchronous, asynchronous, or RxJava.

  10. .queue() gives you a future. You must block on it for Hystrix to time out a command.
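
The point about blocking on the future can be illustrated without Hystrix at all. What follows is a plain-JDK sketch, not Hystrix's API: `TimedCall` and `callWithTimeout` are hypothetical names, and the only idea shown is that a deadline on a `Future` takes effect when the caller blocks on it with a timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hystrix: a Future only enforces a
// deadline when the caller blocks on it with a timeout.
final class TimedCall {
    private TimedCall() {}

    static <T> T callWithTimeout(ExecutorService pool, Callable<T> work,
                                 long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(work);
        try {
            // Blocking here is what gives the timeout a chance to fire.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the work itself threw
        }
    }
}
```

If the caller never calls `get`, nothing in this sketch cancels the work, which is the same trap the slide warns about.
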
  11. Wrap dependencies

  12. When dependencies start to become latent, you’ll insulate yourself. Thread pools will fill and you’ll reject work. Circuits will open shortly thereafter. This is good.
  13. Isolation: By default, Hystrix isolates by pushing commands at thread pools. There are also semaphore-isolated commands for in-memory caches and such.
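
As a rough illustration of the semaphore style of isolation mentioned above, here is a plain-JDK sketch. It is not Hystrix's implementation; `SemaphoreIsolation` and `run` are hypothetical names. The work runs on the caller's thread, and calls beyond the concurrency limit are rejected immediately instead of queuing.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of semaphore-style isolation; not Hystrix's code.
final class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T run(Supplier<T> work) {
        // Fail fast rather than queue when too many calls are in flight.
        if (!permits.tryAcquire()) {
            throw new RejectedExecutionException("too many concurrent calls");
        }
        try {
            return work.get(); // runs on the caller's thread, no hand-off
        } finally {
            permits.release();
        }
    }
}
```

Because there is no thread hand-off, this style cannot time out a stuck call, which is presumably why the slide reserves it for cheap in-memory work.
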
  14. You’ll stop hammering the resource and let it recover, and your consumers won’t sit around waiting for a timeout.
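
The fail-fast behaviour described above can be sketched with a very small circuit breaker. This is a simplification under stated assumptions, not Hystrix's breaker: Hystrix trips on an error percentage over a rolling window, whereas this hypothetical `SimpleCircuitBreaker` just counts consecutive failures, and the clock is injected only to keep the sketch testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified breaker: trips on consecutive failures rather
// than on a rolling error percentage as Hystrix does.
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long sleepWindowMillis;
    private final LongSupplier clock; // injectable time source
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    synchronized boolean allowRequest() {
        if (openedAt < 0) {
            return true; // closed: let everything through
        }
        // Open: only allow trial requests once the sleep window has passed.
        return clock.getAsLong() - openedAt >= sleepWindowMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1; // close the circuit
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong(); // (re)open the circuit
        }
    }

    <T> T call(Supplier<T> work, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // fail fast, do not touch the resource
        }
        try {
            T result = work.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }
}
```

While the circuit is open the wrapped resource sees no traffic at all, which is what gives it room to recover.
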
  15. You can add fallbacks: If the database is down, you could check memcached. Or return a generic response.
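
The database, then memcached, then generic response idea above amounts to trying sources in order. A minimal plain-JDK sketch follows; `FallbackChain` and `firstSuccessful` are hypothetical names and this is not how Hystrix wires fallbacks internally.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not Hystrix's API: ordered fallbacks with a
// generic response as the last resort.
final class FallbackChain {
    private FallbackChain() {}

    // Tries each source in order; returns genericResponse if all of them fail.
    static <T> T firstSuccessful(List<Supplier<T>> sources, T genericResponse) {
        for (Supplier<T> source : sources) {
            try {
                return source.get();
            } catch (RuntimeException e) {
                // This source is unavailable; fall through to the next one.
            }
        }
        return genericResponse;
    }
}
```

In this sketch a caller would pass the database lookup first and the cache lookup second, with a canned value as `genericResponse`.
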
  16. Concrete example: In the FullContact AddressBook we use ElasticSearch to display contact lists. If ElasticSearch is dead, we’re down. If we isolated with Hystrix, we could disable search functionality and still allow basic browsing of contacts directly from MySQL.
  17. There isn’t always a fallback, especially for writes. Propagate the cause. It’s helpful to check the .getFailureType() of HystrixRuntimeException. Apache ExceptionUtils can find the cause easily.
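
Finding the underlying cause of a wrapped exception is a short loop over `getCause()`. The sketch below uses only the JDK; `Causes.rootCause` is a hypothetical helper, similar in spirit to what a utility library offers but not a copy of any library's behaviour.

```java
// Hypothetical helper: walks the cause chain to the innermost throwable.
final class Causes {
    private Causes() {}

    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop at the end of the chain, or if a throwable is its own cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }
}
```

Note that this version returns the throwable itself when there is no cause, so callers never need a null check.
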
  18. Network-based fallbacks need their own Hystrix Commands
  19. RxJava

    • Reactive Extensions
    • Kind of like push-based iterators
    • All kinds of cool features for another tech talk
    • Integrated into Hystrix as of 1.3.0
  20. Gotchas

    • Doesn’t work well with Groovy.
    • Groovy can call a command, and the command can call Groovy code, but Groovy has issues being the actual command.
    • Configuration syntax is awkward.
    • Underlying I/O calls need timeouts. I can’t stress this enough: otherwise you’ll fill your thread pools with stuck threads until the sockets return (which may be never). HTTP libraries should be configured to time out well before the Hystrix timeout (1000ms by default) hits, leaving time for retries. E.g., for a call with a 1000ms Hystrix timeout and 3 retries, make the HTTP timeout 250ms.
    • Hystrix timeouts are done with thread interrupts. If the thread can’t be interrupted, it’ll exhaust your thread pool, which will reject work until it clears up.
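
The 250ms figure in the timeout gotcha above follows from splitting the command timeout across every attempt: one initial try plus 3 retries is 4 attempts, and 1000ms / 4 = 250ms. A small sketch of that arithmetic, with `TimeoutBudget` and `perAttemptTimeoutMillis` as hypothetical names:

```java
// Hypothetical helper: splits a command-level timeout evenly across the
// first attempt and all retries.
final class TimeoutBudget {
    private TimeoutBudget() {}

    static long perAttemptTimeoutMillis(long commandTimeoutMillis, int retries) {
        if (commandTimeoutMillis <= 0 || retries < 0) {
            throw new IllegalArgumentException("timeout must be positive and retries non-negative");
        }
        int attempts = retries + 1; // the first try plus each retry
        return commandTimeoutMillis / attempts;
    }
}
```

This treats the result as an upper bound; any backoff between retries would have to come out of the same budget.
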
  25. Metrics: Included in the hystrix-metrics-event-stream package. Provides a servlet you can mount. Exports to the dashboard or Turbine (aggregation service).
  26. By default it sorts by error, then volume. You can see below that we have some command failures on FindOneCommand.
  27. Migrating a Library: Make a façade. Example: Sherlock, look at the HystrixHBaseCacheClient.
  28. Clustering: Curator service discovery

  29. Nifty features: Request Batching & Request Caching
  30. Caching

  31. Request Batching: Implement a HystrixCollapser. It takes a configurable window (default: 10ms), batches the requests to the service, and then maps the responses back onto the requests.
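
The collect, batch, and map-back cycle described above can be sketched in plain Java. This is not the HystrixCollapser API: `SimpleCollapser` is a hypothetical class, and where a real collapser would flush on a timer at the end of each window, this sketch exposes an explicit `flush()` to stay deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical, simplified collapser; flush() stands in for the timer
// that would normally fire at the end of each batching window.
final class SimpleCollapser<K, V> {
    private final Function<List<K>, Map<K, V>> batchCall;
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    SimpleCollapser(Function<List<K>, Map<K, V>> batchCall) {
        this.batchCall = batchCall;
    }

    // Queues a request; duplicate keys in one window share a single future.
    synchronized CompletableFuture<V> submit(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Sends one batched call, then maps each response back onto its request.
    void flush() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        if (batch.isEmpty()) {
            return;
        }
        try {
            Map<K, V> responses = batchCall.apply(new ArrayList<>(batch.keySet()));
            batch.forEach((key, future) -> future.complete(responses.get(key)));
        } catch (RuntimeException e) {
            // A failed batch fails every request that was collapsed into it.
            batch.values().forEach(future -> future.completeExceptionally(e));
        }
    }
}
```

Callers hold a future per request, so from their side each request still looks individual even though the service sees one call per window.
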
  32. Best Practices: Get this in yo’ app! If you’re using blocking IO (and with Observables, even non-blocking), JFDI! (It will let you do non-blocking joins.) If possible, use Archaius. It makes it super easy to configure commands (the syntax is awkward otherwise). https://github.com/Netflix/Hystrix/wiki/How-To-Use#wiki-Common-Patterns
  33. #hystrix

    hystrix.command.GetEnrichedContactCommand.execution.isolation.thread.timeoutInMilliseconds = 60000
    hystrix.command.MergeStringContactsCommand.execution.isolation.thread.timeoutInMilliseconds = 60000
    hystrix.command.GetFromCacheCommand.execution.isolation.thread.timeoutInMilliseconds = 5000
    hystrix.command.PutIntoCacheCommand.execution.isolation.thread.timeoutInMilliseconds = 5000
    hystrix.command.NameApiCommand.execution.isolation.thread.timeoutInMilliseconds = 2000
    hystrix.threadpool.IdentibaseThrift.coreSize = 20
    hystrix.threadpool.NameAPI.coreSize = 20
    hystrix.threadpool.HBase.coreSize = 50
    hystrix.threadpool.MongoDB.coreSize = 30
  34. Configuring timeouts

  35. Questions?