Hystrix: Building blocks for Distributed Systems

Hystrix: Building blocks for Distributed Systems
Thursday, August 8th, 2013

What does it do?
It’s from Netflix, does it watch movies?

Not exactly.
It’s a library for building resilient SOA services.

Hystrix Goals

Latency and Fault Tolerance
Stop cascading failures. Fallbacks and graceful degradation. Fail fast and rapid recovery. Thread and semaphore isolation with circuit breakers.

Real-time Operations
Real-time monitoring and configuration changes. Watch service and property changes take effect immediately as they spread across a fleet. Be alerted, make decisions, effect change and see results in seconds.

Concurrency
Parallel execution. Concurrency-aware request caching. Automated batching through request collapsing.

Commands are simple

Running is easy
Synchronous, asynchronous, RxJava

.queue() gives you a future. You must block on it for Hystrix to time out a command.

Wrap dependencies

When dependencies start to become latent, you’ll insulate yourself. Thread pools will fill and you’ll reject work. Circuits will open shortly thereafter. This is good.

Isolation
By default, Hystrix isolates by pushing commands onto thread pools. There are also semaphore-isolated commands for in-memory caches and such.
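
The semaphore style of isolation can be pictured with the JDK's own java.util.concurrent.Semaphore. This is a minimal sketch of the idea, not Hystrix's implementation; the class name SemaphoreGuard and its method are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical illustration, not Hystrix code: cap the number of concurrent
// callers of a dependency and reject immediately once the cap is reached.
class SemaphoreGuard {
    private final Semaphore permits;

    SemaphoreGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the work on the caller's own thread if a permit is free;
    // otherwise returns the fallback without waiting.
    <T> T call(Supplier<T> work, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // rejected: too many calls already in flight
        }
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }
}
```

Because the work runs on the calling thread, there is no thread hand-off cost, which is why this style suits cheap in-memory calls. The trade-off is that nothing here can time out a call that hangs.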

You’ll stop hammering the resource and let it recover, and your consumers won’t sit around waiting for a timeout.
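
A toy circuit breaker shows why both sides benefit. This is a simplified sketch in plain Java, not Hystrix's algorithm: it assumes the circuit opens after a fixed number of consecutive failures and stays open for a cool-down period. SimpleCircuitBreaker and its parameters are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.function.Supplier;

// Hypothetical illustration, not Hystrix code: open after N consecutive
// failures, then short-circuit callers until a cool-down has elapsed.
class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long coolDownMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long coolDownMillis) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
    }

    synchronized boolean isOpen(long nowMillis) {
        return openedAt >= 0 && nowMillis - openedAt < coolDownMillis;
    }

    <T> T call(Supplier<T> work, T fallback, long nowMillis) {
        if (isOpen(nowMillis)) {
            return fallback; // fail fast: the resource is not touched at all
        }
        try {
            T result = work.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure(nowMillis);
            return fallback;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;
    }

    private synchronized void onFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = nowMillis; // (re)open the circuit
        }
    }
}
```

While the circuit is open the supplier is never invoked, so the struggling resource gets no traffic and the caller gets an answer immediately.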

You can add fallbacks
If the database is down, you could check memcached. Or return a generic response.
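
The "database, then cache, then generic response" idea can be sketched as an ordered list of sources. This is plain Java for illustration only; FallbackChain and firstAvailable are hypothetical names, and in Hystrix itself the fallback lives in the command rather than in a helper like this.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration, not Hystrix code: try each source in order
// and degrade to a generic value if none of them answers.
class FallbackChain {
    static <T> T firstAvailable(List<Supplier<T>> sources, T generic) {
        for (Supplier<T> source : sources) {
            try {
                return source.get();
            } catch (RuntimeException e) {
                // this source is unavailable; fall through to the next one
            }
        }
        return generic;
    }
}
```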

Concrete example:
In the FullContact AddressBook we use ElasticSearch to display contact lists. If ElasticSearch is dead, we’re down. If we isolated with Hystrix, we could disable search functionality and still allow basic browsing of contacts directly from MySQL.

There isn’t always a fallback. Especially for writes.
Propagate the cause. It’s helpful to check the .getFailureType() of HystrixRuntimeException. Apache ExceptionUtils can find the cause easily.

Network-based fallbacks need their own Hystrix commands

RxJava
• Reactive Extensions
• Kind of like push-based iterators
• All kinds of cool features for another tech talk
• Integrated into Hystrix as of 1.3.0

Gotchas
• Doesn’t work well with Groovy
  • Groovy can call a command, and the command can call Groovy code, but Groovy has issues being the actual command
• Configuration syntax is awkward
• Underlying I/O calls need timeouts. I can’t stress this enough; otherwise you’ll fill your thread pools with stuck threads until the sockets return (which may be never). HTTP libraries should be configured to time out well before the Hystrix timeout (1000ms by default) hits, with time for a retry, e.g. for a call with a 1000ms Hystrix timeout and 3 retries, make the I/O timeout 250ms.
• Hystrix timeouts are done with thread interrupts. If the thread can’t be interrupted, it’ll exhaust your thread pool and reject work until it clears up.
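
The 250ms figure in the I/O timeout bullet can be reproduced with a small helper. This sketch assumes "3 retries" means one initial attempt plus three retries, so four attempts share the 1000ms budget; that reading is an assumption, and TimeoutBudget is a hypothetical name.

```java
// Hypothetical helper, not part of Hystrix: split the overall command
// timeout evenly across the initial attempt and every retry.
class TimeoutBudget {
    static long perAttemptMillis(long commandTimeoutMillis, int retries) {
        if (commandTimeoutMillis <= 0 || retries < 0) {
            throw new IllegalArgumentException("timeout must be positive and retries non-negative");
        }
        // one initial attempt + `retries` further attempts
        return commandTimeoutMillis / (retries + 1);
    }
}
```

With a 1000ms command timeout and 3 retries this gives 250ms per attempt, which is the value to set on the HTTP client's own timeouts.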

Metrics
Included in the hystrix-metrics-event-stream package. Provides a servlet you can mount. Exports to the dashboard or Turbine (aggregation service).

By default, the dashboard sorts by error, then volume. You can see below we have some command failures on FindOneCommand.

Migrating a Library
Make a façade.
Example: Sherlock, look at the HystrixHBaseCacheClient

Clustering
Curator service discovery

Nifty features
Request Batching & Request Caching

Caching

Request Batching
Implement a HystrixCollapser, which takes a configurable window (default: 10ms), batches requests to the service, and then maps responses onto requests.
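
The collapse-then-map-back flow can be sketched in plain Java. This is not the HystrixCollapser API: BatchCollapser is a hypothetical class, flush() stands in for the timer that would fire when the window closes, and sharing one future between callers asking for the same key is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration, not Hystrix code: queue individual requests,
// send them as one batch, then map each response back to its caller.
class BatchCollapser<K, V> {
    private final Function<List<K>, Map<K, V>> batchCall;
    private final Map<K, CompletableFuture<V>> pending = new LinkedHashMap<>();

    BatchCollapser(Function<List<K>, Map<K, V>> batchCall) {
        this.batchCall = batchCall;
    }

    // Queue one request; the future completes when the batch is flushed.
    synchronized CompletableFuture<V> submit(K key) {
        return pending.computeIfAbsent(key, k -> new CompletableFuture<>());
    }

    // Would be driven by a timer at the end of each window in a real system.
    void flush() {
        Map<K, CompletableFuture<V>> batch;
        synchronized (this) {
            if (pending.isEmpty()) {
                return;
            }
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        try {
            // one call to the service for the whole window
            Map<K, V> responses = batchCall.apply(new ArrayList<>(batch.keySet()));
            batch.forEach((key, future) -> future.complete(responses.get(key)));
        } catch (RuntimeException e) {
            batch.values().forEach(future -> future.completeExceptionally(e));
        }
    }
}
```

Callers still see a per-request result, while the service sees a single call per window.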

Best Practices
Get this in yo’ app! If you’re using blocking IO (and with Observables, even non-blocking), JFDI! (Will let you do non-blocking joins.)
If possible, use Archaius. Makes it super easy to configure commands (syntax is awkward otherwise).
https://github.com/Netflix/Hystrix/wiki/How-To-Use#wiki-Common-Patterns

#hystrix
hystrix.command.GetEnrichedContactCommand.execution.isolation.thread.timeoutInMilliseconds = 60000
hystrix.command.MergeStringContactsCommand.execution.isolation.thread.timeoutInMilliseconds = 60000
hystrix.command.GetFromCacheCommand.execution.isolation.thread.timeoutInMilliseconds = 5000
hystrix.command.PutIntoCacheCommand.execution.isolation.thread.timeoutInMilliseconds = 5000
hystrix.command.NameApiCommand.execution.isolation.thread.timeoutInMilliseconds = 2000
hystrix.threadpool.IdentibaseThrift.coreSize = 20
hystrix.threadpool.NameAPI.coreSize = 20
hystrix.threadpool.HBase.coreSize = 50
hystrix.threadpool.MongoDB.coreSize = 30

Configuring timeouts

Questions?

More Decks by Michael Rose
