Slide 1

Slide 1 text

Hystrix Building blocks for Distributed Systems August 8th, 2013 Thursday, August 8, 13

Slide 2

Slide 2 text

What does it do?
It’s from Netflix. Does it watch movies?

Slide 3

Slide 3 text

Not exactly.
It’s a library for building resilient SOA services.

Slide 4

Slide 4 text

Hystrix Goals

Slide 5

Slide 5 text

Latency and Fault Tolerance
Stop cascading failures.
Fallbacks and graceful degradation.
Fail fast and rapid recovery.
Thread and semaphore isolation with circuit breakers.

Slide 6

Slide 6 text

Real-time Operations
Real-time monitoring and configuration changes.
Watch service and property changes take effect immediately as they spread across a fleet.
Be alerted, make decisions, effect change and see results in seconds.

Slide 7

Slide 7 text

Concurrency
Parallel execution.
Concurrency-aware request caching.
Automated batching through request collapsing.

Slide 8

Slide 8 text

Commands are simple

Slide 9

Slide 9 text

Running is easy
Synchronous
Asynchronous
RxJava
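
In Hystrix's API the three modes correspond to execute() (synchronous), queue() (asynchronous, returns a Future) and observe() (an RxJava Observable). The sketch below is not Hystrix code. It is a stdlib-only stand-in, with the hypothetical names SimpleCommand and GreetingCommand, meant only to show the shape: a subclass puts the dependency call in run(), and the caller picks how to run it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a command base class; NOT the Hystrix API.
abstract class SimpleCommand<R> {
    // Shared pool for the sketch; daemon threads so the JVM can exit.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "simple-command");
        t.setDaemon(true);
        return t;
    });

    // The wrapped call to a dependency goes here.
    protected abstract R run() throws Exception;

    // Asynchronous: returns immediately with a Future.
    public Future<R> queue() {
        Callable<R> task = this::run;
        return POOL.submit(task);
    }

    // Synchronous: queue the work, then block for the result.
    public R execute() {
        try {
            return queue().get();
        } catch (Exception e) {
            throw new RuntimeException("command failed", e);
        }
    }
}

// Hypothetical example command.
final class GreetingCommand extends SimpleCommand<String> {
    private final String name;

    GreetingCommand(String name) {
        this.name = name;
    }

    @Override
    protected String run() {
        return "Hello, " + name + "!";
    }
}
```

Calling new GreetingCommand("World").execute() blocks for the result, while queue() hands back a Future to collect later.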

Slide 10

Slide 10 text

.queue() gives you a Future. You must block on it (for example with .get()) for Hystrix to time out a command.
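
The same pattern can be seen with a plain java.util.concurrent Future: the deadline only applies once the caller blocks. This is a minimal sketch using the standard library, not Hystrix internals; the class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BlockingTimeoutSketch {
    // Returns the task's result, or the fallback if no result arrives in time.
    static String getOrFallback(Callable<String> task, long timeoutMs, String fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(task); // returns at once, like queue()
        try {
            // The deadline is enforced here, when the caller blocks.
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself failed
        } finally {
            pool.shutdownNow();
        }
    }
}
```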

Slide 11

Slide 11 text

Wrap dependencies

Slide 12

Slide 12 text

When dependencies start to become latent, you’ll insulate yourself
Thread pools will fill and you’ll reject work. Circuits will open shortly thereafter. This is good.

Slide 13

Slide 13 text

Isolation
By default, Hystrix isolates commands by running them on thread pools. There is also semaphore isolation for in-memory caches and such.
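
Both isolation styles can be sketched with the standard library. This is an illustration of the idea, not how Hystrix implements it, and IsolationSketch with its methods is a hypothetical name. The pool has no queue, so work beyond its size is rejected immediately; the semaphore variant runs on the caller's thread and only caps concurrency.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

final class IsolationSketch {
    // Thread isolation: a fixed pool with no queue, so extra work is rejected at once.
    static ThreadPoolExecutor boundedPool(int size) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), new ThreadPoolExecutor.AbortPolicy());
    }

    // Returns a Future, or null when the pool is saturated and the work is rejected.
    static <T> Future<T> trySubmit(ThreadPoolExecutor pool, Callable<T> task) {
        try {
            return pool.submit(task);
        } catch (RejectedExecutionException e) {
            return null;
        }
    }

    // Semaphore isolation: run on the caller's thread, but cap concurrency.
    static <T> T runWithPermit(Semaphore permits, Supplier<T> task, T rejectedValue) {
        if (!permits.tryAcquire()) {
            return rejectedValue; // fail fast instead of waiting for a permit
        }
        try {
            return task.get();
        } finally {
            permits.release();
        }
    }
}
```

A slow dependency then fills only its own pool or permit set, and the rest of the application keeps its threads.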

Slide 14

Slide 14 text

You’ll stop hammering the resource and let it recover, and your consumers won’t sit around waiting for a timeout.
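
This is the behavior an open circuit breaker gives you. Hystrix's real breaker trips on an error percentage over a rolling window; the sketch below swaps in a much simpler consecutive-failure count to show the mechanism. SimpleCircuitBreaker is a hypothetical class, and the clock is passed in as a parameter purely to keep the example deterministic.

```java
import java.util.function.Supplier;

// Minimal illustration of a circuit breaker; NOT Hystrix's implementation.
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    synchronized boolean isOpen(long nowMillis) {
        return openedAt >= 0 && nowMillis - openedAt < openMillis;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback, long nowMillis) {
        if (isOpen(nowMillis)) {
            return fallback.get(); // short-circuit: the resource is not touched
        }
        try {
            T result = action.get();
            consecutiveFailures = 0; // success closes the circuit
            openedAt = -1;
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = nowMillis; // trip (or re-trip) the circuit
            }
            return fallback.get();
        }
    }
}
```

While the circuit is open, callers get the fallback immediately. Once openMillis has passed, the next call is let through as a trial.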

Slide 15

Slide 15 text

You can add fallbacks
If the database is down, you could check memcached. Or return a generic response.
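
A minimal sketch of that chain, assuming each source signals failure by throwing. FallbackSketch and withFallbacks are hypothetical names and this is plain Java, not the Hystrix fallback hook.

```java
import java.util.function.Supplier;

final class FallbackSketch {
    // Tries the primary source, then the secondary, then gives a generic response.
    static <T> T withFallbacks(Supplier<T> primary, Supplier<T> secondary, T genericResponse) {
        try {
            return primary.get(); // e.g. the database
        } catch (RuntimeException primaryFailure) {
            try {
                return secondary.get(); // e.g. memcached
            } catch (RuntimeException secondaryFailure) {
                return genericResponse; // last resort: a canned answer
            }
        }
    }
}
```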

Slide 16

Slide 16 text

Concrete example:
In the FullContact AddressBook we use ElasticSearch to display contact lists. If ElasticSearch is dead, we’re down. If we isolated with Hystrix, we could disable search functionality and still allow basic browsing of contacts directly from MySQL.

Slide 17

Slide 17 text

There isn’t always a fallback. Especially for writes.
Propagate the cause. It’s helpful to check the .getFailureType() of HystrixRuntimeException. Apache ExceptionUtils can find the cause easily.
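
Finding the cause means walking the getCause() chain down to the innermost exception, which is what ExceptionUtils.getRootCause in Apache Commons Lang does. The stdlib-only sketch below shows the idea; CauseSketch is a hypothetical name.

```java
final class CauseSketch {
    // Follows getCause() links to the innermost exception, with a hop limit as a cycle guard.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        int hops = 0;
        while (current.getCause() != null && current.getCause() != current && hops++ < 100) {
            current = current.getCause();
        }
        return current;
    }
}
```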

Slide 18

Slide 18 text

Network-based fallbacks need their own Hystrix Commands

Slide 19

Slide 19 text

RxJava
• Reactive Extensions
• Kind of like push-based iterators
• All kinds of cool features for another tech talk
• Integrated into Hystrix as of 1.3.0

Slide 20

Slide 20 text

Gotchas
• Doesn’t work well with Groovy
• Groovy can call a command, and the command can call Groovy code, but Groovy has issues being the actual command
• Configuration syntax is awkward
• Underlying I/O calls need timeouts. I can’t stress this enough: otherwise you’ll fill your thread pools with stuck threads until the sockets return (which may be never). HTTP libraries should be configured to time out well before the Hystrix timeout (1000ms by default) hits, with time for a retry. E.g., for a call with a 1000ms Hystrix timeout and 3 retries, make the HTTP timeout 250ms.
• Hystrix timeouts are done with thread interrupts. If the thread can’t be interrupted, it’ll exhaust your thread pool and reject work until it clears up.
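
One reading of the 250ms figure is that 3 retries plus the first try make 4 attempts, and 4 × 250ms fits a 1000ms command timeout. The helper below just does that division. TimeoutBudget is a hypothetical name, and since this reading leaves no headroom, you may want to shave a margin off the result.

```java
final class TimeoutBudget {
    // Per-attempt I/O timeout so the first try plus all retries fit in the command timeout.
    static long perAttemptTimeoutMs(long commandTimeoutMs, int retries) {
        if (commandTimeoutMs <= 0 || retries < 0) {
            throw new IllegalArgumentException("timeout must be positive and retries non-negative");
        }
        int attempts = retries + 1; // the initial call counts as an attempt
        return commandTimeoutMs / attempts;
    }
}
```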

Slide 21

Slide 21 text


Slide 22

Slide 22 text


Slide 23

Slide 23 text


Slide 24

Slide 24 text


Slide 25

Slide 25 text

Metrics
Included in the hystrix-metrics-event-stream package. Provides a servlet you can mount. Exports to the dashboard or Turbine (aggregation service).

Slide 26

Slide 26 text

By default, the dashboard sorts by error, then volume. You can see below we have some command failures on FindOneCommand.

Slide 27

Slide 27 text

Migrating a Library
Make a façade
Example: Sherlock, look at the HystrixHBaseCacheClient

Slide 28

Slide 28 text

Clustering
Curator service discovery

Slide 29

Slide 29 text

Nifty features
Request Batching & Request Caching

Slide 30

Slide 30 text

Caching

Slide 31

Slide 31 text

Request Batching
Implement a HystrixCollapser. It collects requests over a configurable window (default: 10ms), sends them to the service as one batch, and then maps the responses back onto the individual requests.
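
The collect, batch and map-back steps can be sketched with the standard library. This is not the HystrixCollapser API. WindowBatcher is a hypothetical class that assumes the batch call returns a map keyed by request, and it skips the thread-pool isolation a real collapser would sit behind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration of request collapsing; NOT the HystrixCollapser API.
final class WindowBatcher<K, V> {
    private final long windowMs;
    private final Function<List<K>, Map<K, V>> batchCall;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "window-batcher");
                t.setDaemon(true);
                return t;
            });
    private List<K> keys = new ArrayList<>();
    private List<CompletableFuture<V>> futures = new ArrayList<>();

    WindowBatcher(long windowMs, Function<List<K>, Map<K, V>> batchCall) {
        this.windowMs = windowMs;
        this.batchCall = batchCall;
    }

    // Queues one request; the first request in a window schedules the flush.
    synchronized CompletableFuture<V> submit(K key) {
        if (keys.isEmpty()) {
            timer.schedule(this::flush, windowMs, TimeUnit.MILLISECONDS);
        }
        CompletableFuture<V> future = new CompletableFuture<>();
        keys.add(key);
        futures.add(future);
        return future;
    }

    // Sends everything collected so far as one batch and maps responses back.
    private void flush() {
        List<K> batchKeys;
        List<CompletableFuture<V>> batchFutures;
        synchronized (this) {
            batchKeys = keys;
            batchFutures = futures;
            keys = new ArrayList<>();
            futures = new ArrayList<>();
        }
        try {
            Map<K, V> responses = batchCall.apply(batchKeys);
            for (int i = 0; i < batchKeys.size(); i++) {
                batchFutures.get(i).complete(responses.get(batchKeys.get(i)));
            }
        } catch (RuntimeException e) {
            for (CompletableFuture<V> f : batchFutures) {
                f.completeExceptionally(e);
            }
        }
    }
}
```

Callers that submit within the same window share one call to the backing service, at the cost of up to one window of added latency.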

Slide 32

Slide 32 text

Best Practices
Get this in yo’ app! If you’re using blocking IO (and with Observables, even non-blocking), JFDI! (Will let you do non-blocking joins.)
If possible, use Archaius. It makes it super easy to configure commands (the syntax is awkward otherwise).
https://github.com/Netflix/Hystrix/wiki/How-To-Use#wiki-Common-Patterns

Slide 33

Slide 33 text

#hystrix
hystrix.command.GetEnrichedContactCommand.execution.isolation.thread.timeoutInMilliseconds = 60000
hystrix.command.MergeStringContactsCommand.execution.isolation.thread.timeoutInMilliseconds = 60000
hystrix.command.GetFromCacheCommand.execution.isolation.thread.timeoutInMilliseconds = 5000
hystrix.command.PutIntoCacheCommand.execution.isolation.thread.timeoutInMilliseconds = 5000
hystrix.command.NameApiCommand.execution.isolation.thread.timeoutInMilliseconds = 2000
hystrix.threadpool.IdentibaseThrift.coreSize = 20
hystrix.threadpool.NameAPI.coreSize = 20
hystrix.threadpool.HBase.coreSize = 50
hystrix.threadpool.MongoDB.coreSize = 30

Slide 34

Slide 34 text

Configuring timeouts

Slide 35

Slide 35 text

Questions?