as Circuit Breaker kept our infrastructure up and running during a 20x spike in prime time.

• Brief introduction to microservices
• Resilience?
• Stability patterns
• Circuit Breakers: here they are!
a single application as a suite of small services, each running in its own process and communicating via lightweight mechanisms, often an HTTP resource API.
Martin Fowler

MICROSERVICES ARE A DISTRIBUTED SYSTEM
one in which the failure of a computer you didn't even know existed can render your own computer unusable.
Leslie Lamport

DON’T TRY TO AVOID FAILURES, EMBRACE FAILURES.
resume its original size and shape after a deformation.
2. Ability of an equipment, machine, or system to absorb the impact of the failure of one or more components or a significant disturbance in its environment, and to still continue to provide an acceptable level of service.
http://www.businessdictionary.com/definition/resilience.html#ixzz4GRYjjW6y

INTRODUCING A (NOT THAT) NEW CONCEPT

YOUR DESTINY IS TO FAIL, SO FAIL WITH STYLE.
for the resource to be ready, otherwise time out the connection. Retry (as required).

>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> s = requests.Session()  # the session that `s` refers to
>>> s.mount('http://', HTTPAdapter(max_retries=2))  # retry failed connections up to twice
>>> s.get('http://es.s.edo.so/search?q=PythonExpose', timeout=1)  # give up after 1 second

(SOME OF) STABILITY PATTERNS

A thousand new connections are handled and degraded gracefully.
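The timeout-and-retry idea can also be sketched without any HTTP library. This is a minimal illustration, not the talk's code: `call_with_retries` and `degraded_search` are hypothetical helpers, and the built-in `TimeoutError` stands in for whatever timeout exception the client library raises (with requests, that would be `requests.exceptions.Timeout`).

```python
import time


def call_with_retries(operation, max_retries=2, backoff_seconds=0.0):
    """Call operation(); on TimeoutError, retry up to max_retries more times."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TimeoutError as error:
            last_error = error
            if attempt < max_retries:
                # Optional linear backoff between attempts (0 by default).
                time.sleep(backoff_seconds * (attempt + 1))
    raise last_error


def degraded_search(operation, fallback):
    """Return the live result, or the fallback if every attempt timed out."""
    try:
        return call_with_retries(operation)
    except TimeoutError:
        return fallback
```

Returning a fallback (an empty result list, a cached page) is one way to read "degraded gracefully": the user gets something quickly instead of waiting on a resource that is not ready.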
deal with that. Just let the server know you want to fail fast.

>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> s = requests.Session()
>>> headers = {'X-Client-Timeout': '250'}  # presumably milliseconds, matching timeout=0.25 s
>>> s.mount('http://', HTTPAdapter(max_retries=2))
>>> s.get('http://es.s.edo.so/search?q=PythonExpose',
...       timeout=0.25, headers=headers)

Two thousand new connections are handled and degraded gracefully.
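On the server side, failing fast could look roughly like the sketch below. Only the `X-Client-Timeout` header name comes from the slide; reading it as milliseconds, the default budget, the cost estimate and the `handle_search` function are all assumptions made for illustration.

```python
DEFAULT_BUDGET_MS = 1000  # hypothetical default when the client sends no budget


def client_budget_ms(headers):
    """Read the X-Client-Timeout header (assumed to be in milliseconds)."""
    raw = headers.get("X-Client-Timeout")
    try:
        budget = int(raw)
    except (TypeError, ValueError):
        # Header missing or not a number: fall back to the default budget.
        return DEFAULT_BUDGET_MS
    return budget if budget > 0 else DEFAULT_BUDGET_MS


def handle_search(headers, estimated_cost_ms, run_query):
    """Refuse at once when the query cannot finish inside the client's budget."""
    budget = client_budget_ms(headers)
    if estimated_cost_ms > budget:
        message = "fail fast: budget %d ms, estimated %d ms" % (budget, estimated_cost_ms)
        return 503, message
    return 200, run_query()
```

The point of the design is that the server spends no work on a request whose caller will have hung up before the answer is ready.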
entire ship down, divide the ship into different watertight bulkheads and let them fail separately.

>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> s = requests.Session()
>>> headers = {'X-Client-Timeout': '250'}
>>> s.mount('http://', HTTPAdapter(max_retries=2))
>>> s.get('http://read.es.s.edo.so/search?q=PythonExpose',  # read-only endpoint
...       timeout=1, headers=headers)

One strategy is to separate reads from writes, in order to isolate the system from massive READ operations.

Three thousand new connections are handled and degraded gracefully.
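Inside a single process, the same bulkhead idea can be sketched by giving each kind of traffic its own capped compartment. The `Bulkhead` class, the compartment names and the limits below are hypothetical; the slide itself only shows the split into a separate read host.

```python
import threading


class Bulkhead:
    """Cap concurrent calls in one compartment; reject overflow instead of queueing."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, operation):
        # Non-blocking acquire: a full compartment fails immediately.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead '%s' is full" % self.name)
        try:
            return operation()
        finally:
            self._slots.release()


# Separate compartments, so a flood of reads cannot starve writes.
read_bulkhead = Bulkhead("read", max_concurrent=2)
write_bulkhead = Bulkhead("write", max_concurrent=2)
```

When every read slot is taken, further reads are rejected at once while `write_bulkhead` keeps accepting calls, which is the "fail separately" behaviour described above.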
TOLD YOU BEFORE, FAIL WITH STYLE.

JUST SET ES UNREACHABLE

Users are coming in too fast and using the search like crazy, but Elasticsearch is not responding.

Five thousand new connections are timed out.
SET A STATE TO YOUR DEPENDENCY

HEALTHY or “closed”: PERFORM the operation
UNHEALTHY or “open”: BLOCK immediately and return
RESETTING or “half-open”: TRY TO PERFORM or fail

NO MORE REQUESTS IF THE SERVER IS NOT RESPONDING.
NO MORE LOAD ON THE BACKEND, RECOVERY IS USUALLY FASTER.
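The three states can be sketched as a small state machine. This is a minimal illustration under assumptions, not the implementation used in the talk: the class name, the failure threshold, the reset timeout and the injectable clock are all hypothetical choices.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker blocks a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"      # HEALTHY: perform the operation
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"   # RESETTING: allow a trial call
        return "open"            # UNHEALTHY: block immediately

    def call(self, operation):
        state = self.state
        if state == "open":
            raise CircuitOpenError("dependency marked unhealthy")
        try:
            result = operation()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get `CircuitOpenError` straight away and the backend receives no traffic, which is what gives it room to recover.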
• You can know in advance about the health of the system.
• You can build features on top of this.
• You can trigger actions based on this.
• You can create new logic on top of this.
• You can escalate better on top of this.
SYSTEM INSTEAD OF THE UNAVAILABLE ONE.
LIKE REDIRECTING USERS TO ANOTHER HEALTHY SYSTEM AND LETTING THEM CONTINUE THE TOUR.
LIKE TRIGGERING A DATA INTEGRITY CHECKUP AFTER THE SYSTEM HAS RECOVERED.
LIKE STORING ALL THE INFORMATION IN A LOG FILE WHILE THE DATABASE IS OUT OF SERVICE.
LIKE INFORMING YOUR TECH GUY EVERY TIME A CIRCUIT IS OPEN.

NEW POSSIBILITIES
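One of these possibilities, storing information in a log while the database is out of service, might be sketched as follows. The `store` and `replay` functions, the in-memory list standing in for a log file, and the replay step after recovery are all assumptions added for illustration.

```python
import json


def store(record, circuit_is_open, write_to_database, log_lines):
    """Write to the database, or park the record in a log while the circuit is open."""
    if circuit_is_open():
        # Database marked unhealthy: keep the data instead of losing it.
        log_lines.append(json.dumps(record, sort_keys=True))
        return "logged"
    write_to_database(record)
    return "stored"


def replay(log_lines, write_to_database):
    """After recovery, push the parked records back into the database in order."""
    replayed = 0
    while log_lines:
        write_to_database(json.loads(log_lines.pop(0)))
        replayed += 1
    return replayed
```

Because the circuit state is known before the call is attempted, the fallback is chosen up front rather than discovered through a timeout.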