Slide 1

Slide 1 text

Circuit Breakers: Improving microservices
Edoardo Biraghi

Slide 2

Slide 2 text

…AND WHAT IS THIS TALK ABOUT?
How a pattern as simple as the Circuit Breaker kept our infrastructure up and running during a 20x spike at prime time.
• Brief introduction to microservices
• Resilience?
• Stability patterns
• Circuit Breakers: here they are!

Slide 3

Slide 3 text

MICROSERVICES
"The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating via lightweight mechanisms, often an HTTP resource API."
Martin Fowler
MICROSERVICES ARE A DISTRIBUTED SYSTEM

Slide 4

Slide 4 text

IF MICROSERVICES == DISTRIBUTED SYSTEM; THEN
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
Leslie Lamport
DON’T TRY TO AVOID FAILURES, EMBRACE FAILURES.

Slide 5

Slide 5 text

INTRODUCING A (NOT THAT) NEW CONCEPT
re•sil•ience (rɪˈzɪl yəns), n.
1. Ability of a material to resume its original size and shape after a deformation.
2. Ability of an equipment, machine, or system to absorb the impact of the failure of one or more components or a significant disturbance in its environment, and to still continue to provide an acceptable level of service.
http://www.businessdictionary.com/definition/resilience.html#ixzz4GRYjjW6y
YOUR DESTINY IS TO FAIL, SO FAIL WITH STYLE.

Slide 6

Slide 6 text

(SOME OF) STABILITY PATTERNS
Let’s see what we can do with a real example (back in the day). This Elasticsearch service has 5 different servers behind a load balancer.
>>> import requests
>>> r = requests.get("http://es.s.edo.so/search?q=PythonExpose")

Slide 7

Slide 7 text

(SOME OF) STABILITY PATTERNS
TIMEOUTS (AND RETRY)
Place a timeout on every request: wait for the resource to be ready, otherwise time out the connection. Retry (as required).
>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> s = requests.Session()  # one session, so the mounted adapter applies to every call
>>> s.mount('http://', HTTPAdapter(max_retries=2))  # retry failed connections up to 2 times
>>> s.get('http://es.s.edo.so/search?q=PythonExpose', timeout=1)  # give up after 1 second
A thousand new connections are handled and degraded gracefully.
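The idea behind "timeout, then retry as required" can be sketched without any HTTP library. This is only an illustration under stated assumptions: `call_with_retries` and `flaky_search` are hypothetical names that do not appear in the talk, and the simulated backend stands in for the Elasticsearch service.

```python
def call_with_retries(operation, attempts=3, timeout=1.0):
    """Call operation(timeout=...) up to `attempts` times.

    Only timeouts are retried; any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation(timeout=timeout)
        except TimeoutError as exc:
            last_error = exc  # remember the failure and try again
    raise last_error  # every attempt timed out


# Hypothetical backend: times out twice, then answers.
calls = {"count": 0}


def flaky_search(timeout):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated slow backend")
    return "ok"


result = call_with_retries(flaky_search, attempts=3, timeout=1.0)
```

Keeping the number of attempts small matters: every retry is extra load on a backend that is already struggling.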

Slide 8

Slide 8 text

(SOME OF) STABILITY PATTERNS
FAIL FAST
Yes. Fail fast and deal with it. Just let the server know you want to fail fast.
>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> s = requests.Session()
>>> headers = {'X-Client-Timeout': '250'}  # tell the server how long the client will wait
>>> s.mount('http://', HTTPAdapter(max_retries=2))
>>> s.get('http://es.s.edo.so/search?q=PythonExpose',
...       timeout=0.25, headers=headers)
Two thousand new connections are handled and degraded gracefully.
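What the server might do with that header is not shown in the talk. A minimal sketch, assuming the `X-Client-Timeout` value is a budget in milliseconds (suggested by `'250'` next to `timeout=0.25`) and that the server can estimate its own response time; `handle_search` and `estimated_ms` are hypothetical names.

```python
def handle_search(headers, estimated_ms):
    """Return (status, body), rejecting at once if the client's budget is too small."""
    raw = headers.get("X-Client-Timeout")
    try:
        budget_ms = int(raw) if raw is not None else None
    except ValueError:
        budget_ms = None  # unparsable header: treat it as "no budget given"
    if budget_ms is not None and estimated_ms > budget_ms:
        # Fail fast: do not start work the client will never wait for.
        return 503, "cannot answer within %d ms" % budget_ms
    return 200, "results"


rejected = handle_search({"X-Client-Timeout": "250"}, estimated_ms=400)
accepted = handle_search({"X-Client-Timeout": "250"}, estimated_ms=100)
```

Rejecting immediately frees the server from doing work whose result would be thrown away when the client times out.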

Slide 9

Slide 9 text

(SOME OF) STABILITY PATTERNS
BULKHEADS
Don’t let a failure take the entire ship down: divide the ship into separate watertight bulkheads and let them fail separately.
One strategy is to separate reads from writes, in order to isolate the system from massive READ operations.
>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> s = requests.Session()
>>> headers = {'X-Client-Timeout': '250'}
>>> s.mount('http://', HTTPAdapter(max_retries=2))
>>> s.get('http://read.es.s.edo.so/search?q=PythonExpose',  # read-only endpoint
...       timeout=1, headers=headers)
Three thousand new connections are handled and degraded gracefully.
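The slide isolates reads by sending them to a separate endpoint. The same idea can also be applied inside one process by giving each kind of operation its own bounded compartment. The sketch below is an assumption-laden illustration, not the talk's implementation: the `Bulkhead` class, the pool names, and the limits are all hypothetical.

```python
import threading


class Bulkhead:
    """Cap the number of concurrent calls allowed into one compartment."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, operation):
        # Non-blocking acquire: a full compartment rejects instead of queueing.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead %r is full" % self.name)
        try:
            return operation()
        finally:
            self._slots.release()


# Separate compartments: a flood of reads cannot use up the write capacity.
read_pool = Bulkhead("read", max_concurrent=2)
write_pool = Bulkhead("write", max_concurrent=1)
```

When `read_pool` is saturated, further reads are rejected right away while `write_pool.run(...)` keeps working, which is the "fail separately" behaviour the slide describes.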

Slide 10

Slide 10 text

(SOME OF) STABILITY PATTERNS
HOUSTON, WE HAVE A PROBLEM.
Users are coming too fast and using the search like crazy, but Elasticsearch is not responding.
Five thousand new connections are timed out.
I TOLD YOU BEFORE, FAIL WITH STYLE. JUST SET ES UNREACHABLE

Slide 11

Slide 11 text

CIRCUIT BREAKERS
Break the circuit before the house burns down
SET A STATE TO YOUR DEPENDENCY
• HEALTHY or “closed”: PERFORM the operation
• UNHEALTHY or “open”: BLOCK immediately and return
• RESETTING or “half-open”: TRY TO PERFORM or fail
NO MORE REQUESTS IF THE SERVER IS NOT RESPONDING.
NO MORE LOAD ON THE BACKEND, RECOVERY IS USUALLY FASTER.

Slide 12

Slide 12 text

CIRCUIT BREAKERS: SIMPLE CLASS
>>> from functools import wraps
>>> from datetime import datetime, timedelta
>>> STATE_CLOSED = 'closed'
>>> STATE_OPEN = 'open'
>>> STATE_HALF_OPEN = 'half_open'
>>>
>>> class CircuitBreaker(object):
...     def __init__(self, expected_exception=Exception,
...                  failure_threshold=5, recover_timeout=30, name=None):
...         self._expected_exception = expected_exception
...         self._failure_count = 0
...         self._failure_threshold = failure_threshold
...         self._recover_timeout = recover_timeout
...         self._state = STATE_CLOSED
...         self._opened = datetime.utcnow()
...         self._name = name
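The slide shows only the constructor. One way the rest of the class could look, so that it works as a decorator and moves between the three states, is sketched below. Everything beyond the constructor arguments and the state names is an assumption: the `CircuitOpenException` body, the `state` property, the `__call__` logic, and the injectable `clock` parameter (added here so the behaviour can be checked without sleeping) are not taken from the talk.

```python
import time
from functools import wraps

STATE_CLOSED, STATE_OPEN, STATE_HALF_OPEN = "closed", "open", "half_open"


class CircuitOpenException(Exception):
    """Raised when a call is blocked because the circuit is open."""


class CircuitBreaker(object):
    def __init__(self, expected_exception=Exception, failure_threshold=5,
                 recover_timeout=30, name=None, clock=time.monotonic):
        self._expected_exception = expected_exception
        self._failure_threshold = failure_threshold
        self._recover_timeout = recover_timeout
        self._name = name
        self._clock = clock  # assumption: injectable time source, in seconds
        self._failure_count = 0
        self._opened = None  # time the circuit last opened; None while closed

    @property
    def state(self):
        if self._opened is None:
            return STATE_CLOSED
        if self._clock() - self._opened >= self._recover_timeout:
            return STATE_HALF_OPEN  # waited long enough: allow a trial call
        return STATE_OPEN

    def __call__(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            state = self.state
            if state == STATE_OPEN:
                # Block immediately, without touching the backend.
                raise CircuitOpenException(self._name or func.__name__)
            try:
                result = func(*args, **kwargs)
            except self._expected_exception:
                self._failure_count += 1
                # A failed trial call, or too many failures, (re)opens the circuit.
                if (state == STATE_HALF_OPEN
                        or self._failure_count >= self._failure_threshold):
                    self._opened = self._clock()
                raise
            # Success: close the circuit and forget earlier failures.
            self._failure_count = 0
            self._opened = None
            return result
        return wrapper
```

This sketch is not thread-safe; a breaker shared between threads would need a lock around the counter and the timestamp.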

Slide 13

Slide 13 text

CIRCUIT BREAKERS: SIMPLE CODE
>>> import requests
>>> from requests.adapters import HTTPAdapter
>>> # assumes the local circuit_breaker module exports both names
>>> from circuit_breaker import CircuitBreaker, CircuitOpenException
>>> s = requests.Session()
>>> s.mount('http://', HTTPAdapter(max_retries=2))
>>> @CircuitBreaker(failure_threshold=2, recover_timeout=3)
... def ask(url):
...     headers = {'X-Client-Timeout': '250'}
...     return s.get(url, timeout=1, headers=headers)
...
>>> try:
...     ask("http://read.es.s.edo.so/search?q=PythonExpose")
... except CircuitOpenException:
...     print("Circuit is open! Mayday!")
... except Exception:
...     print("This is not my fault!")
(Ooops, I never tested this in the command line)

Slide 14

Slide 14 text

CIRCUIT BREAKERS: BEHAVE!
NEW BEHAVIOURS
• You can know in advance about the health of the system.
• You can build features on top of this.
• You can trigger actions based on this.
• You can create new logic on top of this.
• You can escalate better on top of this.

Slide 15

Slide 15 text

CIRCUIT BREAKERS: REACT!
NEW POSSIBILITIES
LIKE TRYING TO USE ANOTHER CLUSTER / SYSTEM INSTEAD OF THE UNAVAILABLE ONE.
LIKE REDIRECTING USERS TO ANOTHER HEALTHY SYSTEM AND LETTING THEM CONTINUE THE TOUR.
LIKE TRIGGERING A DATA INTEGRITY CHECKUP AFTER THE SYSTEM HAS RECOVERED.
LIKE STORING ALL THE INFORMATION IN A LOG FILE WHILE THE DATABASE IS OUT OF SERVICE.
LIKE INFORMING YOUR TECH GUY EVERY TIME A CIRCUIT OPENS.
• You can know in advance about the health of the system.
• You can build features on top of this.
• You can trigger actions based on this.
• You can create new logic on top of this.
• You can escalate better on top of this.
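Two of the reactions on this slide, falling back to another cluster and informing someone when a circuit opens, could be sketched as follows. This is a hedged illustration only: `search_with_fallback`, `primary`, `secondary`, and `notify` are hypothetical names, and `CircuitOpenException` is redefined here as a stand-in so the block is self-contained.

```python
class CircuitOpenException(Exception):
    """Stand-in for the exception a breaker raises while its circuit is open."""


def search_with_fallback(query, primary, secondary, notify):
    """Try the primary cluster; if its circuit is open, alert and use the secondary."""
    try:
        return primary(query)
    except CircuitOpenException as exc:
        notify("primary search circuit is open: %s" % exc)
        return secondary(query)


# Hypothetical wiring: the primary is blocked, the secondary answers.
alerts = []


def primary(query):
    raise CircuitOpenException("es-primary")


def secondary(query):
    return "results for %s from the secondary cluster" % query


answer = search_with_fallback("PythonExpose", primary, secondary, alerts.append)
```

Because an open circuit fails immediately, the fallback costs the user almost no extra waiting time compared with timing out against the dead cluster first.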

Slide 16

Slide 16 text

THANKS! DOING SOMETHING IS BETTER THAN NOTHING! @sked1