[2018.03 Meetup] [TALK] Luis Mineiro - Reliability Patterns for Fun and Profit

Building large-scale systems brings many challenges. With microservices, complexity also increases dramatically. Microservices communicate with each other over the network and, as the "Fallacies of Distributed Computing" point out, the network is unreliable. We need to prepare our systems for that.

In this talk I'll focus on three specific reliability patterns. I’ll start by showing the most common way of calling remote APIs and demonstrate its fragility. Then I'll iteratively apply the reliability patterns to the exact same call, demonstrating how they can increase the system’s resilience and reliability.

The talk is followed by a workshop, in which we'll work together to improve the reliability of a given example. In the process, participants will find creative ways to apply the reliability patterns and see what others came up with.

Luis's broad background in software engineering includes experience in DevOps, networks, mobile development, and more. Luis has been with Zalando since 2013, first shaving bike sheds and creating the most beautiful yaks in the Shop team, later joining Platform Infrastructure to support the company’s move to the Cloud, and currently going through the Site Reliability Engineering journey. Originally from Portugal, Luis has a master's degree in computer science from the Instituto Politécnico de Viseu.

Twitter: https://twitter.com/voidmaze

DevOps Lisbon

March 12, 2018

Transcript

  1. Reliability Patterns for Fun and Profit
     DevOps Lisbon, 12.03.2018, Lisbon, Portugal
     Luis Mineiro, @voidmaze
  2. Handling Faults
     What happens when the following operation fails?

         Cart cart = restTemplate.getForObject(url, Cart.class);
  3. Retrying

         for (int i = 1; i <= numRetries; i++) {
             try {
                 return restTemplate.getForObject(url, Cart.class);
             } catch (RestClientException e) {
                 LOG.error("failed to get cart", e);
                 if (i >= numRetries) {
                     throw e;
                 }
             }
         }
  4. Transient Faults
     We should only retry if the problem is due to a network failure or server overload.
  5. Better Retry

         for (int i = 1; i <= numRetries; i++) {
             try {
                 return restTemplate.getForObject(url, Cart.class);
             } catch (RestClientException e) {
                 LOG.error("failed to get cart", e);
                 if (i >= numRetries || isNonTransientFault(e)) {
                     throw e;
                 }
             }
         }
  6. Truncated Exponential Backoff with Jitter

         long computeWaitTime(int retryNumber, int maxWaitTime) {
             // WAIT_TIME_MULTIPLIER * 2^retryNumber; in Java ^ is XOR, so use a shift
             int delay = WAIT_TIME_MULTIPLIER * (1 << retryNumber);
             // add random jitter in [0, delay) and truncate at maxWaitTime
             return Math.min(maxWaitTime, delay + random.nextInt(delay));
         }
         ...
         for (int i = 1; i <= numRetries; i++) {
             try {
                 return restTemplate.getForObject(url, Cart.class);
             } catch (RestClientException e) {
                 LOG.error("failed to get cart", e);
                 if (i >= numRetries || isNonTransientFault(e)) {
                     throw e;
                 }
                 sleep(computeWaitTime(i, MAX_WAIT_TIME));
             }
         }
  7. Circuit Breaker Pattern
     The circuit breaker pattern can prevent an application from repeatedly trying to execute an operation that's likely to fail.
  8–11. Circuit Breaker: Closed
     Closed state: the requests from the application are forwarded to the target.
     [Diagram: application, circuit breaker and target]
  12. Circuit Breaker: Open
      Open state: the request from the application fails immediately and an exception is returned to the application.
      [Diagram: application, circuit breaker and target]
  13–15. Circuit Breaker: Half-Open
      Half-open state: a limited number of requests from the application are allowed to pass through and invoke the operation.
      [Diagram: application, circuit breaker and target]
  16. Circuit Breaker: Open
      Open state: the request from the application fails immediately and an exception is returned to the application.
      [Diagram: application, circuit breaker and target]
  17. The Most Important Question

          private double doSomeMath(int result) {
              if (result != 0) {
                  return 42.0 / result; // 42.0 avoids truncating integer division
              }
              return Double.NaN;
          }
  18. Putting It All Together

          circuitBreaker.call((url) -> {
              for (int i = 1; i <= numRetries; i++) {
                  try {
                      return restTemplate.getForObject(url, Product.class);
                  } catch (RestClientException e) {
                      LOG.error("failed to get product details", e);
                      if (i >= numRetries || isNonTransientFault(e)) {
                          throw e;
                      }
                      sleep(computeWaitTime(i, MAX_WAIT_TIME));
                  }
              }
              throw new NoMoreRetriesException();
          }).fallback(() -> "a Partner");
  19. Hands-On Exercise
      [Diagram: product detail page and target, with retry, circuit breaker and fallback]
  20. Exercise: Product Detail Page
  21. Exercise: Product Detail Page
      [Diagram of the page components: brand data, product data, wish list, size recommendation, cart, delivery options, size selector]
  22. Exercise: Product Detail Page
      Groups:
        Group 1: size recommendation
        Group 2: brand data
        Group 3: product data
        Group 4: wish list
        Group 5: cart
        Group 6: size selector
        Group 7: delivery options
      Tasks (20 min):
        1. Retries: Retryable operation? How many times?
        2. Circuit breaker: Global circuit breaker?
        3. Fail fast: Type of fallback? Delegate to frontend?
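
The retry loops on slides 3 and 5 depend on Spring's `restTemplate`, `RestClientException` and an `isNonTransientFault` helper that the slides don't define. The following is a minimal, self-contained sketch of the same idea in plain Java, under stated assumptions: `RemoteCallException` is a hypothetical stand-in for `RestClientException` that carries an HTTP status (0 meaning no response at all), and treating 0, 429, 502, 503 and 504 as transient is just one possible reading of "network failure or server overload", not a rule from the talk.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for Spring's RestClientException: it carries the HTTP
// status of the failed call, with 0 meaning no response (e.g. a network error).
class RemoteCallException extends RuntimeException {
    final int status;

    RemoteCallException(int status, String message) {
        super(message);
        this.status = status;
    }
}

class Retry {
    // Assumed set of transient outcomes: no response, 429 Too Many Requests,
    // 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout.
    private static final Set<Integer> TRANSIENT_STATUSES = Set.of(0, 429, 502, 503, 504);

    // Hypothetical implementation of the helper named on slide 5.
    static boolean isNonTransientFault(RemoteCallException e) {
        return !TRANSIENT_STATUSES.contains(e.status);
    }

    // Makes at most numRetries attempts, like the slides' loop, and rethrows
    // immediately on a non-transient fault or after the last attempt.
    static <T> T withRetries(Supplier<T> operation, int numRetries) {
        if (numRetries < 1) {
            throw new IllegalArgumentException("numRetries must be at least 1");
        }
        for (int i = 1; ; i++) {
            try {
                return operation.get();
            } catch (RemoteCallException e) {
                if (i >= numRetries || isNonTransientFault(e)) {
                    throw e;
                }
                // Backoff between attempts is left out of this sketch.
            }
        }
    }
}
```

Because this loop has no exit condition and leaves only through `return` or `throw`, it compiles without a trailing statement such as the `NoMoreRetriesException` on slide 18. A 404 fails on the first attempt, while a 503 is retried up to `numRetries` times.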
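
Slide 6 computes the delay as `WAIT_TIME_MULTIPLIER * 2^retryNumber`, but in Java `^` is bitwise XOR, so a working version needs a shift (or `Math.pow`). The sketch below assumes a hypothetical 100 ms multiplier and caps the exponent at 20 so the delay still fits in the `int` bound that `Random.nextInt(int)` expects. As on the slide, the jitter is uniform in `[0, delay)` and the result is truncated at `maxWaitTime`.

```java
import java.util.Random;

class Backoff {
    static final int WAIT_TIME_MULTIPLIER = 100; // hypothetical base delay in ms
    static final int MAX_EXPONENT = 20;          // keeps 100 * 2^20 within an int

    private final Random random;

    Backoff(Random random) {
        this.random = random;
    }

    // Truncated exponential backoff with jitter: delay = multiplier * 2^retryNumber,
    // plus a random extra in [0, delay), capped at maxWaitTime.
    long computeWaitTime(int retryNumber, long maxWaitTime) {
        int exponent = Math.max(0, Math.min(retryNumber, MAX_EXPONENT));
        int delay = WAIT_TIME_MULTIPLIER << exponent; // shift, since ^ is XOR in Java
        int jitter = random.nextInt(delay);            // uniform in [0, delay)
        return Math.min(maxWaitTime, (long) delay + jitter);
    }
}
```

With these assumed values, retry 1 waits between 200 and 399 ms and retry 2 between 400 and 799 ms, until `maxWaitTime` takes over. The random component spreads out clients that failed at the same moment so they don't all retry in lockstep.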
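
Slides 7 to 16 describe the breaker's closed, open and half-open states, but not the code behind `circuitBreaker.call(...)`. The following is a minimal, single-threaded sketch of that state machine, not any particular library's API. The class name, the `call(operation, fallback)` signature and the three parameters are all illustrative assumptions: consecutive failures before opening, how long to stay open, and how many successful trial calls close the breaker again. In the half-open state, calls go through until `halfOpenTrials` have succeeded (closing the breaker) or one fails (reopening it). The clock is injected so the open timeout can be tested without sleeping.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Minimal single-threaded circuit breaker sketch (hypothetical API, not a library's).
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures that open the breaker
    private final long openTimeoutMillis; // time to stay open before allowing trial calls
    private final int halfOpenTrials;     // successful trial calls needed to close again
    private final LongSupplier clock;     // current time in milliseconds (injected for tests)

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private int trialSuccesses = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openTimeoutMillis, int halfOpenTrials, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
        this.halfOpenTrials = halfOpenTrials;
        this.clock = clock;
    }

    State state() {
        return state;
    }

    // Runs the operation unless the breaker is open; a failure, or an open
    // breaker, yields the fallback value instead of an exception.
    <T> T call(Supplier<T> operation, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openTimeoutMillis) {
                return fallback.get(); // fail fast: the target is not called
            }
            state = State.HALF_OPEN; // timeout elapsed: start letting trial calls through
            trialSuccesses = 0;
        }
        T result;
        try {
            result = operation.get();
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
        recordSuccess();
        return result;
    }

    private void recordSuccess() {
        if (state == State.HALF_OPEN) {
            trialSuccesses++;
            if (trialSuccesses >= halfOpenTrials) {
                state = State.CLOSED; // target looks healthy again
                consecutiveFailures = 0;
            }
        } else {
            consecutiveFailures = 0;
        }
    }

    private void recordFailure() {
        if (state == State.HALF_OPEN) {
            trip(); // a failed trial call reopens the breaker immediately
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        consecutiveFailures = 0;
    }
}
```

In the composition on slide 18, the retry loop runs inside the breaker, so a single failure as seen by the breaker stands for an entire sequence of exhausted retries. A production breaker would also need thread safety, which this sketch leaves out.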