[2018.03 Meetup] [TALK] Luis Mineiro - Reliability Patterns for Fun and Profit

RELIABILITY PATTERNS FOR FUN AND PROFIT
DEVOPS LISBON, 12.03.2018, LISBON, PORTUGAL
LUIS MINEIRO @voidmaze

WHAT'S THIS?

WHAT ABOUT THIS?

TYPICAL TRAFFIC BURST

HANDLING FAULTS
What happens when the following operation fails?
    Cart cart = restTemplate.getForObject(url, Cart.class);

RETRYING
    for (int i = 1; i <= numRetries; i++) {
        try {
            return restTemplate.getForObject(url, Cart.class);
        } catch (RestClientException e) {
            LOG.error("failed to get cart", e);
            if (i >= numRetries) {
                throw e;
            }
        }
    }

TRANSIENT FAULTS
We should only retry if the problem is due to a network failure or server overload.
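
One way to make the "only retry transient faults" rule concrete is a small classifier. The sketch below is not from the talk: the class name `TransientFaults`, both method names, and the choice of status codes are assumptions made for illustration. It treats network-level I/O errors and a handful of overload-style HTTP statuses as transient, and everything else as permanent.

```java
import java.io.IOException;
import java.util.Set;

// Hypothetical helper, not part of the original slides.
final class TransientFaults {
    // Assumption: these statuses usually signal a timeout, throttling,
    // or a temporarily unavailable server, so a retry may succeed.
    private static final Set<Integer> TRANSIENT_STATUSES = Set.of(408, 429, 502, 503, 504);

    private TransientFaults() {}

    /** Network-level failures (timeouts, connection resets) count as transient. */
    static boolean isTransientError(Throwable error) {
        return error instanceof IOException;
    }

    /** An HTTP response is transient only if its status is in the set above. */
    static boolean isTransientStatus(int httpStatus) {
        return TRANSIENT_STATUSES.contains(httpStatus);
    }
}
```

Under these assumptions, a 404 or a validation error is never retried, since repeating the same request would most likely fail in the same way.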

BETTER RETRY
    for (int i = 1; i <= numRetries; i++) {
        try {
            return restTemplate.getForObject(url, Cart.class);
        } catch (RestClientException e) {
            LOG.error("failed to get cart", e);
            if (i >= numRetries || isNonTransientFault(e)) {
                throw e;
            }
        }
    }

EVEN BETTER RETRY

TRUNCATED EXPONENTIAL BACKOFF WITH JITTER
    long computeWaitTime(int retryNumber, int maxWaitTime) {
        // 2 to the power of retryNumber; in Java, ^ is XOR, so use a shift
        int delay = WAIT_TIME_MULTIPLIER * (1 << retryNumber);
        // add random jitter, then truncate at maxWaitTime
        return Math.min(maxWaitTime, delay + random.nextInt(delay));
    }
    ...
    for (int i = 1; i <= numRetries; i++) {
        try {
            return restTemplate.getForObject(url, Cart.class);
        } catch (RestClientException e) {
            LOG.error("failed to get cart", e);
            if (i >= numRetries || isNonTransientFault(e)) {
                throw e;
            }
            sleep(computeWaitTime(i, MAX_WAIT_TIME));
        }
    }
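
To see what the backoff schedule looks like in numbers, here is a self-contained sketch of the same calculation. The class name `Backoff`, the 100 ms multiplier, and the shift cap of 20 are assumptions chosen for illustration, not values from the talk. The random source is passed in so the jitter can be seeded in tests.

```java
import java.util.Random;

// Hypothetical standalone version of the wait-time calculation.
final class Backoff {
    static final int WAIT_TIME_MULTIPLIER = 100; // assumed base delay in milliseconds

    private Backoff() {}

    /** Exponential part: multiplier * 2^retryNumber, with the exponent clamped to 0..20 to avoid int overflow. */
    static int baseDelay(int retryNumber) {
        int shift = Math.max(0, Math.min(retryNumber, 20));
        return WAIT_TIME_MULTIPLIER * (1 << shift);
    }

    /** Adds random jitter in [0, delay) and truncates the result at maxWaitTime. */
    static int computeWaitTime(int retryNumber, int maxWaitTime, Random random) {
        int delay = baseDelay(retryNumber);
        return Math.min(maxWaitTime, delay + random.nextInt(delay));
    }
}
```

With these assumed values, the base delays for retries 1 to 6 are 200, 400, 800, 1600, 3200 and 6400 ms. With a 5000 ms cap, the sixth wait is always truncated to 5000 ms, and the jitter keeps clients that failed at the same moment from retrying in lockstep.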

WHEN TRANSIENT FAULTS BECOME PERMANENT

CIRCUIT BREAKER PATTERN
The circuit breaker pattern can prevent an application from repeatedly trying to execute an operation that's likely to fail.

CIRCUIT BREAKER CLOSED
Closed State: The requests from the application are forwarded to the target.

CIRCUIT BREAKER OPEN
Open State: The request from the application fails immediately and an exception is returned to the application.

CIRCUIT BREAKER HALF-OPEN
Half-Open State: A limited number of requests from the application are allowed to pass through and invoke the operation.

CIRCUIT BREAKER OPEN
Open State: The request from the application fails immediately and an exception is returned to the application.
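
The three states can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the class name `SimpleCircuitBreaker`, the consecutive-failure threshold, the fixed open period, and the injectable clock are all hypothetical choices. Because `call` is synchronized, only one request at a time passes through, which is a crude way of limiting half-open traffic.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker, for illustration only.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to stay open before probing
    private final LongSupplier clock;   // injectable time source, in milliseconds

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        // An open breaker becomes half-open once the wait period has elapsed.
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    synchronized <T> T call(Supplier<T> operation) {
        if (state() == State.OPEN) {
            // Open: fail immediately without invoking the target.
            throw new IllegalStateException("circuit open: failing fast");
        }
        try {
            T result = operation.get();
            failures = 0;
            state = State.CLOSED; // any success, including a half-open probe, closes the circuit
            return result;
        } catch (RuntimeException e) {
            failures++;
            if (state == State.HALF_OPEN || failures >= failureThreshold) {
                state = State.OPEN; // trip, or re-trip after a failed probe
                openedAt = clock.getAsLong();
            }
            throw e;
        }
    }
}
```

A caller might construct it as `new SimpleCircuitBreaker(5, 30_000, System::currentTimeMillis)`. Production libraries typically track failure rates over a sliding window rather than a simple consecutive count.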

THE MOST IMPORTANT QUESTION
    private double doSomeMath(int result) {
        if (result != 0) {
            return 42 / result;
        }
        return Double.NaN;
    }

PRODUCT DETAIL PAGE

PUTTING IT ALL TOGETHER
    circuitBreaker.call((url) -> {
        for (int i = 1; i <= numRetries; i++) {
            try {
                return restTemplate.getForObject(url, Product.class);
            } catch (RestClientException e) {
                LOG.error("failed to get product details", e);
                if (i >= numRetries || isNonTransientFault(e)) {
                    throw e;
                }
                sleep(computeWaitTime(i, MAX_WAIT_TIME));
            }
        }
        throw new NoMoreRetriesException();
    }).fallback(() -> "a Partner");
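
The `circuitBreaker.call(...).fallback(...)` chain on the slide belongs to an API that the deck does not name, so it cannot be reproduced as-is. The sketch below covers only the retry and fallback parts in plain Java. The class `Resilient` and its method signature are hypothetical, and the backoff sleep is left out to keep the example short.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper combining bounded retries with a fallback value.
final class Resilient {
    private Resilient() {}

    static <T> T callWithRetryAndFallback(Callable<T> operation,
                                          int numRetries,
                                          Predicate<Exception> isTransient,
                                          Supplier<T> fallback) {
        for (int i = 1; i <= numRetries; i++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (i >= numRetries || !isTransient.test(e)) {
                    break; // out of attempts, or retrying cannot help
                }
                // A fuller version would sleep here with exponential backoff and jitter.
            }
        }
        // Every attempt failed: degrade gracefully instead of propagating the error.
        return fallback.get();
    }
}
```

Used for the product page, the call might look like `Resilient.callWithRetryAndFallback(() -> fetchBrand(url), 3, e -> e instanceof java.io.IOException, () -> "a Partner")`, where `fetchBrand` stands in for the real remote call.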

HANDS-ON EXERCISE
[Diagram labels: PRODUCT DETAIL PAGE, RETRY, CIRCUIT BREAKER, FALLBACK, TARGET]

EXERCISE - PRODUCT DETAIL PAGE

EXERCISE - PRODUCT DETAIL PAGE
[Page components: BRAND DATA, PRODUCT DATA, WISH LIST, SIZE RECOMMENDATION, CART, DELIVERY OPTIONS, SIZE SELECTOR]

EXERCISE - PRODUCT DETAIL PAGE
GROUPS
GROUP 1: SIZE RECOMMENDATION
GROUP 2: BRAND DATA
GROUP 3: PRODUCT DATA
GROUP 4: WISH LIST
GROUP 5: CART
GROUP 6: SIZE SELECTOR
GROUP 7: DELIVERY OPTIONS
TASKS
1. Retries
   Retryable operation? How many times?
2. Circuit breaker
   Global circuit breaker?
3. Fail fast
   Type of fallback
   Delegate to frontend?
20 MIN

THANK YOU
QUESTIONS?
WE'RE HIRING https://jobs.zalando.com

@ZalandoTech https://tech.zalando.com
LUIS MINEIRO @voidmaze [email protected]
12.03.2018
