Full Stack Fest: Architectural Patterns of Resilient Distributed Systems

Slide 1

Slide 1 text

Architectural Patterns of Resilient Distributed Systems
Full Stack Fest 2016

Slide 2

Slide 2 text

Ines Sombra @Randommood

Slide 3

Slide 3 text

Globally distributed & highly available

Slide 4

Slide 4 text

Today’s Journey (Forest, Company)
1. Motivation
2. Resilience in literature
3. Resilience in industry
4. Conclusions

Slide 5

Slide 5 text

Resilience is the ability of a system to adapt or keep working when challenges occur

Slide 6

Slide 6 text

Defining Resilience
Fault-tolerance
Evolvability
Scalability
Failure isolation
Complexity management

Slide 7

Slide 7 text

It’s what really matters

Slide 8

Slide 8 text

How can we construct more resilient systems?

Slide 9

Slide 9 text

Resilience in Literature

Slide 10

Slide 10 text

Harvest & Yield Model

Slide 11

Slide 11 text

Yield
Fraction of successfully answered queries
Close to uptime but more useful because it directly maps to user experience (uptime misses this)
Focus on yield rather than uptime
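As a minimal sketch of the yield definition above: the `yield_fraction` helper and the traffic numbers are hypothetical and are not taken from the talk. They only illustrate why yield can differ sharply between two outages that cost the same uptime.

```python
def yield_fraction(answered: int, offered: int) -> float:
    """Yield = successfully answered queries / queries offered."""
    if offered <= 0:
        raise ValueError("offered must be positive")
    return answered / offered

# Hypothetical traffic: two outages of equal length cost the same uptime,
# but the one at peak load drops far more queries.
peak_outage = yield_fraction(answered=90_000, offered=100_000)
quiet_outage = yield_fraction(answered=99_900, offered=100_000)
```

Under these assumed numbers the peak-time outage gives a yield of 0.9 while the quiet-time outage gives 0.999, a difference that an uptime figure would not show.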

Slide 12

Slide 12 text

Harvest
Fraction of the complete result
Diagram: Server A, Server B, Server C; Baby, Animals, Cute
From Coda Hale’s “You can’t sacrifice partition tolerance”

Slide 13

Slide 13 text

Harvest
Fraction of the complete result
Diagram: Server A, Server B, Server C; Baby, Animals, Cute; X, 66% harvest
From Coda Hale’s “You can’t sacrifice partition tolerance”
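The 66% figure above can be reproduced with a small sketch. It assumes each of the three servers holds an equal share of the complete result and that exactly one is unreachable. The `harvest` helper and the choice of which server is down are illustrative assumptions.

```python
from typing import Dict

def harvest(reachable: Dict[str, bool]) -> float:
    """Harvest = fraction of the complete result that can still be returned.

    Assumes every server contributes an equal share of the result.
    """
    if not reachable:
        raise ValueError("no servers given")
    up = sum(1 for is_up in reachable.values() if is_up)
    return up / len(reachable)

# Hypothetical state: which server is down is an assumption for illustration.
servers = {"Server A": True, "Server B": True, "Server C": False}
partial = harvest(servers)       # 2 of 3 shares available
percent = int(partial * 100)     # truncated, matching the slide's 66%
```

The query is still answered, so yield is unaffected; only the completeness of the answer drops.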

Slide 14

Slide 14 text

#1: Probabilistic Availability
Graceful harvest degradation under faults
Randomness to make the worst-case & average-case the same
Replication of high-priority data for greater harvest control
Degrading results based on client capability
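One way to picture the randomness and replication points above is the sketch below. It is an illustration under stated assumptions, not the speaker's implementation: keys are spread over a hypothetical 4-node cluster by hashing, so losing any single node removes roughly the same share of data, and keys flagged as high priority get a second replica on the neighbouring node. All names (`NODES`, `home_node`, `placements`, `harvest_after_failure`) are invented for this example.

```python
import hashlib
from typing import Dict, Set

NODES = 4  # hypothetical cluster size

def home_node(key: str) -> int:
    """Pseudo-random placement: hashing avoids any node owning a 'hot' range."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NODES

def placements(key: str, high_priority: bool) -> Set[int]:
    """High-priority keys get a second replica on the next node."""
    primary = home_node(key)
    if high_priority:
        return {primary, (primary + 1) % NODES}
    return {primary}

def harvest_after_failure(keys: Dict[str, bool], failed: int) -> float:
    """Fraction of keys still readable after losing one node."""
    alive = sum(1 for key, hp in keys.items() if placements(key, hp) - {failed})
    return alive / len(keys)

# Hypothetical data set: every tenth key is marked high priority.
keys = {f"item-{i}": (i % 10 == 0) for i in range(1000)}
per_failure = [harvest_after_failure(keys, node) for node in range(NODES)]
```

With hashed placement, the values in `per_failure` should sit close to one another, and every high-priority key stays readable after any single node failure because its two replicas live on different nodes.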

Slide 15

Slide 15 text

#2: Decomposition & Orthogonality
Decompose into subsystems that are independently intolerant to harvest degradation, so that your application can continue if they fail
Only provide strong consistency for the subsystems that need it
Orthogonal mechanisms (state vs functionality)
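A rough sketch of the decomposition idea above: each subsystem either answers completely or fails, and the application keeps serving whatever the remaining subsystems return. The subsystem names (`catalog`, `recommendations`, `checkout`) and the `render_page` helper are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Optional

def catalog() -> str:
    return "product details"

def recommendations() -> str:
    return "3 suggested items"

def checkout() -> str:
    # Assumed to need strong consistency: it answers fully or not at all.
    raise RuntimeError("billing store unavailable")

def render_page(subsystems: Dict[str, Callable[[], str]]) -> Dict[str, Optional[str]]:
    """Call each subsystem independently; a failure blanks only its own section."""
    page: Dict[str, Optional[str]] = {}
    for name, call in subsystems.items():
        try:
            page[name] = call()
        except Exception:
            page[name] = None  # this section is omitted, the rest is still served
    return page

page = render_page({
    "catalog": catalog,
    "recommendations": recommendations,
    "checkout": checkout,
})
```

Here the failing `checkout` subsystem leaves its section empty while the catalog and recommendations are still rendered, so the page as a whole degrades instead of failing.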

Slide 16

Slide 16 text

“If your system favors yield or harvest is an outcome of its design” ~ Fox & Brewer

Slide 17

Slide 17 text

Cook & Rasmussen model

Slide 18

Slide 18 text

Cook & Rasmussen: Economic failure boundary; Unacceptable workload boundary; Accident boundary; Operating point

Slide 19

Slide 19 text

Cook & Rasmussen: Economic failure boundary; Unacceptable workload boundary; Accident boundary

Slide 20

Slide 20 text

Cook & Rasmussen: Economic failure boundary; Unacceptable workload boundary; Accident boundary; Pressure towards efficiency

Slide 21

Slide 21 text

Cook & Rasmussen: Economic failure boundary; Unacceptable workload boundary; Accident boundary; Pressure towards efficiency; Reduction of effort

Slide 22

Slide 22 text

Cook & Rasmussen: Economic failure boundary; Unacceptable workload boundary; Accident boundary; Pressure towards efficiency; Reduction of effort; Incident!

Slide 23

Slide 23 text

Cook & Rasmussen: Economic failure boundary; Unacceptable workload boundary; Accident boundary; Pressure towards efficiency; Reduction of effort; Safety campaign

Slide 24

Slide 24 text

Cook & Rasmussen: Economic failure boundary; Unacceptable workload boundary; Accident boundary; Pressure towards efficiency; Reduction of effort; Error margin; Marginal boundary; Safety campaign
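The labels in the diagrams above can be read as a simple dynamic. The toy below is a one-dimensional simplification with invented numbers, not part of the talk or of the original model: pressure towards efficiency and reduction of effort push the operating point towards the accident boundary, and a safety campaign pushes it back once it crosses the marginal boundary.

```python
ACCIDENT_BOUNDARY = 1.0   # hypothetical position where an accident occurs
ERROR_MARGIN = 0.2        # hypothetical gap kept in front of the accident boundary
MARGINAL_BOUNDARY = ACCIDENT_BOUNDARY - ERROR_MARGIN

def step(point: float, efficiency_pressure: float,
         effort_reduction: float, safety_push: float) -> float:
    """Advance the operating point by one time step along a single axis."""
    # Both gradients move the point towards the accident boundary.
    point += efficiency_pressure + effort_reduction
    # Crossing the marginal boundary triggers a safety campaign that pushes back.
    if point >= MARGINAL_BOUNDARY:
        point -= safety_push
    return point

point = 0.3
history = []
for _ in range(50):
    point = step(point, efficiency_pressure=0.03,
                 effort_reduction=0.02, safety_push=0.15)
    history.append(point)
```

With these assumed values the operating point drifts upward, gets pushed back each time it reaches the marginal boundary, and never reaches the accident boundary. Shrinking `ERROR_MARGIN` or `safety_push` in the sketch narrows that gap.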