Slide 1
Slide 1 text
SOLID SNAKES HYNEK SCHLAWACK
Slide 2
Slide 2 text
No content
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
ATTITUDE
Slide 5
Slide 5 text
INCENTIVES
Slide 6
Slide 6 text
IMPORTANT VS URGENT
Slide 7
Slide 7 text
SIMPLICITY: "THE PRICE OF RELIABILITY IS THE PURSUIT OF THE UTMOST SIMPLICITY." (Sir C.A.R. Hoare)
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
NORMAL ACCIDENTS
Slide 10
Slide 10 text
No content
Slide 11
Slide 11 text
No content
Slide 12
Slide 12 text
No content
Slide 13
Slide 13 text
ESSENTIAL
Slide 14
Slide 14 text
ESSENTIAL VS ACCIDENTAL
Slide 15
Slide 15 text
No content
Slide 16
Slide 16 text
No content
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
No content
Slide 19
Slide 19 text
OPERATIONAL COMPLEXITY
Slide 20
Slide 20 text
your DC Client App DB Redis Cache CDN Work Queue
Slide 21
Slide 21 text
your DC Client App DB Redis Cache CDN Work Queue
Slide 22
Slide 22 text
your DC Client App DB Redis Cache CDN Work Queue
Slide 23
Slide 23 text
MICROSERVICES
Slide 24
Slide 24 text
Service 2 Service 3 Service 1 Service 4 Service 5 Service 6 Service 7 Service 8
Slide 25
Slide 25 text
No content
Slide 26
Slide 26 text
COMPLEXITY IS REALITY
Slide 27
Slide 27 text
No content
Slide 28
Slide 28 text
PLAN FOR STUPIDITY
Slide 29
Slide 29 text
HUMAN ERRORS: "I DON’T BELIEVE IN HUMAN ERROR" (John Allspaw, CTO at Etsy)
Slide 30
Slide 30 text
No content
Slide 31
Slide 31 text
ILLEGAL STATE
Slide 32
Slide 32 text
No content
Slide 33
Slide 33 text
1. VALID AFTER INITIALIZATION
Slide 34
Slide 34 text
1. VALID AFTER INITIALIZATION 2. PREVENT MUTATION TO ILLEGAL
Slide 35
Slide 35 text
NO PARTIAL INITIALIZATION
    conn = Connection()
    conn.tls = True
    conn.connect("host.name")
Slide 36
Slide 36 text
NO PARTIAL INITIALIZATION: CLASSMETHOD FACTORIES
    conn = Connection.connect("host.name", tls=True)
Slide 37
Slide 37 text
NO PARTIAL INITIALIZATION: BUILDER PATTERN
    conn = ConnectionBuilder() \
        .for_hostname("host.name") \
        .with_tls(True) \
        .connect()
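A minimal sketch of the classmethod-factory idea, assuming a hypothetical `Connection` class. The `_open` helper is a stand-in for real network I/O and is not taken from the slides or from any library; the point is only that callers never see a half-configured instance.

```python
class Connection:
    """Hypothetical connection type used for illustration only."""

    def __init__(self, hostname: str, tls: bool, handle: object) -> None:
        # __init__ only stores fully prepared state.
        self.hostname = hostname
        self.tls = tls
        self._handle = handle

    @classmethod
    def connect(cls, hostname: str, tls: bool = False) -> "Connection":
        # All configuration arrives in one call, so there is no window in
        # which the object exists but is not yet usable.
        if not hostname:
            raise ValueError("hostname must not be empty")
        handle = cls._open(hostname, tls)
        return cls(hostname, tls, handle)

    @staticmethod
    def _open(hostname: str, tls: bool) -> object:
        # Placeholder (assumption): a real implementation would open a socket.
        return ("tls" if tls else "plain", hostname)


conn = Connection.connect("host.name", tls=True)
```

If `connect()` fails, it raises before any `Connection` object is handed to the caller.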
Slide 38
Slide 38 text
PREVENT MUTATION TO ILLEGAL
Slide 39
Slide 39 text
PREVENT MUTATION
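One way to prevent mutation in Python is a frozen dataclass. The sketch below assumes a hypothetical `Endpoint` value object; the slides do not prescribe a specific mechanism.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Endpoint:
    """Hypothetical value object; frozen=True blocks attribute assignment."""

    hostname: str
    tls: bool = True


ep = Endpoint("host.name")

try:
    ep.tls = False  # rejected by the frozen dataclass
except dataclasses.FrozenInstanceError:
    blocked = True
else:
    blocked = False

# A change produces a new object instead of mutating the existing one.
plain = dataclasses.replace(ep, tls=False)
```

Because `ep` can never change after construction, any state it was valid in at initialization stays valid.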
Slide 40
Slide 40 text
No content
Slide 41
Slide 41 text
DATA VALIDATION
Slide 42
Slide 42 text
DATA VALIDATION AT EDGES
Slide 43
Slide 43 text
DATA VALIDATION NORMALIZATION AT EDGES
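A small sketch of validating and normalizing at the edge, assuming a hypothetical `SignupRequest` type and made-up field names. Raw input is checked and cleaned once, at the boundary, so the rest of the program only ever handles well-formed values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignupRequest:
    """Hypothetical internal type; intended to be built via from_raw()."""

    email: str
    age: int

    @classmethod
    def from_raw(cls, raw: dict) -> "SignupRequest":
        # Normalize: trim whitespace and lowercase the address.
        email = str(raw.get("email", "")).strip().lower()
        if "@" not in email:
            raise ValueError(f"invalid email: {email!r}")
        # Normalize: accept "42" as well as 42, reject everything else.
        try:
            age = int(raw.get("age"))
        except (TypeError, ValueError):
            raise ValueError("age must be an integer") from None
        if age < 0:
            raise ValueError("age must not be negative")
        return cls(email=email, age=age)


req = SignupRequest.from_raw({"email": "  Alice@Example.COM ", "age": "42"})
```

The email check here is deliberately crude; it only illustrates where the check lives, not what a complete check looks like.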
Slide 44
Slide 44 text
No content
Slide 45
Slide 45 text
PLOT TWIST!
Slide 46
Slide 46 text
FAILURE IS INEVITABLE
Slide 47
Slide 47 text
RELIABILITY
Slide 48
Slide 48 text
RELIABILITY Twitter 2007
Slide 49
Slide 49 text
RELIABILITY Twitter 2007 NASA 1969
Slide 50
Slide 50 text
No content
Slide 51
Slide 51 text
FAILURE IS INEVITABLE
Slide 52
Slide 52 text
FAILURE IS INEVITABLE (⌐■_■)
Slide 53
Slide 53 text
EXPECT
Slide 54
Slide 54 text
No content
Slide 55
Slide 55 text
No content
Slide 56
Slide 56 text
TIMEOUTS
Slide 57
Slide 57 text
No content
Slide 58
Slide 58 text
CLOSED: Local Client → call() → Circuit Breaker → call() → Remote API; the result is passed back through the breaker to the client.
Slide 59
Slide 59 text
CLOSED → OPEN: Local Client → call() → Circuit Breaker → call() → Remote API; the remote call times out ("timeout!") and the timeout is passed back to the client.
Slide 60
Slide 60 text
OPEN: Local Client → call() → Circuit Breaker, which answers "circuit open!" without calling the Remote API.
Slide 61
Slide 61 text
OPEN → HALF-CLOSED: Local Client → call() → Circuit Breaker → call() → Remote API; the result is passed back through the breaker to the client.
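The breaker states on the preceding slides can be sketched in a few lines. This is a simplified, single-threaded illustration, not a production implementation: the class name, the thresholds, and the injectable `clock` parameter are all assumptions. The trial state that the slides call half-closed is more commonly named half-open elsewhere.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the remote API while the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # OPEN: fail immediately, do not touch the remote side.
                raise CircuitOpenError("circuit open!")
            # Reset period elapsed: let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller wraps each remote call, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treats `CircuitOpenError` as a fast failure.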
Slide 62
Slide 62 text
REDUNDANCY
Slide 63
Slide 63 text
No content
Slide 64
Slide 64 text
DOCS
Slide 65
Slide 65 text
DEAL WITH IT (¬∎_∎)
Slide 66
Slide 66 text
DON’T MAKE IT WORSE
Slide 67
Slide 67 text
RETRIES
Slide 68
Slide 68 text
BACKOFF
Slide 69
Slide 69 text
BACKOFF EXPONENTIAL
Slide 70
Slide 70 text
BACKOFF EXPONENTIAL WITH JITTER
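A minimal sketch of exponential backoff with jitter. It uses one common variant, often called "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function names, defaults, and the injectable `sleep` and `rng` parameters are assumptions made for illustration.

```python
import random
import time


def backoff_delays(attempts, base=0.1, cap=10.0, rng=random.random):
    """Yield one delay per retry: rng() scaled by min(cap, base * 2**n)."""
    for n in range(attempts):
        yield rng() * min(cap, base * 2 ** n)


def retry(func, attempts=5, sleep=time.sleep, **backoff_kwargs):
    """Call func until it succeeds or all attempts are used up."""
    delays = backoff_delays(attempts - 1, **backoff_kwargs)
    while True:
        try:
            return func()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # out of retries: surface the last error
            sleep(delay)
```

The randomness spreads out clients that failed at the same moment, so they do not all retry in lockstep against a service that is already struggling.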
Slide 71
Slide 71 text
Frontend Backend 3x
Slide 72
Slide 72 text
Internal Backend A Internal Backend B 9x 9x Frontend Backend 3x
Slide 73
Slide 73 text
Internal Backend C 27x Internal Backend A Internal Backend B 9x 9x Frontend Backend 3x
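The multipliers on these slides follow from simple arithmetic. The sketch below assumes that every layer makes up to 3 attempts per incoming request and that the services form a chain of increasing depth; the slides show only the multipliers, so the exact topology is an assumption.

```python
def worst_case_calls(attempts_per_layer: int, depth: int) -> int:
    """Worst-case calls reaching a service `depth` hops behind the frontend."""
    return attempts_per_layer ** depth


layers = ["Backend", "Internal Backend A/B", "Internal Backend C"]
for depth, name in enumerate(layers, start=1):
    print(f"{name}: {worst_case_calls(3, depth)}x")
# prints:
# Backend: 3x
# Internal Backend A/B: 9x
# Internal Backend C: 27x
```

Retries multiply across layers, so a failing service three hops down can receive 27 times the original traffic at exactly the moment it is least able to handle it.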
Slide 74
Slide 74 text
DON’T SWALLOW ERRORS
Slide 75
Slide 75 text
    try:
        do_something()
        return True
    except Exception:
        return False
Slide 76
Slide 76 text
    try:
        do_something()
    except Exception:
        raise AppException()
Slide 77
Slide 77 text
    try:
        do_something()
        return True
    except Exception as e:
        raise AppException() from e
Slide 78
Slide 78 text
    try:
        do_something()
        return True
    except Exception as e:
        raise AppException() from e

    # On the raised AppException instance, __cause__ is the original e.
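A runnable version of the chaining idea from these slides. `AppException` and `do_something` are named on the slides but their bodies are not shown, so the definitions here are stand-ins.

```python
class AppException(Exception):
    """Application-level error; the body is an assumption."""


def do_something():
    # Stand-in failure so there is something to wrap.
    raise KeyError("missing")


def run():
    try:
        do_something()
        return True
    except Exception as e:
        # "from e" records the original error as __cause__.
        raise AppException("do_something failed") from e


try:
    run()
except AppException as exc:
    caught = exc

# The original KeyError is still reachable for logging and debugging.
original = caught.__cause__
```

When the chained exception goes uncaught, the traceback prints both errors, joined by the line "The above exception was the direct cause of the following exception".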
Slide 79
Slide 79 text
DON’T TRY TOO HARD
Slide 80
Slide 80 text
sys.exit(1)
Slide 81
Slide 81 text
CRASH-ONLY
Slide 82
Slide 82 text
FAIL FAST FAIL LOUDLY
Slide 83
Slide 83 text
FOCUS ON RECOVERY
Slide 84
Slide 84 text
MTTR
Slide 85
Slide 85 text
No content
Slide 86
Slide 86 text
ZERO EXPECTATIONS
Slide 87
Slide 87 text
No content
Slide 88
Slide 88 text
FAULT TOLERANCE
Slide 89
Slide 89 text
FAULT TOLERANCE RECOVERY
Slide 90
Slide 90 text
OX.CX/SS @HYNEK VRMD.DE