The Case for Chaos: Thinking About Failure Holistically

THE CASE FOR CHAOS: THINKING ABOUT FAILURE HOLISTICALLY

~ WHOAMI
Patrick Higgins
@higgyCodes
UI Engineer @ Gremlin

~ WHOAMI
• From Sydney, Australia
• Former Salt Lake Citizen
• Lives in San Francisco

OUTLINE
• Chaos Engineering
• GameDays
• Holistic Failure Mitigation

CHAOS ENGINEERING
• “Thoughtful, planned experiments designed to reveal the weaknesses in our system” - Kolton Andrus
• Like a vaccine, we inject harm into our system to help build immunity.

CHAOS IS A PRACTICE

WHY CHAOS ENGINEER?
• The motivations are different depending on role:
  • Business case - avoiding costly downtime
  • On call case - avoiding 3am pages
  • Engineering - service availability

“The lesson we should learn and remember is that sooner or later, all complex systems will fail.”
– Mathias Lafeldt

DOWNTIME IS COSTLY
• Prevents sales
• Affects customer trust
• Contributes to engineer burnout

PREREQUISITES FOR CHAOS
• Have a High Severity Incident Management (SEV) Program
• Have sufficient monitoring to observe effects
• Have alerts and paging that notify a human during a SEV

CHAOS ENGINEERING LIFECYCLE

WORD OF WARNING
• Never run a chaos experiment (in production) if you know it will cause severe damage.

CHAOS MITIGATION IS MULTIFACETED
• People get better at mitigating failure.
• Product is engineered with failure in mind.

GAMEDAYS

WHAT IS A GAMEDAY?
“Dedicated time for teams to collaboratively focus on using Chaos Engineering practices to reveal weaknesses in your services”
– Ho Ming Li

WHO SHOULD PARTICIPATE?

WHY EVERYBODY?
• Everybody benefits from observing failure
• Encourages cross-organization collaboration
• Find your champions across the company
• Encourages varied perspectives

THINGS TO REMEMBER

MY FIRST GAMEDAY
• Gremlin holds Failure Fridays
• Degradation of my features in the UI was less than desirable
• Mapped out the critical failures, dropped tickets into tech debt, dealt with the tickets gradually as time allowed.

HOLISTIC FAILURE MITIGATION

CHAOS ENGINEERING AND UI
• Graceful Degradation in UI implementation
• Critical User Paths
• Auxiliary Paths
• Sometimes the two are mixed
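
One way to picture the split between critical and auxiliary paths is the TypeScript sketch below. It is illustrative only: `Section`, `Rendered`, and `renderPage` are hypothetical names that do not come from the talk, and the loaders are synchronous to keep the sketch short (real data fetching would be asynchronous). The assumption is that an auxiliary section may fall back to placeholder content, while a failing critical section must surface its error.

```typescript
// Hypothetical sketch: none of these names come from the talk.
type Section = {
  name: string;
  critical: boolean;   // true = critical user path, false = auxiliary path
  load: () => string;  // may throw when a dependency is failing
  fallback?: string;   // shown instead when an auxiliary section fails
};

type Rendered = { name: string; content: string; degraded: boolean };

function renderPage(sections: Section[]): Rendered[] {
  const rendered: Rendered[] = [];
  for (const section of sections) {
    try {
      rendered.push({ name: section.name, content: section.load(), degraded: false });
    } catch (err) {
      if (section.critical) {
        // A critical path cannot be papered over: surface the failure.
        throw new Error(`critical section "${section.name}" failed: ${String(err)}`);
      }
      // Auxiliary path: degrade to the fallback and keep the page usable.
      rendered.push({ name: section.name, content: section.fallback ?? "", degraded: true });
    }
  }
  return rendered;
}

// Usage: the auxiliary section fails, yet the page still renders.
const page = renderPage([
  { name: "checkout", critical: true, load: () => "checkout form" },
  {
    name: "recommendations",
    critical: false,
    load: () => { throw new Error("timeout"); },
    fallback: "Recommendations are unavailable right now.",
  },
]);
```

Here `page` holds both sections, with the second one marked `degraded`. When a section mixes critical and auxiliary behavior, the `critical` flag is the decision a team would have to make explicitly.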

CHAOS ENGINEERING AND UI
• End-to-End testing of failure scenarios is not enough.
• OSS developer tooling around failure mitigation in UI is underdeveloped.
• Tooling is often company-specific.

CHAOS ENGINEERING AND PRODUCT
• Mapping out potential alternative states (reroute, retry)
• Product specs that include comprehensive failure scenarios are rare
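
Alternative states such as retry and reroute can be written down as an explicit state machine, which is one way a product spec could enumerate failure scenarios. The sketch below is an assumption-laden illustration, not something from the talk: the state names, the `MAX_ATTEMPTS` limit of 3, and the `nextState` function are all hypothetical.

```typescript
// Hypothetical sketch: states, events, and limits are illustrative assumptions.
type RequestState =
  | { kind: "loading"; attempt: number }
  | { kind: "retrying"; attempt: number }
  | { kind: "rerouted"; target: string }
  | { kind: "success" }
  | { kind: "failed" };

type RequestEvent = { kind: "succeeded" } | { kind: "errored" };

const MAX_ATTEMPTS = 3; // assumed retry budget

function nextState(
  state: RequestState,
  event: RequestEvent,
  rerouteTarget?: string, // optional alternative flow to send the user to
): RequestState {
  // success, failed, and rerouted are terminal in this sketch.
  if (state.kind !== "loading" && state.kind !== "retrying") {
    return state;
  }
  if (event.kind === "succeeded") {
    return { kind: "success" };
  }
  // The request errored: retry while attempts remain.
  if (state.attempt < MAX_ATTEMPTS) {
    return { kind: "retrying", attempt: state.attempt + 1 };
  }
  // Retries exhausted: reroute if an alternative exists, otherwise fail.
  if (rerouteTarget !== undefined) {
    return { kind: "rerouted", target: rerouteTarget };
  }
  return { kind: "failed" };
}
```

Writing the transitions as a pure function keeps every failure state visible in one place, so a GameDay can check each one against what the UI actually shows.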

RESOURCES
AWESOME CHAOS ENGINEERING
dastergon/awesome-chaos-engineering

RESOURCES
GAME DAY RESOURCES
gremlin.com/gameday

GET INVOLVED
CHAOS COMMUNITY SLACK
gremlin.com/slack

GET INVOLVED
SLC CHAOS ENGINEERING MEETUP
meetup.com/Salt-Lake-City-Chaos-Engineering-Community/

GET INVOLVED
CHAOS CONF (SF)
September 28th, 2018
chaosconf.io

THANKS!
Patrick Higgins
@higgyCodes
UI Engineer @ Gremlin
