Five NEINs of Availability

Five NEINs of Availability
Tomer Gabel
Cloud Nein, June 2020

Same Old Song and Dance
Image by Sarah Giboni on Flickr (CC BY 2.0)

Who am I?

1. Don’t… MAKE IT IMPOSSIBLE TO BREAK
Image by Bert Heymans on Flickr (CC BY-NC-ND 2.0)

• Human error is inevitable
• More process won’t solve the problem
• More process will screw you over
• Avoid the process entirely, or provide means of circumventing it

2. Don’t… PUT ARTIFICIAL BARRIERS IN PLACE
Image by Jessica BKK on Flickr (CC BY 2.0)

Barriers
• Regulation
  – Apply only where required, and/or
  – Allow access with proper logging/auditing. It’ll end up cheaper
• Lack of trust
  – You trust them to build it, but not to operate it?
  – Won’t help you anyway
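The "allow access with proper logging/auditing" option could look roughly like the sketch below: the operation is not blocked, but every call leaves a record of who did what and when. The logger name `audit`, the decorator `audited`, and the `restart_service` operation are all hypothetical names chosen for illustration, and reading the operator from the `USER` environment variable is an assumption that a real system would replace with its own identity source.

```python
import functools
import logging
import os
from datetime import datetime, timezone

# Hypothetical dedicated audit logger; route it wherever audit records belong.
audit_log = logging.getLogger("audit")
audit_log.setLevel(logging.INFO)


def audited(action):
    """Let the call through, but record who ran which action and when."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            audit_log.info(
                "user=%s action=%s args=%r at=%s",
                os.environ.get("USER", "unknown"),  # assumed identity source
                action,
                args,
                datetime.now(timezone.utc).isoformat(),
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@audited("restart-service")
def restart_service(name):
    # Placeholder for the real production operation (hypothetical).
    return f"restarted {name}"
```

The engineer keeps the ability to act in production; the barrier is replaced by a trail that can be reviewed afterwards.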

3. Don’t… THINK OF FAILURES AS EXCEPTIONAL
Image by Piratenmensch on Flickr (CC BY-SA 2.0)

Source: AWS case study by Slack

4. Don’t… MAKE PROBLEMS GO AWAY
Top image by chaouki on Flickr (CC BY-SA 2.0), bottom image by Caffeinatrix on Flickr (CC BY-NC-ND 2.0)

Source: Imgflip

Panic Reboot: Nasty side effects
Obvious:
• Data loss
• Data corruption
• Interrupted users
Subtle:
• Partial transactions
• Data inconsistency
• Abnormal load (e.g. cache warmup)

Panic Reboot: Lost opportunity
It’s a unique state
• Unexpected
• Pathological
• Visible
Collect data!
• Thread dumps
• Heap dumps
• Metrics, logs
Act on it!
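As a rough illustration of "collect data, then act", the sketch below saves a thread dump before running whatever restart action is supplied. It covers only the thread-dump item and assumes a CPython process; heap dumps, metrics and logs would need their own tooling. The function names `thread_dump` and `capture_then_restart` are hypothetical, and `sys._current_frames()` is a CPython-specific call.

```python
import sys
import threading
import time
import traceback


def thread_dump():
    """Return a text snapshot of every live thread's current stack."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    # sys._current_frames() maps thread id -> topmost frame (CPython-specific).
    for ident, frame in sys._current_frames().items():
        lines.append(f"Thread {names.get(ident, '?')} (id={ident})\n")
        lines.extend(traceback.format_stack(frame))
    return "".join(lines)


def capture_then_restart(restart, dump_path):
    """Write the evidence to disk first, then run the supplied restart action."""
    with open(dump_path, "w") as out:
        out.write(f"captured_at={time.time()}\n")
        out.write(thread_dump())
    return restart()
```

Passing the restart as a callable keeps the ordering explicit: the pathological state is recorded before it is destroyed.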

Panic Reboot: Lost opportunity
Bryan Cantrill, Debugging Microservices in Production, QCon SF 2015

5. Don’t… HARASS YOUR DEBUGGERS
Image by mariana neri on Flickr (CC BY-NC-ND 2.0)

There’s an issue!
Fix it!
WHY ISN’T IT FIXED YET?
Is it fixed yet?
Can you send a status update?
Don’t forget to save the screenshots!

In conclusion
Don’t…
1. Make it impossible to break
2. Put artificial barriers in place
3. Think of failures as exceptional
4. Make problems go away
5. Harass your debuggers
Do…
• Trust your engineers
• Assume and plan for failure
• Gather evidence before acting
• Invest in incident management

QUESTIONS?
Thank you for listening
[email protected]
@tomerg
On GitHub: https://github.com/holograph
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
