Servers are doomed to fail

Servers are doomed to fail Jaana B. Dogan [email protected] @rakyll

Serverless is also doomed to fail Jaana B. Dogan [email protected] @rakyll

Systems are doomed to fail Jaana B. Dogan [email protected] @rakyll

Is failure OK? Is failure an unexpected case?

Failure is not an exception. Systems change all the time.

“I haven’t touched the code for a century, it should just work.” Said no one ever.

Failure is expected. Yes, it is.

@rakyll monitoring debugging postmortem

Monitoring is about saying if something is broken.

“99.99% of the requests should return in 100ms.”
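A latency objective like the one on this slide can be checked mechanically. The following is a minimal sketch, assuming request latencies are already available as a list of millisecond values; the function name `meets_slo` and its parameters are hypothetical and not part of the talk.

```python
def meets_slo(latencies_ms, threshold_ms=100.0, target=0.9999):
    """Return True if at least `target` of the requests finished within `threshold_ms`."""
    if not latencies_ms:
        # Assumption: a window with no traffic does not violate the objective.
        return True
    fast = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return fast / len(latencies_ms) >= target


# One slow request out of 10,000 still satisfies a 99.99% objective; two do not.
window_ok = [20.0] * 9999 + [250.0]
window_bad = [20.0] * 9998 + [250.0, 300.0]
print(meets_slo(window_ok), meets_slo(window_bad))
```

A real monitoring system would usually evaluate this over histogram buckets rather than raw samples; the list here only keeps the sketch short.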

Debugging

Debugging is collaborative.

Debugging comes in flavors. Logs Traces Metrics ...

Postmortems

Blameless? Focus on identifying problems.

Collaboration Design for collaboration.

Design for failure Set SLOs, plan for instrumentation, plan for debugging.

Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google

Correlation Jump from monitoring/debugging data to data.
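One way to read the correlation slide is that a metric should carry a pointer to related debugging data, for example a trace ID stored next to a latency bucket. The sketch below illustrates that idea only; the class `LatencyHistogram`, its bucket bounds, and the trace ID strings are assumptions, not something shown in the deck.

```python
import bisect


class LatencyHistogram:
    """Latency histogram that keeps one example trace ID per bucket."""

    def __init__(self, bounds_ms=(10, 50, 100, 500)):
        self.bounds = list(bounds_ms)
        # One extra bucket for everything above the last bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars = [None] * (len(self.bounds) + 1)

    def record(self, latency_ms, trace_id):
        # Bucket i holds latencies greater than bounds[i-1] and up to bounds[i].
        i = bisect.bisect_left(self.bounds, latency_ms)
        self.counts[i] += 1
        # Keep the most recent trace as the example for this bucket.
        self.exemplars[i] = trace_id


hist = LatencyHistogram()
hist.record(12, "trace-fast")      # hypothetical trace IDs
hist.record(750, "trace-slow")
# From the "over 500 ms" bucket, jump straight to a trace worth inspecting.
print(hist.counts[-1], hist.exemplars[-1])
```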

On-call debugging Jump from distributed tracing data to on-call information. Who to page?
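The "who to page" question can be sketched as a lookup from the failing spans of a trace to an ownership table. This is a minimal illustration, assuming each span records its service name and an error flag; the `Span` type, the `who_to_page` helper, and the rotation names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    error: bool


def who_to_page(spans, oncall):
    """Return the on-call owners of the deepest failing spans in a trace."""
    # A failing span whose child also failed is likely only propagating the error.
    parents_of_failed = {s.parent_id for s in spans if s.error and s.parent_id}
    origins = [s for s in spans if s.error and s.span_id not in parents_of_failed]
    return sorted({oncall.get(s.service, "unknown-owner") for s in origins})


trace = [
    Span("a", None, "frontend", True),
    Span("b", "a", "checkout", True),
    Span("c", "b", "db", True),
    Span("d", "a", "auth", False),
]
rotations = {"frontend": "web-oncall", "checkout": "payments-oncall", "db": "storage-oncall"}
print(who_to_page(trace, rotations))
```

The heuristic pages the owner of the deepest failing span rather than everyone on the path, which is a design choice for this sketch and not a claim about how any particular tracing system works.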

Dynamic collection Capability to enable more collection in production when needed.
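Dynamic collection can be approximated with a sampling rate that operators change while the process keeps running. The following is a rough sketch under that assumption; `DynamicSampler` and its methods are illustrative names, not an API from the talk or from any specific library.

```python
import random
import threading


class DynamicSampler:
    """Sampler whose rate can be raised or lowered at runtime."""

    def __init__(self, rate=0.01, rng=None):
        self._lock = threading.Lock()
        self._rate = rate
        self._rng = rng or random.Random()

    def set_rate(self, rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0 and 1")
        with self._lock:
            self._rate = rate

    def should_collect(self):
        # Read the current rate under the lock, then decide for this request.
        with self._lock:
            rate = self._rate
        return self._rng.random() < rate


sampler = DynamicSampler(rate=0.0)   # collection effectively off
sampler.set_rate(1.0)                # during an incident: collect everything
print(sampler.should_collect())
```

In practice the call to `set_rate` would be triggered by something outside the process, such as a config push or an admin endpoint; that wiring is left out here.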

Continuous collection Continuously collect signals, generate fleet-wide analysis reports.

Introspection Introspection pages provided from the services.

@rakyll monitoring debugging postmortem

Thank you Jaana B. Dogan Google [email protected]
