Slide 1
Slide 1 text
Servers are doomed to fail Jaana B. Dogan jbd@google.com @rakyll
Slide 2
Slide 2 text
Serverless is also doomed to fail
Slide 3
Slide 3 text
Systems are doomed to fail
Slide 4
Slide 4 text
Is failure OK? Is failure an unexpected case?
Slide 5
Slide 5 text
Failure is not an exception. Systems change all the time.
Slide 6
Slide 6 text
“I haven’t touched the code for a century, it should just work.” Said no one ever.
Slide 7
Slide 7 text
Failure is expected. Yes, it is.
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
Monitoring, debugging, postmortem
Slide 10
Slide 10 text
Monitoring is about saying if something is broken.
Slide 11
Slide 11 text
“99.99% of the requests should return in 100ms.”
Slide 12
Slide 12 text
No content
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
Debugging
Slide 15
Slide 15 text
Debugging is collaborative.
Slide 16
Slide 16 text
Debugging comes in flavors: logs, traces, metrics, ...
Slide 17
Slide 17 text
Postmortems
Slide 18
Slide 18 text
Postmortems
Slide 19
Slide 19 text
Postmortems
Slide 20
Slide 20 text
Blameless? Focus on identifying problems.
Slide 21
Slide 21 text
Collaboration: design for collaboration.
Slide 22
Slide 22 text
Design for failure: set SLOs, plan for instrumentation, plan for debugging.
Slide 23
Slide 23 text
Cross-stack debugging: accountability across the stack with high-cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Slide 24
Slide 24 text
Correlation: jump from monitoring/debugging data to data.
Slide 25
Slide 25 text
On-call debugging: jump from distributed tracing data to on-call information. Who to page?
Slide 26
Slide 26 text
Dynamic collection: the capability to enable more collection in production when needed.
Slide 27
Slide 27 text
Continuous collection: continuously collect signals, generate fleet-wide analysis reports.
Slide 28
Slide 28 text
Introspection: introspection pages provided by the services.
Slide 29
Slide 29 text
Monitoring, debugging, postmortem
Slide 30
Slide 30 text
Thank you Jaana B. Dogan Google jbd@google.com