Servers are doomed to fail
JBD
May 17, 2019
Technology
Transcript
Servers are doomed to fail Jaana B. Dogan jbd@google.com @rakyll
Serverless is also doomed to fail Jaana B. Dogan jbd@google.com @rakyll
Systems are doomed to fail Jaana B. Dogan jbd@google.com @rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should just work.” Said no one ever.
Failure is expected. Yes, it is.
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
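A minimal sketch of how a target like the one quoted above could be checked against observed latencies. The function name `meets_slo`, its parameters, and the input format (a list of per-request latencies in milliseconds) are assumptions for illustration, not something shown in the talk.

```python
def meets_slo(latencies_ms, threshold_ms=100.0, target=0.9999):
    """Check a latency objective over a batch of observed requests.

    Returns (ok, fraction), where fraction is the share of requests that
    completed within threshold_ms and ok says whether it reaches target.
    All names and defaults here are hypothetical.
    """
    if not latencies_ms:
        raise ValueError("no requests observed")
    fast = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    fraction = fast / len(latencies_ms)
    return fraction >= target, fraction
```

At a 99.99% target, a window of 10,000 requests tolerates at most one slow request, which is why such objectives are usually evaluated over large windows.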
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information. Who to page?
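One way to picture the jump from trace data to on-call information, as a rough sketch. The `Span` shape, the `who_to_page` helper, and the service-to-contact table are all hypothetical; real tracing systems and paging tools expose their own data models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    error: bool


def who_to_page(spans, oncall):
    """Return on-call contacts for the deepest failing spans in a trace.

    A failing span counts as "deepest" when none of its children failed,
    so an error that merely propagated upward does not page the callers.
    `oncall` is a hypothetical mapping from service name to contact.
    """
    # IDs of spans that have at least one failing child.
    has_failing_child = {s.parent_id for s in spans if s.error and s.parent_id}
    deepest = [s for s in spans if s.error and s.span_id not in has_failing_child]
    return sorted({oncall.get(s.service, "unknown") for s in deepest})
```

With a trace where a frontend call fails because a checkout call fails because a database call fails, only the database owner is returned.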
Dynamic collection Capability to enable more collection in production when needed.
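A small sketch of what enabling more collection at runtime might look like inside a single process. The `Collector` class and its methods are hypothetical names for illustration; the talk does not describe a specific mechanism.

```python
import threading


class Collector:
    """Records events; extra detail is kept only while verbose mode is on."""

    def __init__(self):
        self._lock = threading.Lock()
        self._verbose = False
        self.events = []

    def set_verbose(self, enabled):
        # Flipped at runtime (for example by an admin endpoint), no restart.
        with self._lock:
            self._verbose = bool(enabled)

    def record(self, name, **detail):
        with self._lock:
            event = {"name": name}
            # Cheap by default: detail is dropped unless verbose is enabled.
            if self._verbose and detail:
                event["detail"] = detail
            self.events.append(event)
```

The default path stays cheap, and the expensive detail is only paid for while someone is actively debugging.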
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google jbd@google.com