Servers are doomed to fail
JBD
May 17, 2019
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail
Systems are doomed to fail
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should just work.” Said no one ever.
Failure is expected. Yes, it is.
Monitoring, debugging, postmortem
Monitoring is about telling whether something is broken.
“99.99% of the requests should return in 100ms.”
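As a rough illustration of how an objective like the one above could be checked, here is a minimal Python sketch. The function names and the choice to treat an empty window as compliant are assumptions made for this example; the talk does not prescribe an implementation.

```python
from typing import Sequence


def slo_compliance(latencies_ms: Sequence[float], threshold_ms: float = 100.0) -> float:
    """Return the fraction of requests that finished within threshold_ms."""
    if not latencies_ms:
        # Assumption: a window with no traffic counts as fully compliant.
        return 1.0
    within = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return within / len(latencies_ms)


def slo_is_met(
    latencies_ms: Sequence[float],
    threshold_ms: float = 100.0,
    target: float = 0.9999,
) -> bool:
    """True when at least `target` of the requests met the latency threshold."""
    return slo_compliance(latencies_ms, threshold_ms) >= target
```

With 10,000 requests, a 99.99% target leaves room for exactly one request slower than 100 ms; a second slow request breaks the objective.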
Debugging
Debugging is collaborative.
Debugging comes in flavors: logs, traces, metrics ...
Postmortems
Blameless? Focus on identifying problems.
Collaboration: Design for collaboration.
Design for failure: Set SLOs, plan for instrumentation, plan for debugging.
Cross-stack debugging: Accountability across the stack with high-cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation: Jump from one piece of monitoring/debugging data to related data.
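One common way to make this kind of jump possible is to carry a shared identifier, such as a trace ID, on every signal. The Python sketch below assumes that approach; the `LatencySample` and `LogLine` types and both helper functions are hypothetical and are not taken from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class LatencySample:
    value_ms: float
    trace_id: str  # hypothetical: the trace that produced this measurement


@dataclass
class LogLine:
    trace_id: str
    message: str


def slow_trace_ids(samples: Iterable[LatencySample], threshold_ms: float) -> List[str]:
    """Jump from a metric to traces: IDs of requests slower than threshold_ms."""
    return [s.trace_id for s in samples if s.value_ms > threshold_ms]


def logs_for_traces(logs: Iterable[LogLine], trace_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Jump from traces to logs: messages grouped by the requested trace IDs."""
    by_trace = defaultdict(list)
    for line in logs:
        by_trace[line.trace_id].append(line.message)
    return {tid: by_trace.get(tid, []) for tid in trace_ids}
```

Starting from a latency outlier, `slow_trace_ids` yields the traces worth opening, and `logs_for_traces` pulls the log lines written while those requests were served.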
On-call debugging: Jump from distributed tracing data to on-call information. Who to page?
Dynamic collection: Capability to enable more collection in production when needed.
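A minimal sketch of what dynamic collection might look like in code is a sampler whose rate can be raised at runtime without a restart. The `DynamicSampler` class below is a hypothetical illustration, not something described in the talk; how the new rate reaches the process (admin endpoint, config push, flag) is left open.

```python
import random
import threading
from typing import Optional


class DynamicSampler:
    """Decides per request whether to collect detailed data; rate is adjustable live."""

    def __init__(self, rate: float = 0.01, rng: Optional[random.Random] = None):
        self._lock = threading.Lock()
        self._rate = rate
        self._rng = rng or random.Random()

    def set_rate(self, rate: float) -> None:
        """Change the sampling rate while the service keeps running."""
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        with self._lock:
            self._rate = rate

    def should_collect(self) -> bool:
        """Return True for roughly `rate` of the calls."""
        with self._lock:
            rate = self._rate
        # random() returns a value in [0.0, 1.0), so rate 0.0 never
        # collects and rate 1.0 always collects.
        return self._rng.random() < rate
```

During an incident, an operator could call `set_rate(1.0)` to collect on every request and lower the rate again once the problem is understood.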
Continuous collection: Continuously collect signals, generate fleet-wide analysis reports.
Introspection: Introspection pages provided by the services.
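To make the idea of an introspection page concrete, here is a small sketch using only the Python standard library: the service keeps a few counters and exposes them over HTTP. The `/statusz` path, the `ServiceStats` fields, and the `serve` helper are assumptions chosen for this example and are not taken from the talk.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class ServiceStats:
    """Thread-safe counters that the service updates as it handles requests."""

    def __init__(self):
        self._lock = threading.Lock()
        self._started = time.time()
        self._requests = 0
        self._errors = 0

    def record(self, ok: bool) -> None:
        with self._lock:
            self._requests += 1
            if not ok:
                self._errors += 1

    def snapshot(self) -> dict:
        with self._lock:
            return {
                "uptime_seconds": time.time() - self._started,
                "requests": self._requests,
                "errors": self._errors,
            }


STATS = ServiceStats()


class IntrospectionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical path; any URL agreed on by the team would work.
        if self.path != "/statusz":
            self.send_error(404)
            return
        body = json.dumps(STATS.snapshot()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Serve the introspection page on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), IntrospectionHandler).serve_forever()
```

Calling `serve()` in a background thread of the service and opening `http://127.0.0.1:8080/statusz` would show the current counters as JSON.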
Monitoring, debugging, postmortem
Thank you
Jaana B. Dogan, Google