Servers are doomed to fail
JBD
May 17, 2019
Transcript
Servers are doomed to fail Jaana B. Dogan
@rakyll
Serverless is also doomed to fail
Systems are doomed to fail
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should just work.” Said no one ever.
Failure is expected. Yes, it is.
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
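As a rough illustration of how an objective like the one quoted above could be checked, the sketch below computes the fraction of requests that finished within the threshold and compares it to the target. The function name, its parameters, and the sample data are assumptions made for this example; the talk does not prescribe an implementation.

```python
def meets_latency_slo(latencies_ms, threshold_ms=100.0, target=0.9999):
    """Return (ok, fraction_within) for a batch of request latencies.

    Hypothetical helper: threshold_ms and target mirror the
    "99.99% of the requests should return in 100ms" example.
    """
    if not latencies_ms:
        # With no traffic, nothing violated the objective.
        return True, 1.0
    within = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    fraction = within / len(latencies_ms)
    return fraction >= target, fraction


if __name__ == "__main__":
    # Made-up sample: two slow requests out of ten thousand.
    sample = [20.0] * 9998 + [250.0] * 2
    ok, fraction = meets_latency_slo(sample)
    print(ok, fraction)
```

A real monitoring system would evaluate this over a rolling window rather than a single in-memory list.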
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for debugging.
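One common way to make "set SLOs" actionable is to turn the target into an error budget: the number of requests that may miss the objective before it is violated. The sketch below is an assumption about how that arithmetic could look; the error budget framing and the function name are not taken from the slides.

```python
def error_budget(total_requests, bad_requests, target=0.9999):
    """Return (allowed_bad, remaining) for a given SLO target.

    Hypothetical helper: allowed_bad is how many requests may miss
    the objective, remaining is what is left after bad_requests.
    A negative remaining value means the SLO has been violated.
    """
    allowed_bad = total_requests * (1.0 - target)
    remaining = allowed_bad - bad_requests
    return allowed_bad, remaining


if __name__ == "__main__":
    # Made-up numbers: one million requests, forty of them too slow.
    allowed, remaining = error_budget(1_000_000, 40)
    print(f"allowed={allowed:.1f} remaining={remaining:.1f}")
```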
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
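One possible reading of this kind of correlation is a latency histogram that remembers an example trace ID per bucket, so that a slow bucket on a dashboard links straight to a trace worth inspecting. The class below is a minimal sketch under that assumption; the bucket bounds, class name, and trace IDs are invented for illustration.

```python
import bisect


class LatencyHistogram:
    """Latency buckets that each keep one example trace ID (hypothetical)."""

    def __init__(self, bounds_ms=(10, 50, 100, 500)):
        self.bounds = list(bounds_ms)
        # One extra bucket for values above the last bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars = [None] * (len(self.bounds) + 1)

    def _bucket(self, latency_ms):
        # Bucket i holds values in (bounds[i-1], bounds[i]].
        return bisect.bisect_left(self.bounds, latency_ms)

    def record(self, latency_ms, trace_id):
        i = self._bucket(latency_ms)
        self.counts[i] += 1
        # Keep the most recent trace seen in this bucket.
        self.exemplars[i] = trace_id

    def exemplar_for(self, latency_ms):
        """Return a trace ID from the bucket that latency_ms falls into."""
        return self.exemplars[self._bucket(latency_ms)]


if __name__ == "__main__":
    h = LatencyHistogram()
    h.record(12, "trace-fast")
    h.record(730, "trace-slow")
    # Jump from "something is in the slowest bucket" to a concrete trace.
    print(h.exemplar_for(600))
```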
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
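A small sketch of how trace data might answer "who to page?": find the errored spans that have no errored children (a rough proxy for where the failure originated) and look up the owning team. The `Span` fields, the ownership map, and the rotation names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    error: bool


def who_to_page(spans, oncall):
    """Return on-call contacts for services where errors originated.

    An errored span counts as an origin if none of its children errored.
    Services missing from the oncall map are reported as "unknown".
    """
    errored_parents = {s.parent_id for s in spans if s.error and s.parent_id}
    origins = [s for s in spans if s.error and s.span_id not in errored_parents]
    return sorted({oncall.get(s.service, "unknown") for s in origins})


if __name__ == "__main__":
    trace = [
        Span("a", None, "frontend", True),
        Span("b", "a", "checkout", True),
        Span("c", "b", "db", True),
        Span("d", "b", "cache", False),
    ]
    rotations = {"frontend": "web-oncall", "db": "storage-oncall"}
    print(who_to_page(trace, rotations))
```

The errors in `frontend` and `checkout` are treated as propagated, so only the owner of `db` is returned.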
Dynamic collection Capability to enable more collection in production when needed.
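Dynamic collection could be approximated with a sampler whose rate can be raised at runtime, for example during an incident. This is a sketch under that assumption; the class name, the default rate, and the hash-based sampling decision are illustrative choices, not something the talk specifies.

```python
import threading
import zlib


class DynamicSampler:
    """Trace sampler whose rate can be changed while the service runs."""

    def __init__(self, rate=0.01):
        self._lock = threading.Lock()
        self._rate = rate

    def set_rate(self, rate):
        """Raise or lower the sampled fraction (0.0 to 1.0) at runtime."""
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        with self._lock:
            self._rate = rate

    def should_sample(self, trace_id):
        with self._lock:
            rate = self._rate
        # Hash the trace ID so every service makes the same decision
        # for the same trace, without coordination.
        bucket = zlib.crc32(trace_id.encode("utf-8")) % 10_000
        return bucket < rate * 10_000


if __name__ == "__main__":
    sampler = DynamicSampler(rate=0.0)
    print(sampler.should_sample("trace-123"))  # collection is off
    sampler.set_rate(1.0)                      # turned up during an incident
    print(sampler.should_sample("trace-123"))
```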
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
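To make the fleet-wide analysis idea concrete, the sketch below merges per-host profile samples into one report of where time goes across the fleet. The input shape (host name mapped to function sample counts), the function name, and the sample data are assumptions for illustration.

```python
from collections import Counter


def fleet_report(host_profiles, top=3):
    """Merge per-host sample counts into fleet-wide shares.

    host_profiles maps a host name to {function_name: sample_count}.
    Returns up to `top` (function_name, share_of_all_samples) pairs,
    largest share first.
    """
    total = Counter()
    for samples in host_profiles.values():
        total.update(samples)
    grand_total = sum(total.values())
    if grand_total == 0:
        return []
    return [(name, count / grand_total) for name, count in total.most_common(top)]


if __name__ == "__main__":
    # Made-up samples from two hosts.
    profiles = {
        "host-a": {"parse": 30, "gc": 10},
        "host-b": {"parse": 50, "gc": 10},
    }
    for name, share in fleet_report(profiles):
        print(f"{name}: {share:.0%}")
```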
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google