Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
620
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
Oracle Cloud Infrastructure IaaS 新機能アップデート 2025/06 - 2025/08
oracle4engineer
PRO
0
100
「その開発、認知負荷高すぎませんか?」Platform Engineeringで始める開発者体験カイゼン術
sansantech
PRO
2
190
AIエージェント開発用SDKとローカルLLMをLINE Botと組み合わせてみた / LINEを使ったLT大会 #14
you
PRO
0
130
5年目から始める Vue3 サイト改善 #frontendo
tacck
PRO
3
230
Claude Code でアプリ開発をオートパイロットにするためのTips集 Zennの場合 / Claude Code Tips in Zenn
wadayusuke
3
230
AWSを利用する上で知っておきたい名前解決のはなし(10分版)
nagisa53
10
3.2k
未経験者・初心者に贈る!40分でわかるAndroidアプリ開発の今と大事なポイント
operando
5
730
Snowflake Intelligenceにはこうやって立ち向かう!クラシルが考えるAI Readyなデータ基盤と活用のためのDataOps
gappy50
0
270
スクラムガイドに載っていないスクラムのはじめかた - チームでスクラムをはじめるときに知っておきたい勘所を集めてみました! - / How to start Scrum that is not written in the Scrum Guide 2nd
takaking22
1
120
現場で効くClaude Code ─ 最新動向と企業導入
takaakikakei
1
260
2025/09/16 仕様駆動開発とAI-DLCが導くAI駆動開発の新フェーズ
masahiro_okamura
0
100
今日から始めるAWSセキュリティ対策 3ステップでわかる実践ガイド
yoshidatakeshi1994
0
110
Featured
See All Featured
Side Projects
sachag
455
43k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
GraphQLの誤解/rethinking-graphql
sonatard
72
11k
Java REST API Framework Comparison - PWX 2021
mraible
33
8.8k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
52
5.6k
Embracing the Ebb and Flow
colly
87
4.8k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
580
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.7k
Done Done
chrislema
185
16k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
530
The Straight Up "How To Draw Better" Workshop
denniskardys
236
140k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]