Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Serverless Containers
rakyll
1
250
Critical Path Analysis
rakyll
0
610
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
2025 AWS Jr. Championが振り返るAWS Summit
kazukiadachi
0
110
B2C&B2B&社内向けサービスを抱える開発組織におけるサービス価値を最大化するイニシアチブ管理
belongadmin
1
6.9k
ビズリーチにおけるリアーキテクティング実践事例 / JJUG CCC 2025 Spring
visional_engineering_and_design
1
120
品質と速度の両立:生成AI時代の品質保証アプローチ
odasho
1
340
DBのスキルで生き残る技術 - AI時代におけるテーブル設計の勘所
soudai
PRO
47
19k
Should Our Project Join the CNCF? (Japanese Recap)
whywaita
PRO
0
340
OPENLOGI Company Profile
hr01
0
67k
SEQUENCE object comparison - db tech showcase 2025 LT2
nori_shinoda
0
130
Glacierだからってコストあきらめてない? / JAWS Meet Glacier Cost
taishin
1
160
Getting to Know Your Legacy (System) with AI-Driven Software Archeology (WeAreDevelopers World Congress 2025)
feststelltaste
1
130
Enhancing SaaS Product Reliability and Release Velocity through Optimized Testing Approach
ropqa
1
230
IPA&AWSダブル全冠が明かす、人生を変えた勉強法のすべて
iwamot
PRO
2
120
Featured
See All Featured
Making the Leap to Tech Lead
cromwellryan
134
9.4k
Six Lessons from altMBA
skipperchong
28
3.9k
Done Done
chrislema
184
16k
Music & Morning Musume
bryan
46
6.6k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
126
53k
Embracing the Ebb and Flow
colly
86
4.7k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
667
120k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
32
2.4k
The Art of Programming - Codeland 2020
erikaheidi
54
13k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
331
22k
How GitHub (no longer) Works
holman
314
140k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]