Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
630
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
自動テストのコストと向き合ってみた
qa
0
110
SREとソフトウェア開発者の合同チームはどのようにS3のコストを削減したか?
muziyoshiz
1
100
[2025-09-30] Databricks Genie を利用した分析基盤とデータモデリングの IVRy の現在地
wxyzzz
0
470
DataOpsNight#8_Terragruntを用いたスケーラブルなSnowflakeインフラ管理
roki18d
1
340
Function calling機能をPLaMo2に実装するには / PFN LLMセミナー
pfn
PRO
0
920
Modern_Data_Stack最新動向クイズ_買収_AI_激動の2025年_.pdf
sagara
0
200
Flaky Testへの現実解をGoのプロポーザルから考える | Go Conference 2025
upamune
1
420
データエンジニアがこの先生きのこるには...?
10xinc
0
440
Oracle Cloud Infrastructure:2025年9月度サービス・アップデート
oracle4engineer
PRO
0
390
stupid jj tricks
indirect
0
7.9k
Large Vision Language Modelを用いた 文書画像データ化作業自動化の検証、運用 / shibuya_AI
sansan_randd
0
100
How to achieve interoperable digital identity across Asian countries
fujie
0
110
Featured
See All Featured
The Power of CSS Pseudo Elements
geoffreycrofte
79
6k
Become a Pro
speakerdeck
PRO
29
5.5k
Raft: Consensus for Rubyists
vanstee
139
7.1k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
Done Done
chrislema
185
16k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
The Straight Up "How To Draw Better" Workshop
denniskardys
237
140k
Into the Great Unknown - MozCon
thekraken
40
2.1k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.2k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
248
1.3M
Dealing with People You Can't Stand - Big Design 2015
cassininazir
367
27k
Mobile First: as difficult as doing things right
swwweet
224
10k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]