$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.2k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
650
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
小さな判断で育つ、大きな意思決定力 / 20251204 Takahiro Kinjo
shift_evolve
PRO
1
260
Ryzen NPUにおけるAI Engineプログラミング
anjn
0
180
Claude Code はじめてガイド -1時間で学べるAI駆動開発の基本と実践-
oikon48
41
24k
モバイルゲーム開発におけるエージェント技術活用への試行錯誤 ~開発効率化へのアプローチの紹介と未来に向けた展望~
qualiarts
0
260
Excelデータ分析で学ぶディメンショナルモデリング ~アジャイルデータモデリングへ向けて~ by @Kazaneya_PR / 20251126
kazaneya
PRO
3
840
Kill the Vibe?Architecture in the age of AI
stoth
1
170
Databricksによるエージェント構築
taka_aki
1
110
MS Ignite 2025で発表されたFoundry IQをRecap
satodayo
3
230
Oracle Database@Google Cloud:サービス概要のご紹介
oracle4engineer
PRO
0
640
AI駆動開発によるDDDの実践
dip_tech
PRO
0
270
Modern Data Stack大好きマンが語るSnowflakeの魅力
sagara
0
270
命名から始めるSpec Driven
kuruwic
3
810
Featured
See All Featured
Site-Speed That Sticks
csswizardry
13
990
KATA
mclloyd
PRO
32
15k
Writing Fast Ruby
sferik
630
62k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
140
34k
4 Signs Your Business is Dying
shpigford
186
22k
Building a Scalable Design System with Sketch
lauravandoore
463
34k
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
35
3.3k
Why You Should Never Use an ORM
jnunemaker
PRO
60
9.6k
How GitHub (no longer) Works
holman
316
140k
Visualization
eitanlees
150
16k
Being A Developer After 40
akosma
91
590k
The Language of Interfaces
destraynor
162
25k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]