Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.8k
Serverless Containers
rakyll
1
250
Critical Path Analysis
rakyll
0
610
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
How to Quickly Call American Airlines®️ U.S. Customer Care : Full Guide
flyaahelpguide
0
130
American airlines ®️ USA Contact Numbers: Complete 2025 Support Guide
airhelpsupport
0
390
整頓のジレンマとの戦い〜Tidy First?で振り返る事業とキャリアの歩み〜/Fighting the tidiness dilemma〜Business and Career Milestones Reflected on in Tidy First?〜
bitkey
3
17k
United™️ Airlines®️ Customer®️ USA Contact Numbers: Complete 2025 Support Guide
flyunitedguide
0
390
[ JAWS-UG千葉支部 x 彩の国埼玉支部 ]ムダ遣い卒業!FinOpsで始めるAWSコスト最適化の第一歩
sh_fk2
2
130
Operating Operator
shhnjk
1
620
スタートアップに選択肢を 〜生成AIを活用したセカンダリー事業への挑戦〜
nstock
0
260
タイミーのデータモデリング事例と今後のチャレンジ
ttccddtoki
6
2.5k
サイバーエージェントグループのSRE10年の歩みとAI時代の生存戦略
shotatsuge
3
380
Delegating the chores of authenticating users to Keycloak
ahus1
0
160
赤煉瓦倉庫勉強会「Databricksを選んだ理由と、絶賛真っ只中のデータ基盤移行体験記」
ivry_presentationmaterials
2
380
ビジネス職が分析も担う事業部制組織でのデータ活用の仕組みづくり / Enabling Data Analytics in Business-Led Divisional Organizations
zaimy
1
190
Featured
See All Featured
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
50
5.5k
Writing Fast Ruby
sferik
628
62k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.1k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
54k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
357
30k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
What's in a price? How to price your products and services
michaelherold
246
12k
How To Stay Up To Date on Web Technology
chriscoyier
790
250k
Adopting Sorbet at Scale
ufuk
77
9.5k
How STYLIGHT went responsive
nonsquared
100
5.6k
Typedesign – Prime Four
hannesfritz
42
2.7k
Why You Should Never Use an ORM
jnunemaker
PRO
58
9.4k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]