Servers are doomed to fail
JBD
May 17, 2019
Technology
Transcript
Servers are doomed to fail. Jaana B. Dogan, @rakyll
Serverless is also doomed to fail.
Systems are doomed to fail.
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should just work.” Said no one ever.
Failure is expected. Yes, it is.
Monitoring, debugging, postmortem.
Monitoring is about telling whether something is broken.
“99.99% of the requests should return in 100ms.”
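A target like the one quoted above can be checked mechanically against observed latencies. The following is a minimal sketch, not taken from the talk: the function name, the sample data, and the choice to treat an empty window as passing are all assumptions made for illustration.

```python
def meets_latency_slo(latencies_ms, threshold_ms=100.0, target=0.9999):
    """Return True if at least `target` of the requests finished within threshold_ms."""
    if not latencies_ms:
        # Assumption of this sketch: no traffic means nothing was violated.
        return True
    fast = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return fast / len(latencies_ms) >= target


# Hypothetical window of 10,000 requests with a single slow one.
sample = [20.0] * 9999 + [450.0]
print(meets_latency_slo(sample))             # one slow request out of 10,000
print(meets_latency_slo(sample + [450.0]))   # two slow requests out of 10,001
```

With a 99.99% target, one slow request in 10,000 is still within the objective, and a second one tips the window over.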
Debugging
Debugging is collaborative.
Debugging comes in flavors: logs, traces, metrics, ...
Postmortems
Blameless? Focus on identifying problems.
Collaboration: design for collaboration.
Design for failure: set SLOs, plan for instrumentation, plan for debugging.
Cross-stack debugging: accountability across the stack with high-cardinality data. See speakerdeck.com/rakyll/rpc-metrics-at-google
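One way to picture high-cardinality data giving accountability across the stack: a lower layer records each measurement together with tags describing who called it, so its latency can later be broken down by any of those tags. This is a rough sketch under that assumption; `TaggedMetrics`, the `originator` tag, and the service names are hypothetical and are not taken from the linked deck.

```python
from collections import defaultdict


class TaggedMetrics:
    """Stores latency samples together with arbitrary key/value tags."""

    def __init__(self):
        self._samples = []  # list of (tags, latency_ms) pairs

    def record(self, latency_ms, **tags):
        self._samples.append((tags, latency_ms))

    def mean_by(self, key):
        """Mean latency grouped by the value of one tag."""
        groups = defaultdict(list)
        for tags, latency in self._samples:
            groups[tags.get(key, "unknown")].append(latency)
        return {value: sum(v) / len(v) for value, v in groups.items()}


# A hypothetical storage layer tags each request with the service that started it.
storage = TaggedMetrics()
storage.record(150.0, originator="checkout", method="Read")
storage.record(250.0, originator="checkout", method="Write")
storage.record(10.0, originator="search", method="Read")
print(storage.mean_by("originator"))
```

Grouping by `originator` lets the storage team see which upstream service is responsible for its slow requests, which a single aggregate latency number would hide.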
Correlation: jump from one kind of monitoring or debugging data to related data.
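A common way to make that jump possible is to attach an example trace ID to each metric bucket, so a suspicious bucket leads straight to a concrete trace. The sketch below assumes that approach; the class, bucket bounds, and trace IDs are illustrative only and do not describe any specific system from the talk.

```python
import bisect


class LatencyHistogram:
    """Latency histogram that remembers one example trace ID per bucket."""

    def __init__(self, bounds_ms):
        self.bounds = sorted(bounds_ms)
        # One extra bucket for values above the largest bound.
        self.counts = [0] * (len(self.bounds) + 1)
        self.exemplars = [None] * (len(self.bounds) + 1)

    def observe(self, latency_ms, trace_id):
        i = bisect.bisect_left(self.bounds, latency_ms)
        self.counts[i] += 1
        self.exemplars[i] = trace_id  # keep the most recent trace for this bucket

    def slowest_exemplar(self):
        """Trace ID from the highest non-empty bucket, or None if empty."""
        for i in range(len(self.counts) - 1, -1, -1):
            if self.counts[i]:
                return self.exemplars[i]
        return None


hist = LatencyHistogram([50, 100, 500])
hist.observe(30, "trace-a")
hist.observe(800, "trace-b")
hist.observe(90, "trace-c")
print(hist.counts, hist.slowest_exemplar())
```

The metric says that something is slow; the stored trace ID says which request to open in the tracing tool to find out why.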
On-call debugging: jump from distributed tracing data to on-call information. Who to page?
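As a sketch of how trace data could answer "who to page?", the code below walks the spans of one trace, finds a failing span that has no failing children, and looks up the on-call rotation for its service. The heuristic, the `Span` fields, and the rotation names are assumptions for illustration, not a description of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    error: bool


def who_to_page(spans, oncall):
    """Return the on-call owner of the deepest failing span, or None."""
    # IDs of spans that have at least one failing child.
    failing_parents = {s.parent_id for s in spans if s.error}
    for s in spans:
        # A failing span with no failing children is the likely origin.
        if s.error and s.span_id not in failing_parents:
            return oncall.get(s.service)
    return None


trace = [
    Span("1", None, "frontend", error=True),
    Span("2", "1", "checkout", error=True),
    Span("3", "2", "db", error=True),
    Span("4", "2", "cache", error=False),
]
oncall = {"frontend": "web-oncall", "checkout": "payments-oncall", "db": "storage-oncall"}
print(who_to_page(trace, oncall))
```

Errors propagate upward through the trace, so paging the owner of the top-level service would often reach the wrong team; following the failure down to its origin is the point of linking the two data sources.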
Dynamic collection: capability to enable more collection in production when needed.
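One simple form of dynamic collection is a sampling rate that can be raised at runtime without redeploying. The sketch below assumes that design; `DynamicSampler` and its methods are hypothetical names, and how the new rate reaches the process (an admin endpoint, a config push) is left out.

```python
import random
import threading


class DynamicSampler:
    """Decides whether to collect detailed data; the rate can change at runtime."""

    def __init__(self, rate=0.01, rng=None):
        self._rate = rate
        self._lock = threading.Lock()
        self._rng = rng or random.Random()

    def set_rate(self, rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        with self._lock:
            self._rate = rate

    def should_sample(self):
        with self._lock:
            rate = self._rate
        return self._rng.random() < rate


sampler = DynamicSampler(rate=0.0)   # collection switched off by default
before = sum(sampler.should_sample() for _ in range(1000))
sampler.set_rate(1.0)                # an operator turns collection up during an incident
after = sum(sampler.should_sample() for _ in range(1000))
print(before, after)
```

Keeping the default rate low limits overhead in normal operation, while the ability to raise it gives the on-call engineer more data exactly when it is needed.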
Continuous collection: continuously collect signals, generate fleet-wide analysis reports.
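A fleet-wide report can be as simple as merging the samples each host collects and ranking where the time goes overall. The following sketch assumes per-host profile sample counts keyed by function name; the host names, function names, and `fleet_report` helper are made up for illustration.

```python
from collections import Counter


def fleet_report(per_host_samples, top=3):
    """Merge per-host sample counts and return the top entries as fractions."""
    total = Counter()
    for samples in per_host_samples.values():
        total.update(samples)
    grand_total = sum(total.values())
    if grand_total == 0:
        return []
    return [(name, count / grand_total) for name, count in total.most_common(top)]


# Hypothetical CPU profile samples gathered continuously from two hosts.
samples = {
    "host-a": {"json.encode": 50, "db.query": 30},
    "host-b": {"json.encode": 30, "gc": 90},
}
for name, share in fleet_report(samples):
    print(f"{name}: {share:.0%}")
```

A hotspot that looks minor on any single host can dominate once the whole fleet is summed, which is the kind of finding only the aggregated view surfaces.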
Introspection: introspection pages provided by the services.
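An introspection page can be a small HTTP endpoint that the service itself exposes to report its internal state. The sketch below uses only the Python standard library; the `/statusz` path, the counters, and the port are assumptions chosen for illustration, not details from the talk.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.time()
STATS = {"requests": 0, "errors": 0}  # hypothetical in-process counters


def render_statusz(stats, now=None):
    """Render the introspection page body as JSON."""
    now = time.time() if now is None else now
    return json.dumps({"uptime_s": round(now - START, 1), **stats}, sort_keys=True)


class IntrospectionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/statusz":
            self.send_error(404)
            return
        body = render_statusz(STATS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the introspection page on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), IntrospectionHandler).serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8080/statusz` would show the current counters and uptime straight from the running process, without going through a separate monitoring backend.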
@rakyll monitoring debugging postmortem
Thank you. Jaana B. Dogan, Google