Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.4k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2k
eBPF in Microservices Observability
rakyll
1
1.6k
OpenTelemetry at AWS
rakyll
1
1.8k
Debugging Code Generation in Go
rakyll
5
1.5k
Are you ready for production?
rakyll
8
2.6k
Serverless Containers
rakyll
1
230
Critical Path Analysis
rakyll
0
470
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.1k
Other Decks in Technology
See All in Technology
AOAI Dev Day - Opening Session
yoshidashingo
2
440
成長期に歩みを止めないための創業期の開発文化形成
mayah
6
420
AWSでRAGを作る法方
sonoda_mj
1
140
Datadog Cloud SIEMを使ってAWS環境の脅威を可視化した話/lifeistech-datadog-cloud-siem
gidajun
0
480
Amazon FSx for NetApp ONTAPのパフォーマンスチューニング要素をまとめてみた #cm_odyssey #devio2024
non97
0
220
Scaling Technical Excellence at 104: Evolution in AWS and Developer Empowerment
scotthsieh825
1
150
コンテナ・K8s研修 - 後半 Kubernetes 基礎&ハンズオン【MIXI 24新卒技術研修】
mixi_engineers
PRO
1
120
データベース研修 DB基礎【MIXI 24新卒技術研修】
mixi_engineers
PRO
0
210
ソフトウェアエンジニアリングの知見を活かして データ基盤をいい感じにする on Snowflake [MIERUNE BBQ #10]
mtpooh
2
150
[2024最新版]AWS Control Towerを使ったセキュアなマルチアカウント環境の作り方
hiashisan
0
270
データ分析を支える技術 生成AI再入門
ishikawa_satoru
0
380
サービス開発を前に進めるために 新米リードエンジニアが 取り組んだこと / Steps Taken by a Novice Lead Engineer to Advance Service Development
nologyance
0
180
Featured
See All Featured
How to name files
jennybc
67
96k
Debugging Ruby Performance
tmm1
71
11k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
12
3.8k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
155
14k
It's Worth the Effort
3n
181
27k
Mobile First: as difficult as doing things right
swwweet
219
8.8k
The World Runs on Bad Software
bkeepers
PRO
63
11k
Code Review Best Practice
trishagee
58
16k
Optimising Largest Contentful Paint
csswizardry
18
2.6k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
35
6.3k
Done Done
chrislema
179
15k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
129
32k
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]