Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Servers are doomed to fail
Search
JBD
May 17, 2019
Technology
3
1.5k
Servers are doomed to fail
JBD
May 17, 2019
Tweet
Share
More Decks by JBD
See All by JBD
eBPF in Microservices Observability at eBPF Day
rakyll
1
2.1k
eBPF in Microservices Observability
rakyll
1
1.7k
OpenTelemetry at AWS
rakyll
1
1.9k
Debugging Code Generation in Go
rakyll
5
1.6k
Are you ready for production?
rakyll
8
2.9k
Serverless Containers
rakyll
1
260
Critical Path Analysis
rakyll
0
630
Monitoring and Debugging Containers
rakyll
2
1.1k
CPDD
rakyll
0
4.2k
Other Decks in Technology
See All in Technology
OpenTelemetry が拡げる Gemini CLI の可観測性
phaya72
2
930
ビズリーチ求職者検索におけるPLMとLLMの活用 / Search Engineering MEET UP_2-1
visional_engineering_and_design
1
180
RDS の負荷が高い場合に AWS で取りうる具体策 N 連発/a-series-of-specific-countermeasures-available-on-aws-when-rds-is-under-high-load
emiki
7
4.4k
OSSで50の競合と戦うためにやったこと
yamadashy
3
620
CNCFの視点で捉えるPlatform Engineering - 最新動向と展望 / Platform Engineering from the CNCF Perspective
hhiroshell
0
110
現場データから見える、開発生産性の変化コード生成AI導入・運用のリアル〜 / Changes in Development Productivity and Operational Challenges Following the Introduction of Code Generation AI
nttcom
1
400
「魔法少女まどか☆マギカ Magia Exedra」の多様なバトルの開発を柔軟かつ効率的に実現するためのPure C#とUnityの分離について
gree_tech
PRO
0
150
コンテキストエンジニアリング入門〜AI Coding Agent作りで学ぶ文脈設計〜
kworkdev
PRO
3
2k
事業開発におけるDify活用事例
kentarofujii
4
1.1k
Building a cloud native business on open source
lizrice
0
130
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
3
20k
Digitization部 紹介資料
sansan33
PRO
1
5.6k
Featured
See All Featured
Being A Developer After 40
akosma
91
590k
The Illustrated Children's Guide to Kubernetes
chrisshort
49
51k
Making the Leap to Tech Lead
cromwellryan
135
9.6k
How to Ace a Technical Interview
jacobian
280
24k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Gamification - CAS2011
davidbonilla
81
5.5k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
140
34k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.7k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.7k
Large-scale JavaScript Application Architecture
addyosmani
514
110k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
249
1.3M
Transcript
Servers are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Serverless is also doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Systems are doomed to fail Jaana B. Dogan
[email protected]
@rakyll
Is failure OK? Is failure an unexpected case?
Failure is not an exception. Systems change all the time.
“I haven’t touched the code for a century, it should
just work.” Said no one ever.
Failure is expected. Yes, it is.
None
@rakyll monitoring debugging postmortem
Monitoring is about saying if something is broken.
“99.99% of the requests should return in 100ms.”
@rakyll
@rakyll
Debugging
Debugging is collaborative.
Debugging comes in flavors. Logs Traces Metrics ...
Postmortems
Postmortems
Postmortems
Blameless? Focus on identifying problems.
Collaboration Design for collaboration.
Design for failure Set SLOs, plan for instrumentation, plan for
debugging.
Cross-stack debugging Accountability across stack with high cardinality data. speakerdeck.com/rakyll/rpc-metrics-at-google
Correlation Jump from monitoring/debugging data to data.
On-call debugging Jump from distributed tracing data to on-call information.
who to page?
Dynamic collection Capability to enable more collection in production when
needed.
Continuous collection Continuously collect signals, generate fleet-wide analysis reports.
Introspection Introspection pages provided from the services.
@rakyll monitoring debugging postmortem
Thank you Jaana B. Dogan Google
[email protected]