production: an owner's manual
Igor Wiedler
April 23, 2018
from exec(ut) 2018
Transcript
hello!
broken computers
getting sidetracked now, so sorry*
(* not sorry)
back to serious business!
a production system is a system that serves real users
the goal of operations is to ensure services are reliable
in order to provide a good user experience
failure
app
app, linux kernel, cpu, dram, disk, network, power supply, switches, load balancer, dns, submarine cables, routers, fiber
app, linux kernel, the cloud
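Every box in that stack sits on the request path, so the app is only up when all of them are. A rough way to see why this hurts: if you assume failures are independent (they often aren't), end-to-end availability is the product of the per-component availabilities. The numbers below are made up for illustration, and `serial_availability` is a hypothetical helper, not something from the talk.

```python
from math import prod


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes component failures are independent, which real systems rarely
    guarantee (shared power, shared switches, shared operators).
    """
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {a}")
    # All components must be up at the same time, so the probabilities multiply.
    return prod(availabilities)


if __name__ == "__main__":
    # Hypothetical per-component availabilities, for illustration only.
    stack = {
        "app": 0.999,
        "linux kernel": 0.9999,
        "disk": 0.999,
        "network": 0.9995,
        "load balancer": 0.9995,
        "dns": 0.9999,
    }
    print(f"end-to-end availability: {serial_availability(list(stack.values())):.4%}")
```

Since every factor is at most 1, adding a component to the path can only keep the result the same or lower it.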
• cosmic rays
• disk failure
• power outages
• software bugs
• ...
entropy
capacity
cascading failure
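One way to see how losing a single machine can take down the rest: when a node dies, its share of the traffic moves to the survivors, and any survivor pushed past its capacity falls over too. The toy model below is an assumption for illustration (load spread evenly across healthy nodes, a node fails the moment its share exceeds its capacity); `surviving_nodes` is a hypothetical name.

```python
def surviving_nodes(capacities: list[float], total_load: float) -> list[float]:
    """Capacities of the nodes still up once the cascade settles.

    Toy model: load is split evenly across healthy nodes, and any node whose
    share exceeds its capacity fails, pushing its share onto the rest.
    """
    healthy = sorted(capacities)
    while healthy:
        share = total_load / len(healthy)
        still_up = [c for c in healthy if c >= share]
        if len(still_up) == len(healthy):
            break  # nobody is overloaded: the system is stable
        healthy = still_up  # overloaded nodes drop out; the loop re-spreads the load
    return healthy


if __name__ == "__main__":
    # Three nodes with uneven (hypothetical) capacities, in requests per second.
    fleet = [50.0, 100.0, 150.0]
    print(surviving_nodes(fleet, 200.0))  # the smallest node drops, the rest hold
    print(surviving_nodes(fleet, 240.0))  # each failure overloads the next one
```

The uncomfortable part is the threshold: at a load of 200 this fleet shrugs off its weakest node, while at 240 the same fleet ends up with nothing left. Spare capacity is what keeps one failure from turning into a cascade.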
system design
redundancy
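Redundancy helps because a replicated service is only down when every replica is down at the same time. Under the optimistic assumption that replicas fail independently, n replicas that are each up with probability a give an availability of 1 - (1 - a)^n. `redundant_availability` below is a hypothetical helper that just evaluates that formula.

```python
def redundant_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a service that stays up while at least one replica is up.

    Assumes independent failures; correlated failures (same rack, same power
    feed, same bad deploy) break this assumption.
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    p_down = 1.0 - replica_availability  # chance a single replica is down
    return 1.0 - p_down ** replicas  # down only if all replicas are down at once


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{redundant_availability(0.99, n):.6f}")
```

Each extra replica multiplies the downtime probability by another (1 - a), but only as long as the independence assumption holds.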
scale
[diagram: nodes p1, p2, m1, m2, m3, c1, c2]
data storage
protocols
monitoring
many components, many req/s
measure all the things?
✅ ⏱
golden signals
• latency
• traffic
• errors
• saturation
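As a sketch of what measuring the four golden signals could look like inside a single service, the hypothetical `GoldenSignals` class below counts traffic and errors, records latencies, and treats saturation as in-flight requests divided by an assumed concurrency limit. A real setup would export these numbers to a metrics system rather than keep them in memory; the class, its `capacity` parameter and the nearest-rank percentile are all illustrative choices, not the talk's.

```python
import math
from dataclasses import dataclass, field


@dataclass
class GoldenSignals:
    """Hypothetical in-memory tracker for one measurement window."""

    capacity: int  # assumed max concurrent requests before we call it "full"
    requests: int = 0
    errors: int = 0
    in_flight: int = 0
    latencies_ms: list[float] = field(default_factory=list)

    def start(self) -> None:
        self.in_flight += 1

    def finish(self, latency_ms: float, ok: bool) -> None:
        self.in_flight -= 1
        self.requests += 1
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def traffic(self, window_s: float) -> float:
        """Completed requests per second over the window."""
        return self.requests / window_s

    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    def saturation(self) -> float:
        """Fraction of the assumed concurrency limit in use right now."""
        return self.in_flight / self.capacity

    def latency_percentile(self, q: float) -> float:
        """Nearest-rank percentile, so the tail is visible, not just the average."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(q * len(ordered) / 100))
        return ordered[rank - 1]


if __name__ == "__main__":
    signals = GoldenSignals(capacity=4)
    for latency, ok in [(12.0, True), (48.0, True), (310.0, False), (25.0, True)]:
        signals.start()
        signals.finish(latency, ok)
    print(signals.traffic(window_s=1.0), signals.error_rate(), signals.latency_percentile(99))
```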
0 - 50 [1620]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (74.55%)
50 - 100 [ 447]: ∎∎∎∎∎∎∎∎∎∎ (20.57%)
100 - 150 [ 49]: ∎ (2.25%)
150 - 200 [ 15]: (0.69%)
200 - 250 [ 15]: (0.69%)
250 - 300 [ 10]: (0.46%)
300 - 350 [ 6]: (0.28%)
350 - 400 [ 1]: (0.05%)
400 - 450 [ 0]: (0.00%)
450 - 500 [ 4]: (0.18%)
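A text histogram like the one above is a quick way to see the latency tail that an average would hide. The sketch below buckets raw latency samples and prints one line per bucket in a similar layout; `latency_histogram`, its parameters and the bar scaling are assumptions, not the tool used for the slide. Percentages are taken over all samples, so samples past the last bucket still count toward the total even though they get no line of their own.

```python
def latency_histogram(
    samples: list[float], width: float = 50, buckets: int = 10, bar_len: int = 50
) -> list[str]:
    """One line per bucket: range, count, bar, share of all samples.

    Bucket i covers [i * width, (i + 1) * width). Samples outside the drawn
    range get no line but still count toward the percentages.
    """
    counts = [0] * buckets
    for s in samples:
        i = int(s // width)
        if 0 <= i < buckets:
            counts[i] += 1
    total = len(samples)
    lines = []
    for i, n in enumerate(counts):
        share = n / total if total else 0.0
        bar = "∎" * int(share * bar_len)  # bar length proportional to the share
        lines.append(f"{i * width:g} - {(i + 1) * width:g} [{n:4d}]: {bar} ({share:.2%})")
    return lines


if __name__ == "__main__":
    import random

    random.seed(1)
    # Hypothetical latencies in ms: mostly fast, with an exponential tail.
    fake = [random.expovariate(1 / 40) for _ in range(2000)]
    print("\n".join(latency_histogram(fake)))
```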
saturation traffic latency errors
humans
oops, deleted the database
bad human!
why does this button even exist?
app, linux kernel, cpu, dram, disk, network, power supply, switches, load balancer, dns, submarine cables, routers, fiber, humans
epic failure is almost always systemic
failure
recap
• a production system serves real users
• users like things that work and are fast
• epic failure is almost always systemic
thx @igorwhilefalse