production: an owner's manual
Igor Wiedler
April 23, 2018
from exec(ut) 2018
Transcript
production: an owner's manual
hello!
broken computers
getting sidetracked now, so sorry*
* not sorry
back to serious business
!
a production system is a system that serves real users
the goal of operations is to ensure services are reliable
in order to provide a good user experience
failure
app
app linux kernel cpu dram disk network power supply switches
load balancer dns submarine cables routers fiber
app linux kernel the cloud
• cosmic rays
• disk failure
• power outages
• software bugs
• ...
entropy
capacity
cascading failure
system design
redundancy
scale
p1 m3 c1 m2 m1 p2 c2
data storage
protocols
monitoring
many components many req/s
measure all the things?
✅ ⏱
golden signals • latency • traffic • errors • saturation
  0 -  50 [1620]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎ (74.55%)
 50 - 100 [ 447]: ∎∎∎∎∎∎∎∎∎∎ (20.57%)
100 - 150 [  49]: ∎ (2.25%)
150 - 200 [  15]: (0.69%)
200 - 250 [  15]: (0.69%)
250 - 300 [  10]: (0.46%)
300 - 350 [   6]: (0.28%)
350 - 400 [   1]: (0.05%)
400 - 450 [   0]: (0.00%)
450 - 500 [   4]: (0.18%)
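A latency histogram like the one on this slide can be produced with a few lines of code. The sketch below is not the tool used in the talk; the function name `latency_histogram`, the 50 ms bucket width, the 500 ms cutoff and the "one block per 2%" bar scale are all assumptions chosen to resemble the slide. Percentages are taken over all samples, including any that fall outside the displayed range, which would explain why a histogram of this kind can sum to slightly under 100%.

```python
from collections import Counter


def latency_histogram(latencies_ms, bucket_width=50, max_ms=500, pct_per_block=2.0):
    """Return one text line per fixed-width latency bucket.

    All parameters are illustrative assumptions, not values from the talk.
    Samples outside [0, max_ms) are not displayed but still count toward
    the total used for percentages.
    """
    total = len(latencies_ms)
    counts = Counter()
    for ms in latencies_ms:
        if 0 <= ms < max_ms:
            counts[int(ms // bucket_width)] += 1

    lines = []
    for i in range(max_ms // bucket_width):
        lo, hi = i * bucket_width, (i + 1) * bucket_width
        n = counts[i]
        pct = 100.0 * n / total if total else 0.0
        bar = "∎" * int(pct / pct_per_block)  # truncate to whole blocks
        lines.append(f"{lo:>3} - {hi:>3} [{n:>4}]: {bar} ({pct:.2f}%)")
    return lines


if __name__ == "__main__":
    # Made-up sample latencies in milliseconds, for demonstration only.
    sample = [10, 20, 30, 60, 120, 700]
    print("\n".join(latency_histogram(sample)))
```

The point of plotting the distribution rather than a single average is visible in the slide itself: most requests are fast, but a small tail takes several times longer, and an average would hide it.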
saturation traffic latency errors
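As a rough illustration of the four golden signals, the sketch below derives them from a window of request records. Everything here is hypothetical: the `Request` record, the `golden_signals` function, treating HTTP status codes of 500 and above as errors, and using in-flight requests over a concurrency limit as the saturation measure are assumptions made for the example, not definitions from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one served request."""
    duration_ms: float
    status: int


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (None if empty)."""
    if not sorted_values:
        return None
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def golden_signals(requests, window_s, in_flight, max_in_flight):
    """Summarize latency, traffic, errors and saturation for one window.

    Assumptions: status >= 500 counts as an error, and saturation is the
    fraction of an assumed concurrency limit currently in use.
    """
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p50_ms": percentile(durations, 50),
        "latency_p99_ms": percentile(durations, 99),
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests) if requests else 0.0,
        "saturation": in_flight / max_in_flight,
    }


if __name__ == "__main__":
    # Made-up data for demonstration only.
    window = [Request(10, 200), Request(20, 200), Request(30, 500), Request(40, 200)]
    print(golden_signals(window, window_s=2, in_flight=5, max_in_flight=10))
```

Latency is reported as percentiles rather than a mean so that slow outliers stay visible, in line with the histogram shown earlier.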
humans
oops, deleted the database
bad human!
why does this button even exist?
app linux kernel cpu dram disk network power supply switches
load balancer dns submarine cables routers fiber humans
epic failure is almost always systemic
failure
recap
• a production system serves real users
• users like things that work and are fast
• epic failure is almost always systemic
thx @igorwhilefalse