Benefits of On-Call
● Hones troubleshooting skills
● Forces you to identify the weak points in your systems
● Teaches you what is and isn’t production-ready
Slide 7
Slide 7 text
Team bonding
Slide 15
Slide 15 text
PagerDuty
Slide 18
Slide 18 text
PagerDuty
Slide 19
Slide 19 text
PagerDuty
New Year’s Eve
Slide 23
Slide 23 text
PagerDuty
New Year’s Eve
Slide 24
Slide 24 text
PagerDuty
New Year’s Eve S3 Outage
Slide 29
Slide 29 text
Me
Slide 34
Slide 34 text
A totally normal on-call routine
● Don’t leave house except to commute to work
● Clear all non-work appointments
● Cook all meals beforehand
● Have soup on hand
● Don’t sleep
Slide 39
Slide 39 text
Be heroes
Slide 40
Slide 40 text
Prepare for battle
Slide 41
Slide 41 text
Naomi Orwin
Writer
Slide 42
Slide 42 text
“Action scenes stop the plot.”
- Naomi Orwin
Slide 43
Slide 43 text
Pages stop the plot of your career
Slide 45
Slide 45 text
Miserable on-call professionals
● Have terrible work/life balance
● Are supporting poorly-designed systems
● Feel powerless to solve problems
● Generally hate the role
Slide 47
Slide 47 text
Red flag:
too few owning too much
Slide 48
Slide 48 text
Centralia infrastructure
Slide 52
Slide 52 text
Red flag:
band-aids
Slide 54
Slide 54 text
● Bump thresholds
● Snooze pages
● Add delays
Slide 55
Slide 55 text
Red flag:
no visibility
Slide 56
Slide 56 text
Systems visibility
Slide 57
Slide 57 text
Team visibility
Slide 58
Slide 58 text
Too many pages
Slide 59
Slide 59 text
Average # of weekly pages during WORST on-call
Slide 60
Slide 60 text
Average # of weekly pages during WORST on-call
Slide 61
Slide 61 text
Average # of weekly pages during WORST on-call
Slide 62
Slide 62 text
Average # of weekly pages during WORST on-call
Slide 64
Slide 64 text
Average # of weekly pages during BEST on-call
Slide 65
Slide 65 text
Average # of weekly pages during BEST on-call
Slide 66
Slide 66 text
Average # of weekly pages during BEST on-call
Slide 67
Slide 67 text
How do we get there?
Slide 68
Slide 68 text
Notification cleanup
Slide 69
Slide 69 text
Actionable alerts
Slide 70
Slide 70 text
Actionable alerts
● Something breaks
● Customers notice
● I am the best person to fix it
● I need to fix it immediately
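The four criteria above can be read as a checklist that an alert must pass in full before it pages anyone. A minimal sketch of that reading follows. The `Alert` record, its field names, and the "ticket" fallback are assumptions made for illustration; they are not from the talk or from any paging tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; field names are illustrative only."""
    name: str
    something_broke: bool      # a real failure, not just a noisy metric
    customers_notice: bool     # the failure is visible to customers
    i_am_best_fixer: bool      # the person paged is the right responder
    needs_immediate_fix: bool  # it cannot wait until working hours


def is_actionable(alert: Alert) -> bool:
    """Return True only when all four criteria from the slide hold."""
    return (
        alert.something_broke
        and alert.customers_notice
        and alert.i_am_best_fixer
        and alert.needs_immediate_fix
    )


def route(alert: Alert) -> str:
    """Page for actionable alerts; send the rest to a ticket queue (assumed fallback)."""
    return "page" if is_actionable(alert) else "ticket"
```

Under this sketch, an alert that fails any single criterion never wakes someone up; it is still recorded, just not as a page.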
Slide 71
Slide 71 text
Cluster alerts
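One possible reading of this slide is to alert on the health of a whole cluster rather than on each host, so that a single failed machine in a redundant pool does not page anyone. The sketch below assumes that reading. The function name, the `(cluster, host, is_healthy)` tuple shape, and the 0.5 default threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def clusters_to_page(
    host_health: Iterable[Tuple[str, str, bool]],
    max_unhealthy_fraction: float = 0.5,
) -> List[str]:
    """Return, sorted, the clusters whose unhealthy fraction exceeds the threshold.

    host_health yields (cluster, host, is_healthy) tuples.
    """
    totals: Dict[str, int] = defaultdict(int)
    unhealthy: Dict[str, int] = defaultdict(int)
    for cluster, _host, is_healthy in host_health:
        totals[cluster] += 1
        if not is_healthy:
            unhealthy[cluster] += 1
    return sorted(
        cluster
        for cluster, total in totals.items()
        if unhealthy[cluster] / total > max_unhealthy_fraction
    )
```

With the default threshold, one unhealthy host out of three produces no page, while a cluster with every host down does.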
Slide 73
Slide 73 text
Devs on-call
Slide 75
Slide 75 text
“If a developer is good, being ‘on
call’ just means having to fix other
people’s problems and
inconsequential stuff on Sat.”
- dev on Twitter
Slide 76
Slide 76 text
“Fixed this for him! ‘Put your
developers on call. You’ll be
surprised by how quickly they go
work for someone that isn’t an
#$%.’”
- dev on Twitter
Slide 77
Slide 77 text
“If your org has change control
working properly, if code breaks,
the jr sysadmin should simply roll
back the update as documented.”
- dev on Twitter
Slide 78
Slide 78 text
The right tool
Slide 79
Slide 79 text
Work together
Slide 81
Slide 81 text
Start small
Slide 83
Slide 83 text
Your people will burn out before
your company does
Slide 87
Slide 87 text
Where does that leave #oncallselfie?
Slide 99
Slide 99 text
“Why are you getting paged so much?”
Slide 100
Slide 100 text
Thanks!
@alicegoldfuss
Special Thanks:
PagerDuty
VictorOps
oncallselfies.com
All of you