DevOps for Developers:
Why your software engineers need to get better at operations,
and how to do it.
Slide 4
Slide 4 text
“Dear operations people, learn to be more
like software engineers.”
Love, DevOps (2009-2016)
Slide 5
Slide 5 text
“Dear software engineers: your turn.
Time to get better at ops.”
Love, Everyone in Tech
Slide 6
Slide 6 text
This is not optional,
this is not “nice-to-have”
This is table stakes.
Slide 7
Slide 7 text
What is operations?
Operations is the constellation of your org’s technical skills, practices,
and cultural values around designing, building and maintaining systems,
shipping software, and solving problems with technology.
Slide 8
Slide 8 text
Operations is a social contract.
Slide 9
Slide 9 text
Do you need an “ops team”?
¯\_(ツ)_/¯
Do you need quality operations engineering skills and culture?
YES.
Slide 10
Slide 10 text
So you have an Ops Org …
Slide 11
Slide 11 text
Your Mission
1. Support your people in developing new skill sets
2. Express institutional value (and mean it)
Slide 12
Slide 12 text
Software engineers need to get better at ops.
(And they should WANT TO!! Ops is like a superpower!!!)
Slide 13
Slide 13 text
Developing new skill sets
Slide 14
Slide 14 text
Engineers should be on call
for their own services.
Slide 15
Slide 15 text
Common protests:
* learned helplessness
* fear of breaking things
* strategic incompetence
* “my time is too valuable!”
Slide 16
Slide 16 text
Corollary: on-call must not be hell.
• Guard your people’s time and sleep
• No hero complexes. No martyrs.
• Don’t over-page. Align engineering pain with customer pain
• Roll up non-urgent alerts for daytime hours
• Your most valuable paging alerts are end-to-end checks on critical code paths
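The paging rules on this slide can be sketched as a small routing function. This is a minimal illustration, not a real alerting system: the `Alert` fields, the business-hours window, and the three outcomes ("page", "notify", "queue") are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Alert:
    name: str
    customer_impacting: bool   # assumption: someone tags whether customers feel this
    end_to_end: bool = False   # raised by an end-to-end check on a critical code path


# Assumed daytime window; pick whatever matches your team.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(18, 0)


def route_alert(alert: Alert, now: datetime) -> str:
    """Return "page", "notify", or "queue" for an alert."""
    # Only wake a human when the engineering pain matches customer pain.
    if alert.customer_impacting or alert.end_to_end:
        return "page"
    # Non-urgent alerts are shown during weekday daytime hours...
    if now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END:
        return "notify"
    # ...and rolled up for the next working day otherwise.
    return "queue"
```

For example, a failing checkout end-to-end check pages at 3 a.m. on a Saturday, while a "disk 70% full" warning at the same moment is queued until daytime.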
Slide 17
Slide 17 text
Software engineers
should deploy their own code.
Slide 18
Slide 18 text
Build guard-rails,
not walls
Feedback needs to be fast to be effective
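One possible guard-rail, sketched under assumptions: a canary check that compares the new build's error rate with the current baseline and gives the deploying engineer a fast verdict. The function name, the thresholds, and the "wait" / "promote" / "rollback" verdicts are hypothetical, not taken from any particular deploy tool.

```python
def canary_verdict(baseline_errors: int, baseline_requests: int,
                   canary_errors: int, canary_requests: int,
                   max_ratio: float = 2.0, min_requests: int = 100) -> str:
    """Return "wait", "promote", or "rollback" for a canary deploy."""
    # Not enough traffic yet to say anything useful.
    if canary_requests < min_requests:
        return "wait"
    baseline_rate = baseline_errors / max(baseline_requests, 1)
    canary_rate = canary_errors / canary_requests
    # Small absolute floor so a spotless baseline does not block every deploy.
    allowed = max(baseline_rate * max_ratio, 0.001)
    return "rollback" if canary_rate > allowed else "promote"
```

The point is that nothing here stops an engineer from deploying (no wall); the check only reports quickly when the change makes things worse.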
Slide 19
Slide 19 text
The most powerful weapon in your arsenal
is always cause and effect.
Slide 20
Slide 20 text
Pair your SWEs with ops/DBAs for
debugging and on call
“cool! let’s sit down and
figure this out together,
and I’ll show you how to
do it next time!”
Slide 21
Slide 21 text
Your eng teams should share the same review
processes, tasks and tools.
Slide 22
Slide 22 text
Emphasize ops feedback in the early design phase.
What are the reliability requirements? How do
we distribute load or degrade gracefully?
Are we reusing components that are already
known & supported as much as possible?
Who supports this service, how is it going to
fail, what are the ripple effects when it does?
What instrumentation and metrics will we need?
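Two of these design questions (degrading gracefully, and deciding on instrumentation up front) can be illustrated with a small sketch. Everything named here is hypothetical: the in-process `metrics` counter, the `call_with_fallback` helper, and the recommendation example stand in for whatever your service and metrics system actually use.

```python
import time
from collections import Counter

# Hypothetical in-process metrics; a real service would export these.
metrics = Counter()
latencies_ms = []


def call_with_fallback(primary, fallback):
    """Try the primary dependency; serve a degraded answer if it fails."""
    start = time.monotonic()
    try:
        result = primary()
        metrics["primary.ok"] += 1
        return result
    except Exception:
        # Count the failure so the degradation is visible, not silent.
        metrics["primary.error"] += 1
        metrics["fallback.served"] += 1
        return fallback()
    finally:
        # Record latency for every call, successful or not.
        latencies_ms.append((time.monotonic() - start) * 1000.0)


def flaky_recommendations():
    raise TimeoutError("recommendation service did not respond")


def popular_items():
    return ["top-seller-1", "top-seller-2"]
```

Calling `call_with_fallback(flaky_recommendations, popular_items)` returns the static list instead of an error page, and the counters show how often the degraded path is being served.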
Slide 24
Slide 24 text
The cost and pain of developing software is approximately zero
compared to the operational cost of maintaining it over time.
h/t @mcfunley, “choose boring technology”
Probe every software engineering candidate
for their ops experience & attitude.
… yep, even FE/mobile devs!
Slide 29
Slide 29 text
Good operational questions for SWEs
• “Tell me about the last time you caused a production outage.”
• “What are your favorite tools for visibility, instrumentation, and debugging?”
• “How would you design a deploy process?”
• “You developed service $x, and latency is 5x higher today than yesterday. How do you start debugging the problem?”
• “What happens when you type ‘google.com’ into a browser?”
Slide 30
Slide 30 text
Good engineers should be able to
communicate in great detail
everything that SUCKS about their
favorite technologies.
Slide 31
Slide 31 text
Signals …
Do they expect the network to be reliable, disks
to be fast, databases to respond, retries to
succeed …?
How do they react to the idea of being on call
for their own services?
Are they overly clever? Ugh.
Slide 32
Slide 32 text
You are signaling …
“Operations is valued here.”
Slide 33
Slide 33 text
Performance reviews
• Solicit regular feedback from peers, ops, support teams
• Ask questions about relevant operational skills:
  • “Who would you most like to be paired with on call? Least?”
  • “Who do you ask for help when you’re completely stumped?”
  • “Whose code would you be least willing to maintain?”
• Include this feedback every cycle. It should not be a surprise.
Slide 34
Slide 34 text
Operations engineering is about making systems
maintainable, reliable, and comprehensible.
Senior software engineers should be reasonably good at these things.
So if they are not, don’t promote them.
Slide 35
Slide 35 text
It is much, much harder to recognize and reward
operational excellence than shipping shiny features.
You need to actively solicit this feedback
by asking different questions.
Slide 36
Slide 36 text
Your operational priorities must be clearly
communicated by management,
with the details left up to the engineers/teams.
Slide 37
Slide 37 text
The patterns you call out and celebrate in your culture
will get repeated.
Slide 38
Slide 38 text
In conclusion …
Slide 39
Slide 39 text
Yes, you need an ops team,
IF you have hard operational problems.
You should try to not have hard operational problems.
Slide 40
Slide 40 text
Needing a dedicated
operations engineering
team is a sign of
success.
Good job!