One Person Ops - A Retrospective

How I Did Everything Right and Still Screwed It Up

A little history • 7 startups • 4 jobs as single ops
person • 3 jobs as two-person ops team

The last startup • 300 instances • 16 customer-facing hostgroups • 7
infrastructure hostgroups • 15 developers supported • 4 years

Taking bets • Let’s talk about what we could have
done incorrectly

SRE/Ops/Etc • It’s a full-time job within a specific domain
of knowledge. • If you’re doing someone else’s job, you can’t do your own.

Is the Internet working? • NO OFFICE • Cable modem, wireless, and
printer

Automation • Puppet (config mgmt) • MCollective (orchestration) • Release
process was always code

Everyone owns deploys • When one person/team owns deploys, no one
else will be able to deploy.

Sorta micro servicey • Broke application apart early • Didn’t
get it quite right • Close enough is good enough

Tech Stacks • Rich set of standard daemons • Less
application code written • Occasionally didn’t understand the stack

Tech Stacks cont’d • Tried not to add stacks •
Hadoop is a major undertaking, so didn’t add it • Each new daemon, service, language, or framework is overhead

Consults well with others • Talked over most major decisions
• Kept up with communities • Went to conferences/meetups

Collaborate well w/ others • Good relationship with developers •
Trust and transparency • Sometimes things are going to go wrong

AWS and us-west-2 • The best AWS region • Avoided
new tech early • Not quite sure we did it right

So what did we fail at? • Onboarding Ops people • We
created a system that only one person, with 3-4 years of experience running it, could run

So what did we fail at? • It literally took months
of work. • That dude talking now? The system was literally all in his head. WTF?!

Onboarding • Simplify the use cases • Guardrails • Reduce
context

Design by committee of 1 • Very focused on the
use cases as I understood them • Still a problem even after consulting with others

Narrowly avoided burnout • But it was close • Didn’t
take enough vacation • Oncall for too many things

Too flexible? Yes. • The flexibility we had required full
context to run correctly • Tightly coupled, too

Tweet quota

Poor use of tickets • Dev used tickets heavily •
Ops did not, because 1 person • Just use tickets, really

System of record-ish • Data in a few places •
Conflicting flows • Caused largest production problem

Questions? Comments?


Ramin K
