
One Person Ops - A Retrospective

Ramin K
August 19, 2016

One Person Ops - A Retrospective

How I did everything mostly right and still screwed it up. Well, I managed to make new and different mistakes.

Transcript

  1. A little history: 7 startups, 4 jobs as a single ops person, 3 jobs as a two-person ops team
  2. The last startup: 300 instances, 16 customer-facing hostgroups, 7 infrastructure hostgroups, 15 developers supported, 4 years
  3. SRE/Ops/Etc.: It’s a full-time job within a specific domain of knowledge. If you’re doing someone else’s job, you can’t do your own.
  4. Sorta micro-servicey • Broke the application apart early • Didn’t get it quite right • Close enough is good enough
  5. Tech Stacks • Rich set of standard daemons • Less application code written • Occasionally didn’t understand the stack
  6. Tech Stacks, cont’d • Tried not to add stacks • Hadoop is a major undertaking, so we didn’t add it • Each new daemon, service, language, or framework is overhead
  7. Consults well with others • Talked over most major decisions • Kept up with communities • Went to conferences/meetups
  8. Collaborates well with others • Good relationship with developers • Trust and transparency • Sometimes things are going to go wrong
  9. AWS and us-west-2 • The best AWS region • Avoided new tech early • Not quite sure we did it right
  10. So what did we fail at? Onboarding Ops people • We created a system that only one person, with 3-4 years of experience running it, could run
  11. So what did we fail at? It literally took months of work. That dude talking now? The system was literally all in his head. WTF?!
  12. Design by committee of 1 • Very focused on the use cases as I understood them • Still a problem even after consulting with others
  13. Narrowly avoided burnout • But it was close • Didn’t take enough vacation • On call for too many things
  14. Too flexible? Yes. • The flexibility we had required full context to run correctly • Tightly coupled, too
  15. Poor use of tickets • Dev used tickets heavily • Ops did not, because it was 1 person • Just use tickets, really
  16. System of record-ish • Data in a few places • Conflicting flows • Caused the largest production problem