Get Up Again (Over and Over): Learning and Relearning with Chef

ChefConf 2014 talk

Yvonne Lam

April 17, 2014

Transcript

  1. Me (@yvonnezlam)
    • Work in the Release Engineering group at Chef.
    • Past roles in alphabetical order: {build, deployment, ops, perf/QA, release} engineer
    • Past product domains: NLP/data mining, mobile, scientific programming, HPC (compilers, crash codes, RAS)
  2. Continuous deployment expands the range of what counts as “production infrastructure”
    • “Dev is production” - Sascha Bates, “Doom Your Chef” (https://www.youtube.com/watch?v=pHmU0aNkENc at 14:37)
    • Production = “stuff people use”
    • Continuous deployment makes it very visible that production depends on your delivery pipeline.
  3. To know whether an experiment delivers value, you have to run it in production
    • Pretty clear when you think of it in terms of traditional customer-facing features.
    • It’s also true for infrastructure.
  4. Config management adoption + devops means…
    • Lots of people are doing lots of things with production infrastructure for the first time. (Experiments!)
    • We’re going to have a lot to refactor/replace (and this is ok! But it involves more experiments.)
    • Hence “learning and relearning.”
  5. The kinds of experiments I am talking about
    • Automating processes that were not previously automated.
    • Tooling that changes dev workflow.
    • Introducing new components to an existing prod environment.
    • Building/updating/replacing CI for infrastructure code.
  6. What does this all mean?
    • Get good at managing experiments involving production infrastructure of various kinds.
  7. This talk is about making sure you have the right lever.
    • Not about how to make prod safe for experiments and/or testing (although you should totally do those things!)
    • “Why did we even let it get this far?”
  8. How?
    • When you are doing an experiment, remember that you are doing an experiment.
    • Limit your variables.
    • Plan for negative results.
    • Clean up after you’re done.
  9. Remember you are doing an experiment.
    • An experiment is intended to determine whether you should proceed further along certain lines. It is not necessarily supposed to live forever in its current form.
    • “Ship the prototype!”
  10. Talking about experiments
    • Remind people that they are looking at an experiment. Constantly.
    • “Nerdherding on the Frontier”, Adele Shakal (http://adeleshakal.files.wordpress.com/2014/03/nerdherdingonthefrontier-adeleshakal1.pdf)
  11. Limit your variables
    • Cross-platform experiments are at least two experiments, not one.
    • Automating something you don’t already know how to do is at least two experiments, not one.
    • Variables include people.
  12. Sidebar: Manage software dependencies
    • Lock your dependencies.
    • Consider hosting your own dependencies.
    • How hard is it to update a base dependency?
    • Versioning for people is different from versioning for machines.
  13. “Everything needs to eat the broccoli.”
    • Paraphrased from something Adam Jacob said in our Heartbleed postmortem.
    • Maintenance versions need to be able to be built with security patches.
  14. Variables include people
    • You are telling your users what you are doing, right?
    • The greater the number of people that get involved in an experiment, the more you need clear interfaces for training, support, bug fixes…
    • People want their tools to “just work”
  15. Create interfaces that hide iteration
    • “When we first brought Deployinator online, it was just a web frontend to the shell scripts that moved everything in the right place. What we gained by putting a screen in front of it was the ability to iterate the backend without changing the experience for people deploying.” Source: http://codeascraft.com/2010/05/20/quantum-of-deployment/
  16. Automating something you don’t already know how to do is two experiments, not one
    • Automation != a simpler user interface for people
    • Build a small one manually first. Really.
  17. Be prepared for negative results
    • In this case, “negative” = “no clear effect on value delivered”
    • Sometimes this is a good thing.
  18. Clean up when you’re done
    • Context matters. (“We do live here!”)
    • Infrastructure that goes nowhere interferes with legibility of the system as a whole and makes it harder to manage. (the wiki effect)
  19. “My experiment was a success, now what?”
    • You’ve added a thing that delivers value, yay for you!
    • Migration from an old system to a new system takes longer than anyone thinks it should.
    • Running an old system and a new system impairs legibility, but also provides benefit.
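
A rough illustration of the “Lock your dependencies” point from slide 12: one way to notice unlocked dependencies is to scan a manifest for entries that are not pinned to a single exact version. This is only a sketch under assumptions. The helper name `find_unpinned`, the `name==version` line format, and the package names are all hypothetical and do not come from the talk.

```python
import re

# Assumption: one dependency per line, in "name==version" form.
# A line counts as pinned only if it names exactly one concrete version.
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s*]+$")


def find_unpinned(lines):
    """Return the dependency lines that are not pinned to one exact version."""
    unpinned = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Hypothetical manifest; the package names are made up.
    manifest = [
        "libalpha==1.4.2",   # pinned
        "libbeta>=2.0",      # range, not pinned
        "# build tooling",
        "libgamma",          # no version at all
        "libdelta==1.*",     # wildcard, not a single version
    ]
    for entry in find_unpinned(manifest):
        print("not pinned:", entry)
```

A check like this could run in CI so that an unpinned entry fails the build instead of quietly adding a variable to an experiment.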
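
The idea on slide 15, an interface that hides iteration, can be sketched as a stable front end that delegates to a replaceable backend. This is not how Deployinator is implemented. `DeployFrontend`, `shell_script_backend`, and `orchestrated_backend` are hypothetical names, and the backends only return lists of step descriptions rather than doing any real work.

```python
from typing import Callable, List

Backend = Callable[[str], List[str]]


class DeployFrontend:
    """Stable entry point for people deploying; the backend can change behind it."""

    def __init__(self, backend: Backend) -> None:
        self._backend = backend
        self.last_steps: List[str] = []

    def swap_backend(self, backend: Backend) -> None:
        """Replace the implementation without changing what callers see."""
        self._backend = backend

    def deploy(self, version: str) -> str:
        # Callers always get the same kind of answer, whatever runs underneath.
        self.last_steps = self._backend(version)
        return f"deployed {version}"


def shell_script_backend(version: str) -> List[str]:
    # Stand-in for a first iteration that wraps existing shell scripts.
    return [f"copy {version}", f"restart {version}"]


def orchestrated_backend(version: str) -> List[str]:
    # Stand-in for a later iteration with more steps.
    return [f"stage {version}", f"verify {version}", f"activate {version}"]


if __name__ == "__main__":
    frontend = DeployFrontend(shell_script_backend)
    print(frontend.deploy("v1"))
    frontend.swap_backend(orchestrated_backend)
    print(frontend.deploy("v2"))
```

The person deploying calls `deploy` the same way before and after the swap, which is the property the quote describes: the backend iterates while the experience stays the same.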