Puppet in Production

10 things we learned the hard way.

Matthew Finlayson

September 28, 2012

Transcript

  1. Matthew Finlayson • Staff Engineer - Jive Software • @savagegus on twitter • matthew.finlayson@jivesoftware.com • Feel free to find me during the conference or ping me online after. Tuesday, April 30, 13
  2. Context • This project started 3 years ago • Homemade VMware, NetApp, and Cisco cloud • Automating Infrastructure & Application • Managing ~10,000 production clients • All CentOS 5 / 6 based • All VMware for the Jive application • All Bare Metal for infrastructure • We’re a Java Shop ®
  3. Disclaimer • This is an embarrassing talk • Each item starts with how we tried doing it • Ending with how we’re doing it now • This means I have to tell you all the dumb things we did • Be kind
  4. What not to do and how not to do it
  5. • We did everything at once • Tarred up Java Apps • Used puppet to run: ./configure; make; make install; • ~40 modules, many interdependent • Lots of hardcoding • Used puppet to run scripts to collect input to use in other modules • The list of sins goes on.
  6. • Services, packages, and OS configurations are the low hanging fruit • No Package == No Puppet • Build in layers, start with uniform piece parts • Think twice about dynamic NFS shares (mount is weird.) • If you’re using NIS it’s time to start cutting yourself.
  7. • Had a preexisting system of record • Java application that acted as a dynamic configuration and management database • This system is responsible for provisioning, configuration, and management of our application • Started with custom facts and generated nodes.pp files • Tried regex in conjunction with hostnames
  8. • Finally turned to the External Node Classifier • puppet -> bash -> curl -> Java! • Used ENC to specify classes with basic hierarchies • Passed configuration variables down to modules • Finally graduated to Hiera and ENC
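An External Node Classifier is just an executable that the master calls with a node name and that prints the node's classes, parameters, and environment. The following is a minimal sketch of that contract, not the pipeline from the talk (which chained bash and curl to a Java service). The host names, role names, class names, and the `jvm_heap_mb` parameter are all hypothetical, and a dict stands in for the system of record. The output is JSON, which is also valid YAML flow syntax; whether a given master's YAML parser accepts it is an assumption to verify.

```python
#!/usr/bin/env python3
"""Minimal ENC sketch: print classification data for one node."""
import json
import sys

# Hypothetical stand-in for the system of record. Real code would
# query that service instead of reading a dict.
SYSTEM_OF_RECORD = {
    "app01.example.com": {
        "role": "jive_app",
        "environment": "production",
        "parameters": {"jvm_heap_mb": 4096},
    },
    "stage01.example.com": {
        "role": "jive_app",
        "environment": "staging",
        "parameters": {"jvm_heap_mb": 1024},
    },
}

# Basic hierarchy: a role inherits the classes of its parent role.
ROLE_CLASSES = {"base": ["ntp", "ssh"], "jive_app": ["java", "jive"]}
ROLE_PARENT = {"jive_app": "base"}


def classes_for(role):
    """Collect classes by walking up the role hierarchy, parents first."""
    chain = []
    while role is not None:
        chain.append(role)
        role = ROLE_PARENT.get(role)
    classes = []
    for name in reversed(chain):
        classes.extend(ROLE_CLASSES[name])
    return classes


def classify(hostname):
    """Return the ENC document for a node, or None if it is unknown."""
    record = SYSTEM_OF_RECORD.get(hostname)
    if record is None:
        return None
    return {
        # Each class maps to its class parameters (none in this sketch).
        "classes": {name: {} for name in classes_for(record["role"])},
        # Top-level variables passed down to the modules.
        "parameters": record["parameters"],
        "environment": record["environment"],
    }


def main(hostname):
    node = classify(hostname)
    if node is None:
        return 1  # non-zero exit signals that classification failed
    print(json.dumps(node, indent=2))
    return 0


# The master invokes the script with the node name as its only argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

Keeping the lookup in `classify` separate from the printing makes the classification logic testable without a running master.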
  9. • Started with puppet and webrick • Moved to puppet and passenger • At 500 clients we had a (serious) problem • Tried scaling up (added ram and cpu) • Tried tuning apache • Reduced puppet run frequency • Tried prayer and heavy drinking
  10. • On the client side there were problems too • CentOS 5 was shipping ruby 1.8.5 • Constant hung puppet processes were spiking VMs and causing changes to be delayed • We moved to Enterprise Ruby 1.8.7 out of /usr/local
  11. • 2 puppet masters on bare metal • 1 puppet certificate authority on bare metal • puppet.domainname is a VIP • F5 routes ca and worker requests • Round Robin load balancing across workers • As load gets worse we just add workers
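The routing rule on this slide can be sketched in a few lines: certificate traffic goes to the single CA host, everything else is spread round robin across the workers. This illustrates the rule only, not the F5 configuration. The backend names are hypothetical, and the sketch assumes the older Puppet URL layout of `/{environment}/{endpoint}/{key}`, where the certificate endpoints are the second path segment.

```python
import itertools

# Assumed names of the certificate-related endpoints in the master's
# HTTP API (second path segment after the environment).
CA_ENDPOINTS = {
    "certificate",
    "certificate_request",
    "certificate_revocation_list",
    "certificate_status",
}


def make_router(ca_backend, workers):
    """Return a function that maps a request path to a backend name."""
    pool = itertools.cycle(workers)  # round robin over the workers

    def route(path):
        parts = path.strip("/").split("/")
        if len(parts) >= 2 and parts[1] in CA_ENDPOINTS:
            return ca_backend  # only the CA may sign and serve certs
        return next(pool)

    return route


# Hypothetical backend names.
route = make_router("puppetca01", ["puppetworker01", "puppetworker02"])
```

Adding capacity means passing a longer `workers` list; the CA stays a single host because certificate state lives in one place.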
  12. • As our application environments changed we added modules • Common modules gained complexity • Interactions got more complex • Old timey non-deterministic problems got bad • We got scared to add functionality
  13. • We’re running on 2 puppet masters • We have many more discrete environments (different applications, unique modules, SLAs, etc) • Puppet environments provide separation between each of these AND between staging and production
  14. • This means each puppet environment we define specifies its own manifests and modules • For applications this is a practical split of different filesystem locations • For staging and production environments these are also different filesystem locations but they reflect different scm branches
  15. • Always write modules that switch on OS type and version. • In a homogeneous environment it’ll feel like a waste. • You won’t need the flexibility until you do. • (We got burned between CentOS 5 and 6)
  16. • Use classes • Parameters are your friend • You can logically segment functionality • Set default values for unset variables • Top level manifest directories should be for bootstrapping variables, defaults should make everything else safe.
  17. • Use puppet-module • It’s good to have conventions • New developers will be familiar with what you build • Gives you a fighting chance of reusing modules from the forge • You can always be a contributor
  18. • If you need to use file, try to use templates. Plenty of times this was handy and meant less refactoring later. • If you touch a file with puppet put a header on it. Other ops will appreciate it.
  19. • We started with the lather rinse repeat method. Write a module, tweak, run, repeat. • Nuked /etc/resolv.conf, couldn’t find the puppet master to fix it. ssh loop :( • Split into staging and prod, now we nuked staging instead • Since we didn’t always develop on the same hardware / OS we couldn’t even just run locally
  20. • Split out environments to staging and production • Started testing locally with VirtualBox and vagrant • We have a CentOS 6.2 vagrant box • We have a set of vagrant files that spin up a fresh copy and apply the puppet modules • This prevents silly mistakes and serious ones from stopping everyone
  21. • Changes in SCM with no idea why they were made • People would push puppet syntax errors (way too often) • “That’s a simple change, one second” • No enforced style guides • We wrote puppet doc but never really used it • “The change is in SCM, why don’t we see it in prod?”
  22. • Pre-commit hook for JIRA issue • Post-commit puppet syntax check is run against all modules • Post-commit puppet-lint is run against all modules • Post-commit puppet-doc is generated • Post-commit changes are pushed to staging automatically
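A pre-commit check for a JIRA issue reference comes down to one pattern match on the commit message. The sketch below assumes the usual key shape (uppercase project key, hyphen, number); the example key `OPS-123` is hypothetical. How the hook obtains the message depends on the SCM in use and is left out. The check only confirms that something shaped like an issue key is present, not that the issue exists.

```python
import re

# Assumed key format: uppercase project key, hyphen, issue number.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def check_commit_message(message):
    """Return (ok, reason) for a proposed commit message."""
    keys = JIRA_KEY.findall(message)
    if not keys:
        return False, "commit message must reference a JIRA issue, e.g. OPS-123"
    return True, "references " + ", ".join(keys)
```

A hook script would call `check_commit_message` and exit non-zero when `ok` is false, which is what causes the SCM to reject the commit.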
  23. • Started with manually replicating changes from staging to production • Moved to mavenized builds of tarballs (I know, I know) • Production is on a different network than SCM • Tried rsync
  24. • Finally moved to capistrano with multi-environment support (capistrano-ext) • Using :stages to push different branches to different environments • Deploy over scp from scm • A release is pushed to a date stamped directory • The ‘current’ symlink is updated to point to that release • Rollback just points ‘current’ to the previous date stamped directory • Driven through Jenkins
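The release layout described here (date-stamped directories plus a ‘current’ symlink) is simple enough to sketch without capistrano. The following shows the layout and the rollback step only; it is not how capistrano implements them. The `releases/` subdirectory name, the timestamp format, and passing file contents as a dict (instead of copying over scp) are assumptions made to keep the example self-contained. It relies on POSIX symlink and rename semantics.

```python
import os
import time


def _point_current(base, release):
    """Repoint 'current' atomically: make a temp link, rename over it."""
    current = os.path.join(base, "current")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release, tmp)
    os.replace(tmp, current)  # rename replaces the old symlink itself


def deploy(base, files, stamp=None):
    """Write files into a new date-stamped release and activate it."""
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    release = os.path.join(base, "releases", stamp)
    os.makedirs(release)
    for name, content in files.items():
        with open(os.path.join(release, name), "w") as handle:
            handle.write(content)
    _point_current(base, release)
    return release


def rollback(base):
    """Point 'current' at the release just before the active one."""
    releases_dir = os.path.join(base, "releases")
    releases = sorted(os.listdir(releases_dir))  # stamps sort by date
    active = os.path.basename(os.path.realpath(os.path.join(base, "current")))
    index = releases.index(active)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = os.path.join(releases_dir, releases[index - 1])
    _point_current(base, previous)
    return previous
```

Because old releases stay on disk, rollback is only a symlink swap; nothing is copied or rebuilt.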
  25. • Started with a production environment • Changes and new code went there • (I think we know where this is going) • FINALLY we had trunk going from dev to staging to production
  26. • Trunk: gets pushed to production • Staging (branch): gets pushed to staging • Dev (branch): gets pushed to dev • Changes to the source tree get pushed post-commit • Latest code is always out there, no skipped deployments, no guessing what’s there by revision history, no more broken dreams.
  27. • Puppet Users - Google Groups • Pro Puppet by Turnbull & McCune • http://www.planetpuppet.org • http://www.dzone.com/mz/devops • #puppet on irc.freenode.net • http://dev2ops.org
  28. Seriously, when you don’t know how, or whether you can do it: GitHub. GitHub.