VMware, NetApp, and Cisco cloud • Automating Infrastructure & Application • Managing ~10,000 production clients • All CentOS 5 / 6 based • All VMware for the Jive application • All Bare Metal for infrastructure • We’re a Java Shop ® Tuesday, April 30, 13
starts with how we tried doing it • Ending with how we’re doing it now • This means I have to tell you all the dumb things we did • Be kind
Apps • Used puppet to run: ./configure; make; make install; • ~40 modules, many interdependent • Lots of hardcoding • Used puppet to run scripts to collect input to use in other modules • The list of sins goes on.
fruit • No Package == No Puppet • Build in layers, start with uniform piece parts • Think twice about dynamic NFS shares (mount is weird.) • If you’re using NIS it’s time to start cutting yourself.
that acted as a dynamic configuration and management database • This system is responsible for provisioning, configuration, and management of our application • Started with custom facts and generated nodes.pp files • Tried regex in conjunction with hostnames
-> bash -> curl -> Java! • Used ENC to specify classes with basic hierarchies • Passed configuration variables down to modules • Finally graduated to Hiera and ENC
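A minimal sketch of what an external node classifier (ENC) does, not the one described on the slide. Puppet runs the ENC with the node name as its only argument and reads YAML with `classes` and `parameters` keys from stdout. The role table, class names, and the hostname convention (`app01` means role `app`) are all hypothetical.

```python
#!/usr/bin/env python3
"""Toy ENC sketch: map a node name to classes and parameters."""
import sys

# Hypothetical hierarchy: every node gets the base classes,
# and a role derived from the hostname adds more on top.
BASE_CLASSES = ["base"]
ROLE_CLASSES = {
    "app": ["java", "jive_app"],
    "db": ["database"],
}


def classify(node_name):
    """Return (classes, parameters) for a node such as 'app01.example.com'."""
    short = node_name.split(".")[0]
    role = short.rstrip("0123456789")  # 'app01' -> 'app'
    classes = BASE_CLASSES + ROLE_CLASSES.get(role, [])
    parameters = {"role": role or "unknown"}
    return classes, parameters


def to_yaml(classes, parameters):
    """Render the small, flat structure as YAML text."""
    lines = ["---", "classes:"]
    lines += ["  - %s" % name for name in classes]
    lines.append("parameters:")
    lines += ["  %s: %s" % (key, parameters[key]) for key in sorted(parameters)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Puppet passes the node name as the single argument.
    sys.stdout.write(to_yaml(*classify(sys.argv[1])))
```

Unknown roles fall through to the base classes only, so a new host is never left without a catalog.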
puppet and passenger • At 500 clients we had a (serious) problem • Tried scaling up (added RAM and CPU) • Tried tuning Apache • Reduced puppet run frequency • Tried prayer and heavy drinking
CentOS 5 was shipping ruby 1.8.5 • Constantly hung puppet processes were spiking VMs and causing changes to be delayed • We moved to Ruby Enterprise Edition 1.8.7 out of /usr/local
certificate authority on bare metal • puppet.domainname is a VIP • F5 routes CA and worker requests • Round-robin load balancing across workers • As load gets worse we just add workers
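The routing rule on this slide can be sketched in a few lines. This is an illustration of the idea, not the F5 configuration: certificate traffic goes to the single CA, everything else rotates across workers. The backend names are hypothetical, and the sketch assumes certificate requests can be recognized by the second path segment, as in `/{environment}/certificate_request/{node}`.

```python
import itertools

CA_BACKEND = "puppet-ca"  # hypothetical backend name
WORKERS = ["puppet-worker1", "puppet-worker2", "puppet-worker3"]  # hypothetical

# Assumption: these endpoint names mark certificate traffic.
CA_ENDPOINTS = (
    "certificate",
    "certificate_request",
    "certificate_revocation_list",
    "certificate_status",
)


def make_router(workers, ca_backend):
    """Return a function that picks a backend for a request path."""
    rotation = itertools.cycle(workers)

    def route(path):
        segments = path.strip("/").split("/")
        if len(segments) >= 2 and segments[1] in CA_ENDPOINTS:
            return ca_backend      # all certificate work stays on the CA
        return next(rotation)      # round robin for everything else

    return route
```

Adding capacity then means appending a host to the worker list; the CA stays a single box because it owns the signing state.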
Common modules gained complexity • Interactions got more complex • Old timey non-deterministic problems got bad • We got scared to add functionality
many more discrete environments (different applications, unique modules, SLAs, etc.) • Environments provide separation between applications AND between staging and production
own manifests and modules • For applications this is a practical split into different filesystem locations • For staging and production environments these are also different filesystem locations, but they reflect different SCM branches
version. • In a homogeneous environment it’ll feel like a waste. • You won’t need the flexibility until you do. • (We got burned between CentOS 5 and 6)
can logically segment functionality • Set default values for unset variables • Top-level manifest directories should be for bootstrapping variables; defaults should make everything else safe.
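The "defaults make everything else safe" rule is a layered lookup. The sketch below shows the idea only; the variable names and values are hypothetical, and in Puppet itself this would live in class parameter defaults or Hiera rather than in Python.

```python
# Hypothetical module defaults: safe values for anything nobody set.
MODULE_DEFAULTS = {
    "ntp_servers": ["ntp.example.com"],
    "java_heap_mb": 1024,
}


def resolve(defaults, *overrides):
    """Merge layers so that later ones win; None counts as 'unset'."""
    resolved = dict(defaults)
    for layer in overrides:
        for key, value in layer.items():
            if value is not None:
                resolved[key] = value
    return resolved


# Usage: the top level only bootstraps what it cares about.
environment = {"java_heap_mb": 4096}
node = {"ntp_servers": None}  # left unset, so the default applies
settings = resolve(MODULE_DEFAULTS, environment, node)
```

A module written this way can be applied to a node that sets nothing at all, which is what makes it safe to reuse.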
New developers will be familiar with what you build • Gives you a fighting chance of reusing modules from the forge • You can always be a contributor
templates, plenty of times this was handy, less refactoring later. • If you touch a file with puppet, put a header on it. Other ops will appreciate it.
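A small sketch of the header convention, assuming a plain comment banner is enough. The wording of the banner and the `module` argument are hypothetical; in a real module the same two lines would sit at the top of the template.

```python
def with_header(body, module, comment="#"):
    """Prepend a 'managed by puppet' banner to rendered file content."""
    header = [
        "%s This file is managed by puppet (module: %s)." % (comment, module),
        "%s Local changes will be overwritten on the next run." % comment,
    ]
    return "\n".join(header) + "\n" + body


# Usage: pass the comment marker the target file format expects.
resolv_conf = with_header("nameserver 10.0.0.1\n", "resolver")
php_ini = with_header("memory_limit = 128M\n", "php", comment=";")
```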
a module, tweak, run, repeat. • Nuked /etc/resolv.conf, couldn’t find the puppet master to fix it. ssh loop :( • Split into staging and prod, now we nuked staging instead • Since we didn’t always develop on the same hardware / OS we couldn’t even just run locally
testing locally with VirtualBox and Vagrant • We have a CentOS 6.2 Vagrant box • We have a set of Vagrantfiles that spin up a fresh copy and apply the puppet modules • This prevents silly mistakes and serious ones from stopping everyone
made • People would push puppet syntax errors (way too often) • “That’s a simple change, one second” • No enforced style guides • We wrote puppet doc but never really used it • “The change is in SCM, why don’t we see it in prod?”
is run against all modules • Post-commit puppet-lint is run against all modules • Post-commit puppet-doc is generated • Post-commit changes are pushed to staging automatically
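A hedged sketch of the check step behind hooks like these. The name of the first check is cut off in the slide text, so using `puppet parser validate` for the syntax pass is an assumption; `puppet-lint` is the tool the slide names. The hook wiring itself (which SCM, which hook script) is left out, and the `runner` parameter exists only so the logic can be exercised without either tool installed.

```python
import subprocess


def commands_for(paths):
    """Build the check commands for every Puppet manifest in `paths`."""
    manifests = [p for p in paths if p.endswith(".pp")]
    # Assumption: syntax check via `puppet parser validate <file>`.
    commands = [["puppet", "parser", "validate", p] for p in manifests]
    # Style check via puppet-lint, as named on the slide.
    commands += [["puppet-lint", p] for p in manifests]
    return commands


def run_checks(paths, runner=subprocess.call):
    """Run every check and return the commands that exited non-zero."""
    failed = []
    for command in commands_for(paths):
        if runner(command) != 0:
            failed.append(command)
    return failed
```

A hook would call `run_checks` with the changed files and reject the commit, or flag the build, when the returned list is not empty.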
Using :stages to push different branches to different environments • Deploy over scp from SCM • A release is pushed to a date-stamped directory • The ‘current’ symlink is updated to point to that release • Rollback just points ‘current’ to the previous date-stamped directory • Driven through Jenkins
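The release layout above can be sketched as follows. This is an illustration under assumptions, not the deployment tooling from the talk: the `releases/` subdirectory, the timestamp format, and the `populate` callback (standing in for the scp step) are all hypothetical. It relies on POSIX symlink and rename semantics.

```python
import os
import time


def point_current(root, release):
    """Repoint the 'current' symlink at `release` with an atomic rename."""
    tmp = os.path.join(root, "current.tmp")
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release, tmp)
    os.replace(tmp, os.path.join(root, "current"))


def deploy(root, populate, stamp=None):
    """Create a date-stamped release directory, fill it, and make it current."""
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    release = os.path.join(root, "releases", stamp)
    os.makedirs(release)
    populate(release)  # stand-in for copying the checked-out tree
    point_current(root, release)
    return release


def rollback(root):
    """Point 'current' at the release just before the active one."""
    releases_dir = os.path.join(root, "releases")
    releases = sorted(os.listdir(releases_dir))  # stamps sort chronologically
    active = os.path.basename(os.path.realpath(os.path.join(root, "current")))
    index = releases.index(active)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = os.path.join(releases_dir, releases[index - 1])
    point_current(root, previous)
    return previous
```

Because old releases stay on disk untouched, a rollback is one symlink swap and never needs a second copy from SCM.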
pushed to staging • Dev (branch): gets pushed to dev • Changes to the source tree get pushed post-commit • Latest code is always out there, no skipped deployments, no guessing what’s there by revision history, no more broken dreams.