Puppet in Production

Puppet in Prod: 10 things we learned the hard way
Matthew Finlayson, Tuesday, April 30, 13

Matthew Finlayson
• Staff Engineer - Jive Software
• @savagegus on Twitter
• matthew.finlayson@jivesoftware.com
• Feel free to find me during the conference or ping me online after.

jivesoftware.com/try-jive

Context
• This project started 3 years ago
• Homemade VMware, NetApp, and Cisco cloud
• Automating Infrastructure & Application
• Managing ~10,000 production clients
• All CentOS 5 / 6 based
• All VMware for the Jive application
• All Bare Metal for infrastructure
• We’re a Java Shop ®

Disclaimer
• This is an embarrassing talk
• Each item starts with how we tried doing it
• Ending with how we’re doing it now
• This means I have to tell you all the dumb things we did
• Be kind

ANTI-PATTERNS

What not to do and how not to do it

Lessons learned the hard way

1 what

• We did everything at once
• Tarred up Java apps
• Used puppet to run: ./configure; make; make install;
• ~40 modules, many interdependent
• Lots of hardcoding
• Used puppet to run scripts to collect input to use in other modules
• The list of sins goes on.

• Services, packages, and OS configurations are the low hanging fruit
• No Package == No Puppet
• Build in layers, start with uniform piece parts
• Think twice about dynamic NFS shares (mount is weird.)
• If you’re using NIS it’s time to start cutting yourself.

2 integrate

• Had a preexisting system of record
• Java application that acted as a dynamic configuration and management database
• This system is responsible for provisioning, configuration, and management of our application
• Started with custom facts and generated nodes.pp files
• Tried regex in conjunction with hostnames

• Finally turned to the External Node Classifier
• puppet -> bash -> curl -> Java!
• Used ENC to specify classes with basic hierarchies
• Passed configuration variables down to modules
• Finally graduated to Hiera and ENC
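
The ENC contract behind that chain is small: the master runs an executable with the node name as its only argument and reads YAML (classes, parameters, optionally an environment) from stdout. A minimal sketch of that contract in Python follows. It is not the bash/curl/Java chain from the talk; the in-memory SYSTEM_OF_RECORD dict, the host name, the "base" and "role::app" class names, and the jvm_heap_mb parameter are all hypothetical stand-ins.

```python
#!/usr/bin/env python
"""Minimal ENC sketch: print classes and parameters for one node as YAML."""
import json
import sys

# Hypothetical stand-in for the system of record. The chain in the slides
# asks a Java service over HTTP instead of reading a dict.
SYSTEM_OF_RECORD = {
    "app01.example.com": {
        "role": "app",
        "environment": "staging",
        "settings": {"jvm_heap_mb": 2048},
    },
}


def classify(hostname, records=SYSTEM_OF_RECORD):
    """Return the ENC answer for a node as a plain dict."""
    record = records.get(hostname)
    if record is None:
        # Unknown nodes get the base layer and nothing else.
        return {"classes": {"base": {}}}
    return {
        # Basic hierarchy: every node gets "base", then its role class.
        "classes": {"base": {}, "role::" + record["role"]: {}},
        # Variables handed down to the modules.
        "parameters": dict(record["settings"]),
        "environment": record["environment"],
    }


def to_yaml(answer):
    """Render the answer as block-style YAML (two levels is enough here)."""
    lines = []
    for key, value in answer.items():
        if isinstance(value, dict):
            lines.append("%s:" % key)
            for name, inner in value.items():
                # json.dumps produces double-quoted scalars and flow
                # mappings, both of which are valid YAML.
                rendered = json.dumps(inner) if inner != {} else ""
                lines.append(("  %s: %s" % (json.dumps(name), rendered)).rstrip())
        else:
            lines.append("%s: %s" % (key, json.dumps(value)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # The node name arrives as the first command-line argument.
    sys.stdout.write(to_yaml(classify(sys.argv[1])))
```

Keeping classify() separate from the output code means the lookup can be swapped for a real HTTP call without touching the YAML side.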

3 scaling

• Started with puppet and WEBrick
• Moved to puppet and Passenger
• At 500 clients we had a (serious) problem
• Tried scaling up (added RAM and CPU)
• Tried tuning Apache
• Reduced puppet run frequency
• Tried prayer and heavy drinking

• On the client side there were problems too
• CentOS 5 was shipping ruby 1.8.5
• Constant hung puppet processes were spiking VMs and causing changes to be delayed
• We moved to Enterprise Ruby 1.8.7 out of /usr/local

• 2 puppet masters on bare metal
• 1 puppet certificate authority on bare metal
• puppet.domainname is a VIP
• F5 routes ca and worker requests
• Round Robin load balancing across workers
• As load gets worse we just add workers
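
The routing decision behind that VIP can be sketched in a few lines of Python. This only illustrates the logic; the F5 is configured with its own rules, not with Python. It assumes the Puppet 2.x/3.x URL layout /<environment>/<endpoint>/<name>, in which the CA endpoints all begin with "certificate". The host names and the make_router helper are hypothetical.

```python
import itertools
import re

# Hypothetical host names.
CA_HOST = "puppetca.example.com"
WORKERS = ["puppetmaster01.example.com", "puppetmaster02.example.com"]

# Assumed URL layout: /<environment>/<endpoint>/<name>, where the CA
# endpoints are certificate, certificate_request, certificate_status,
# and certificate_revocation_list.
CA_REQUEST = re.compile(r"^/[^/]+/certificate[^/]*/")


def make_router(workers=WORKERS, ca_host=CA_HOST):
    """Return a function that maps a request path to a backend host."""
    pool = itertools.cycle(workers)  # round robin across the workers

    def route(path):
        if CA_REQUEST.match(path):
            # Certificate traffic always goes to the single CA.
            return ca_host
        # Everything else (catalogs, file content, reports) is spread out.
        return next(pool)

    return route
```

Adding capacity is then a matter of building the router with a longer worker list, for example make_router(WORKERS + ["puppetmaster03.example.com"]).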

4 environments

• As our application environments changed we added modules
• Common modules gained complexity
• Interactions got more complex
• Old timey non-deterministic problems got bad
• We got scared to add functionality

• We’re running on 2 puppet masters
• We have many more discrete environments (different applications, unique modules, SLAs, etc.)
• Puppet environments provide separation between applications AND between staging and production

• This means each puppet environment we define specifies its own manifests and modules
• For applications this is a practical split of different file system locations
• For staging and production environments these are also different filesystem locations but they reflect different scm branches

5 extensibility

• Always write modules that switch on OS type and version.
• In a homogeneous environment it’ll feel like a waste.
• You won’t need the flexibility until you do.
• (We got burned between CentOS 5 and 6)

• Use classes
• Parameters are your friend
• You can logically segment functionality
• Set default values for unset variables
• Top level manifest directories should be for bootstrapping variables, defaults should make everything else safe.

• Use puppet-module
• It’s good to have conventions
• New developers will be familiar with what you build
• Gives you a fighting chance of reusing modules from the forge
• You can always be a contributor

• If you need to use file, try to use templates. Plenty of times this was handy: less refactoring later.
• If you touch a file with puppet, put a header on it. Other ops will appreciate it.

6 testing

• We started with the lather, rinse, repeat method. Write a module, tweak, run, repeat.
• Nuked /etc/resolv.conf, couldn’t find the puppet master to fix it. ssh loop :(
• Split into staging and prod, now we nuked staging instead
• Since we didn’t always develop on the same hardware / OS we couldn’t even just run locally

• Split out environments to staging and production
• Started testing locally with VirtualBox and Vagrant
• We have a CentOS 6.2 Vagrant box
• We have a set of Vagrantfiles that spin up a fresh copy and apply the puppet modules
• This prevents silly mistakes and serious ones from stopping everyone

7 post-commit

• Changes in SCM with no idea why they were made
• People would push puppet syntax errors (way too often)
• “That’s a simple change, one second”
• No enforced style guides
• We wrote puppet doc but never really used it
• “The change is in SCM, why don’t we see it in prod?”

• Pre-commit hook for JIRA issue
• Post-commit puppet syntax check is run against all modules
• Post-commit puppet-lint is run against all modules
• Post-commit puppet-doc is generated
• Post-commit changes are pushed to staging automatically
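
The first three hooks above can be sketched in Python as follows. The issue-key pattern, the function names, and the directory layout are assumptions, as is having `puppet parser validate` and `puppet-lint` on the PATH. The puppet-doc generation and the push to staging are left out.

```python
import os
import re
import subprocess

# Assumed JIRA key shape: an uppercase project key, a dash, and a number.
JIRA_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_jira_issue(message):
    """Pre-commit check: the commit message must mention an issue key."""
    return JIRA_KEY.search(message) is not None


def find_manifests(root):
    """Collect every .pp file under the modules directory."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pp"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def check_manifests(root, runner=subprocess.call):
    """Post-commit check: run the syntax check and the linter on each manifest.

    Returns a list of (command, manifest) pairs that exited non-zero.
    The runner argument exists so the checks can be exercised without
    puppet installed.
    """
    failures = []
    for manifest in find_manifests(root):
        for command in (["puppet", "parser", "validate"], ["puppet-lint"]):
            if runner(command + [manifest]) != 0:
                failures.append((" ".join(command), manifest))
    return failures
```

A hook script would reject the commit when has_jira_issue() returns False, and report or block when check_manifests() returns a non-empty list.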

8 deployment

• Started with manually replicating changes from staging to production
• Moved to mavenized builds of tarballs (i know, i know)
• Production is on a different network than SCM
• Tried rsync

• Finally moved to capistrano with multi-environment support (capistrano-ext)
• Using :stages to push different branches to different environments
• Deploy over scp from scm
• A release is pushed to a date stamped directory
• The ‘current’ symlink is updated to point to that release
• Rollback just points ‘current’ to the previous date stamped directory
• Driven through Jenkins
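
The release layout described above (date stamped directories plus a ‘current’ symlink) can be sketched without Capistrano. The Python below is an illustration of the idea, not Capistrano's implementation; the function names, the releases/ subdirectory, and the timestamp format are assumptions.

```python
import os
import shutil
import time


def _point_current(deploy_root, release_dir):
    """Repoint 'current' by renaming a temporary symlink over it."""
    current = os.path.join(deploy_root, "current")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(release_dir, tmp)
    # On POSIX the rename replaces the old link in one step, so readers
    # never see a missing 'current'.
    os.replace(tmp, current)


def deploy(source_dir, deploy_root, stamp=None):
    """Copy a checkout into a date stamped directory and make it current."""
    stamp = stamp or time.strftime("%Y%m%d%H%M%S")
    release_dir = os.path.join(deploy_root, "releases", stamp)
    shutil.copytree(source_dir, release_dir)
    _point_current(deploy_root, release_dir)
    return release_dir


def rollback(deploy_root):
    """Point 'current' at the release just before the active one."""
    releases_dir = os.path.join(deploy_root, "releases")
    # Timestamp names sort chronologically.
    releases = sorted(os.listdir(releases_dir))
    active = os.path.basename(
        os.path.realpath(os.path.join(deploy_root, "current")))
    index = releases.index(active)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = os.path.join(releases_dir, releases[index - 1])
    _point_current(deploy_root, previous)
    return previous
```

Because a rollback only moves the symlink, the newer release stays on disk and can be made current again by repointing the link.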

9 promotion

• Started with a production environment
• Changes and new code went there
• (i think we know where this is going)
• FINALLY we had trunk going from dev to staging to production

• Trunk: gets pushed to production
• Staging (branch): gets pushed to staging
• Dev (branch): gets pushed to dev
• Changes to the source tree get pushed post-commit
• Latest code is always out there, no skipped deployments, no guessing what’s there by revision history, no more broken dreams.

10 help

• Puppet Users - Google Groups
• Pro Puppet by Turnbull & McCune
• http://www.planetpuppet.org
• http://www.dzone.com/mz/devops
• #puppet on irc.freenode.net
• http://dev2ops.org

seriously, when you don’t know how or if you can do it: github. github

n+1 questions
Matthew Finlayson
