Puppet at GitHub / ChatOps

Jesse Newland (jnewland)
Hey errbody, my name is Jesse Newland. I do ops at GitHub.

Puppet at GitHub
And today I’m going to be talking about Puppet at GitHub. Really, I’m telling a story in two parts.

All of the amazing Puppet OSS projects @rodjek has written but doesn’t want to talk about
First, I’ll be talking about all of the amazing Puppet open source projects Tim Sharpe has written but doesn’t want to talk about, and how we use them at GitHub.

And then I want to introduce you to the star of the GitHub Ops team, Hubot, and tell you a little bit about something we’ve been calling ChatOps.

The Setup
But before I get into all of that, I'm actually going to talk about an upcoming talk, one by a coworker of mine at GitHub. Will Farrington is going to be speaking tomorrow at 2:45pm about The Setup, our Puppet-powered GitHubber laptop management solution. It's amazing. It's one of the coolest uses of Puppet I've ever seen, and it's going to completely change the way you think about your development environment. But I’m not going to be talking about any of that today. So, yeah, go to Will's talk tomorrow. You won't be disappointed.

Puppet at GitHub
So I guess you could say that I’m talking about...

THE REST of Puppet at GitHub
...the rest of Puppet at GitHub. For the scope of this talk, I’m going to be talking about the Puppet infrastructure that runs github.com.

4 years, >100k LOC
We’ve been managing GitHub’s infrastructure with Puppet for 4 years, since the move to Rackspace. There’s a ton of code, and we’re developing at a rapid pace.

Simple
But we are obsessed with keeping our Puppet deployment simple.

Single Master
We use a single puppetmaster running lots of unicorns. Nothing fancy. It works for now. However, we will need to scale this tier up or out in about 6 months if the trends look right. We’ll probably switch to two load-balanced puppetmasters around that time.

    # cat /etc/cron.d/puppet
    13 * * * * root /usr/bin/
cron FTW
We don’t run the agent, but rather run Puppet from cron every hour, in combination with runs triggered via Hubot (more on that later).

No ENC
We don’t use an external node classifier.

    $ cat manifests/nodes/janky.rscloud.pp
    node /^janky\d+\.rscloud\.github\.com$/ {
      github::role::janky { 'janky':
        public_address => dns_lookup($fqdn),
        nginx_hostname => $fqdn,
      }
    }

    ([a-z0-9\-_]+)(\d+)([a-z]?)\.(.*)\.github.com

Instead, we give nodes DNS names that adhere to a naming convention that maps them to a pre-defined role.
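
To make the naming convention concrete, here is a minimal Ruby sketch of how a hostname could be split into its parts. It is an illustration only, not how GitHub's manifests do it: the helper names are made up, and the pattern is an anchored variant of the one on the slide, with a lazy first group so that a multi-digit index stays out of the role name.

```ruby
# Hypothetical sketch: split a node's FQDN into role, index, suffix and site.
# The pattern is an assumption based on the slide, with anchors added and a
# lazy quantifier on the role so "janky12" yields role "janky", index 12.
HOSTNAME_PATTERN = /\A(?<role>[a-z0-9\-_]+?)(?<index>\d+)(?<suffix>[a-z]?)\.(?<site>.+)\.github\.com\z/

# Returns a hash describing the node, or nil when the name does not follow
# the convention.
def parse_node_name(fqdn)
  match = HOSTNAME_PATTERN.match(fqdn)
  return nil unless match

  {
    role:   match[:role],
    index:  match[:index].to_i,
    suffix: match[:suffix],
    site:   match[:site]
  }
end

# Name of the role define a node would map to under this convention.
def role_define_for(fqdn)
  node = parse_node_name(fqdn)
  node && "github::role::#{node[:role]}"
end
```

With this sketch, `role_define_for('fs1a.stg.github.com')` gives the string `github::role::fs`, and a name that does not follow the convention gives `nil`.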

    $ head modules/github/manifests/role/janky.pp
    define github::role::janky($public_address, $nginx_hostname='', $god=true ) {
      github::core { 'janky': }
      include github::app::janky
      github::nginx { 'janky': }
    }
Where the magic happens
Role definitions are where the magic happens. We try to DRY common functionality into our core module and into other simple classes or defines, so that role definitions read like a nice summary of what makes this role different from others.

    augeas { 'my.cnf/avoid_cardinality_skew':
      context => '/files/etc/mysql/my.cnf/mysqld/',
      changes => [
        'set innodb_stats_auto_update 0',
        'set innodb_stats_on_metadata 0',
        'set innodb_stats_on_metadata 64'
      ],
      require => Percona::Server[$::fqdn],
    }
Heavy use of augeas
We generally try to avoid templates for configuration files in favor of using Augeas. It lets us manage the small pieces of configuration we care about and use the OS defaults for the things we don't.

BORING
But I don’t want to just show all of you Puppet code for thirty minutes. That's boring.

What’s interesting about Puppet at GitHub?
I’d rather talk about what's interesting about how we use Puppet at GitHub. And what I think is the most interesting is that we focus heavily on ensuring the Puppet development workflow is easily accessible to everyone at GitHub.

Making Puppet Less Scary
We’re doing our best to make Puppet less scary for people that aren’t familiar with it, so they can help the Ops team grow and evolve our infrastructure. We’re doing some things right here, but there’s still a lot of work to do.

I’ve been thinking about this a lot recently, as we’ve just had two large infrastructure projects shipped by people that were completely or relatively new to Puppet. First, Derek Greentree shipped a Cassandra cluster...

And Adam Roben shipped Puppet manifests for our Windows build and CI servers.

This is good
This is an awesome trend, and I want it to continue. So I thought I’d talk a bit today about what we’re doing to try to enable even more of this.

Flow: just like a (GitHub) Ruby project
For us, an important part of making Puppet development accessible for other developers at GitHub is making the development flow on our Puppet codebase as similar as possible to that of any other GitHub Ruby project. That means sticking with some common conventions.

    $ ./script/bootstrap
Setup
Like making it as easy to set up as any other project at GitHub.

    $ cat Gemfile
    source :rubygems
    gem 'puppet', '2.7.18'
    gem 'facter', '1.6.10'
    gem 'rspec-puppet', '0.1.2'
    gem 'rake', '0.8.7'
    gem 'puppet-lint', '0.2.1'
    gem 'ruby-augeas', '0.3.0'
    gem 'json', '1.5.1'
    gem 'fog', '1.3.1'
    gem 'librarian-puppet', '0.9.4'
    gem 'parallel_tests'
So Ruby deps are managed by Bundler.

    $ cat Puppetfile
    forge "http://forge.puppetlabs.com"
    mod 'puppetlabs/apt'
    ...
And Puppet deps are managed by librarian-puppet, a Bundler-like library that manages the Puppet modules your infrastructure depends on and installs them directly from GitHub repositories. I’m of the opinion that the unit of open source currency is no longer a tarball downloaded from something named *forge. It’s a GitHub repo. All of the developers at GitHub feel the same way, so Tim wrote librarian-puppet.

rodjek / librarian-puppet
For those of you keeping score at home, that’s the first of Tim Sharpe’s open source projects that I’ve mentioned. Hi Tim!

Making Puppet flow like other projects at GitHub means ensuring we have good editor support for the language.

rodjek / vim-puppet
vim-puppet, that’s two.

    $ ./script/cibuild
Tests
It means running tests is a simple one-step process.

TESTS!
Tests are super important. A solid and easy-to-use test harness helps build developer confidence in a new language.

Safety net
And tests are a crucial safety net for helping people cut their teeth on Puppet if they haven’t ever touched it before.

    should contain_github__firewall_rule('internal_network')
    should contain_ssmtp__relay_to('smtp').with_relay_host('smtp')
    should contain_file('/etc/logstash/logstash.conf')
    should include_class('github::ksplice')
    should contain_networking__bond('bond0').with(
      :gateway       => '172.22.0.2',
      :arp_ip_target => '172.22.0.2',
      :up_commands   => nil
    )
rspec-puppet
We use rspec-puppet heavily. If you haven’t used rspec-puppet yet, go check it out right now. It’s amazing. There are no fewer than three talks about it at PuppetConf, so I’m not going to talk about HOW to use it today, just touch a little bit on how WE use it.

rodjek / rspec-puppet
rspec-puppet, that’s three.

    describe 'github::role::fe' do
      let(:title) { 'fe' }
      let(:node) { 'fe1.rs.github.com' }
      let(:params) { {
        :public_address  => '207.97.227.242/27',
        :private_address => '172.22.1.59/22',
        :git_weight      => '16'
      } }
      let(:facts) { {
        :ipaddress       => '172.22.1.59',
        :operatingsystem => 'Debian',
        :datacenter      => 'rackspace-iad2',
      } }
      it do
        should contain_github__core('fe')
        ...
      end
    end
role specs are king
We try our best to adequately test our individual Puppet modules, but our central and most frequently touched specs exercise our role system. There’s one spec for each role, which describes its intended functionality. These specs focus on the critical functionality of each role, and help a great deal to build confidence that we’re not introducing regressions when adding or refactoring functionality or working in other roles.

    $ git commit -am "lolbadchange"
    modules/github/manifests/role/fe.pp:err: Could not parse for environment production: Syntax error at 'allow_outbound_syslog'; expected '}' at /Users/jnewland/github/puppet/modules/github/manifests/role/fe.pp:31
    modules/github/manifests/role/fe.pp - WARNING: => is not properly aligned on line 626
.git/hooks/pre-commit
For an even faster feedback loop than running specs, all Puppet dev environments automatically get set up with a pre-commit hook that checks for syntax errors and ensures your changes conform to the Puppet style guide. This has proved amazingly useful for Puppet novices and experts alike: novices find it helps them understand language conventions quickly and guides them towards solutions, and experts use it to catch typos and help them not look like novices.
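
A hook along these lines could be sketched in Ruby as follows. This is a guess at the shape, not the actual hook: the function names are hypothetical, `puppet parser validate` and `puppet-lint` are the real command-line tools, and which flags GitHub's hook passes (and what it treats as a failure) is not stated in the talk.

```ruby
require 'open3'

# Hypothetical pre-commit sketch: keep only the Puppet manifests from a list
# of staged paths.
def puppet_manifests(paths)
  paths.select { |path| File.extname(path) == '.pp' }
end

# Commands to run against one manifest: a syntax check, then a style check.
# Both CLIs exist; the exact flags used by the real hook are an assumption.
def checks_for(path)
  [
    ['puppet', 'parser', 'validate', path],
    ['puppet-lint', path]
  ]
end

# Runs every check and returns the output of the ones that exited non-zero.
# The runner is injectable so the logic can be exercised without Puppet.
def failed_checks(paths, runner = Open3.method(:capture2e))
  commands = puppet_manifests(paths).flat_map { |path| checks_for(path) }
  commands.each_with_object([]) do |command, failures|
    output, status = runner.call(*command)
    failures << output unless status.success?
  end
end
```

In a real `.git/hooks/pre-commit`, the staged paths would come from something like `git diff --cached --name-only`, and the script would print the failures and exit non-zero to block the commit when `failed_checks` returns anything.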

rodjek / puppet-lint
puppet-lint, that’s four, btw.

specs run on each push, auto deploy on CI pass
rspec-puppet and puppet-lint are automatically run by CI on every commit on every branch pushed to our Puppet repo. Once master passes CI, Puppet is automatically deployed.

As you can see, Hubot automates a lot of the process of rolling out Puppet. That example covered pushing changes to master, but what about a pull-request-based workflow?

Say we have a pull request for a branch we
want to merge, and that we’ve reviewed the code and it all looks good.

branches == environments
On each deploy, we turn all git branches into Puppet environments.
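
As a rough illustration of the idea, the bookkeeping on each deploy could look like the Ruby sketch below. Everything here is an assumption made for the example: the function names, the rule that non-alphanumeric characters become underscores, and the mapping of master to the production environment are not taken from GitHub's tooling.

```ruby
# Hypothetical sketch of "branches == environments": given the branches in
# the repo and the environments already present on the master, work out
# which environments to create and which to clean up.

# Assumption: environment names are limited to letters, digits and
# underscores, so other characters in a branch name are replaced.
def environment_name(branch)
  branch.gsub(/[^A-Za-z0-9_]/, '_')
end

# Assumption: the master branch maps to the "production" environment.
def plan_environments(branches, existing_environments)
  wanted = branches.map { |b| b == 'master' ? 'production' : environment_name(b) }.uniq
  {
    create: wanted - existing_environments,
    remove: existing_environments - wanted
  }
end
```

For example, with branches `master` and `git-gh13` and existing environments `production` and `old_branch`, the plan would create `git_gh13` and remove `old_branch`.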

This combined with heaven, our capistrano-powered deployment API we interact
with via Hubot, enables us to experiment with unmerged Puppet branches in a powerful way

So, to safely merge this pull request...

    hubot ci status puppet/git-gh13
    deploy:apply puppet/git-gh13 staging/fs1
    deploy:noop puppet/git-gh13 prod/fs1
    # merge pull request
    hubot deploy:apply puppet to prod/fs
    graph me -1h @collectd.load(fs*)
    log me hooks github/github
You might ask Hubot to confirm its build status.

    Build #108816 (5fe75932f26ea62cb5fc5e3d0cb302cc2461d11e) of puppet/git-gh13 was successful (421s) github/ [email protected]
Yup, looks good.

Then roll the branch out to a staging box to make sure everything applies cleanly there.

    ** [out :: REDACTED ] Bootstrapping...
    ** [out :: REDACTED ] Gem environment up-to-date.
    ** [out :: REDACTED ] Running librarian-puppet...
    ** [out :: REDACTED ] Generating puppet environments...
    ** [out :: REDACTED ] Cleaning up deleted branches...
    ** [out :: REDACTED ] Done!
    ** [out :: REDACTED ] Sending 'restart' command
    ** [out :: REDACTED ] The following watches were affected:
    ** [out :: REDACTED ] puppetmaster_unicorn
    ** [out :: fs1a.stg.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs1a.stg.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ...
Yup, looks good.

Then, if you wanted an extra layer of confidence, you could noop the branch against a production node.

    ** [out :: REDACTED ] Gem environment up-to-date.
    ** [out :: REDACTED ] Running librarian-puppet...
    ** [out :: REDACTED ] Generating puppet environments...
    ** [out :: REDACTED ] Cleaning up deleted branches...
    ** [out :: REDACTED ] Done!
    ** [out :: REDACTED ] Sending 'restart' command
    ** [out :: REDACTED ] The following watches were affected:
    ** [out :: REDACTED ] puppetmaster_unicorn
    ** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: would have changed from '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ...
Yup, looks good.

Next, you’d merge the pull request. If you stopped here, the code would gradually roll out to all affected nodes over the next hour.

If you wanted the rollout to happen faster than that, you could force a Puppet run on the affected class of nodes.

    ** [out :: REDACTED ] Gem environment up-to-date.
    ** [out :: REDACTED ] Running librarian-puppet...
    ** [out :: REDACTED ] Generating puppet environments...
    ** [out :: REDACTED ] Cleaning up deleted branches...
    ** [out :: REDACTED ] Done!
    ** [out :: REDACTED ] Sending 'restart' command
    ** [out :: REDACTED ] The following watches were affected:
    ** [out :: REDACTED ] puppetmaster_unicorn
    ** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs7b.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ** [out :: fs7b.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ...
Yup, that looks good.

Then you’d probably want to check out load to make sure nothing went crazy.

Yup, looks good.

...and maybe check some logs or other related metrics to confirm your change didn’t break something.

Yup, looks good.

ChatOps
How we interact with Puppet via Hubot is a great example of a core principle of how we do ops at GitHub. We’ve been calling it ChatOps recently.

Essentially, ChatOps is the result of Hubot becoming sentient, and decreeing, among other things, that we now address him as “Supreme Leader” and communicate with our infrastructure through his secure channels alone. We occasionally observe him speaking in tongues that sound eerily like YouTube comments.

 Hubot Actually, that’s not it at all. Hubot is
the star of our Ops team.

Hubot: heaven, janky, shell, graphme
We use Hubot day in, day out to interact with other simple tools we’ve written, over JSON APIs.

ALL OF THE APIS
Hubot interacts nicely with tons of external APIs too. If you have a JSON API, making your service work with Hubot is a piece of cake.
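
To show why a JSON API makes this easy, here is a small Ruby sketch that turns a JSON response from a CI service into one line of chat. It is not a Hubot script and does not use Hubot's API; the function name and the JSON field names are invented for the example.

```ruby
require 'json'

# Hypothetical sketch: render a CI service's JSON response as a chat line.
# The field names (number, sha, branch, green, duration) are assumptions.
def build_status_message(json_body)
  build = JSON.parse(json_body)
  outcome = build['green'] ? 'was successful' : 'failed'
  "Build ##{build['number']} (#{build['sha']}) of #{build['branch']} #{outcome} (#{build['duration']}s)"
end
```

The whole integration is a parse and a string template, which is roughly all a chat command needs once the service behind it speaks JSON.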

Why is this stupid chat bot so important to Ops?
But why do we obsess about Hubot so much? It’s just a chat bot, right? There are some distinct upsides to this approach we’ve noticed as our use of Hubot in Ops has grown.

Remember the flow I just showed you for rolling out Puppet changes to our infrastructure?

Everyone sees all of that happen on their first day
Everyone sees all of this happen from the minute they join GitHub. It’s right there, in the Ops room, right in the middle of the conversation in Campfire.

You don’t just see how to roll out puppet, you
see how to...

    hubot ci status github/smoke-perf
check the status of a branch’s last build

    hubot deploy github/smoke-perf to prod/fe1
deploy any branch of any GitHub app to any server

    hubot graph me -10min @app-perf
get graphs of the app’s recent performance

    hubot procs unicorn
check the status of unicorns across all frontends

    hubot resque critical
check the status of the resque critical queue

    hubot graph me -10min @collectd.load(fe*)
check load on the frontends

    hubot conns fe1
check current connections to a frontend that you suspect has a problem

    hubot log me smoke fe1
grab smoke logs for that frontend and realize that you did, in fact, break it

    hubot lbctl disable fe1
take it out of the load balancer

    hubot status yellow Bad deploy. Reverting now.
update the status blog

    hubot who’s on call
determine who is currently on call so you can apologize to them

    hubot pingdom checks
check pingdom to make sure you haven’t broken everything

    hubot upset me
chill yourself out really quick

    hubot deploy github to prod/fe1
revert back to master on the busted frontend

    hubot log me smoke fe1
verify things have returned to normal

    hubot air drum me
get pumped up because you fixed it

    hubot lbctl enable fe1
bring the fixed frontend back into the rotation

    hubot status green All systems go.
clear alerts on the status page

    hubot whois 4.9.23.22
Once the outage has been resolved, you might see how to grab whois information for an IP that exhibited suspicious activity in the logs you saw

    hubot khanify spammers
and how to hit meme generator to make a joke when you realize that IP is a spammer

    hubot play in the air tonight
then someone would queue up the song that popped into their head when they thought about drums and gorillas at the same time

    hubot tweet@github PuppetConf Drinkup Friday night at 8:30 at Zeke’s (3rd & Brannan)
and then finish it all off with a tweet about the Drinkup we’re throwing Friday night

ChatOps
ChatOps means building tools that make it easier to operate your infrastructure via Hubot than via Terminal or Chrome.

By placing tools directly in the middle of the conversation
Because...

Everyone is pairing all of the time
This is the core concept behind ChatOps.

Teaching by doing Teaching by doing is awesome

“This was always my main motivation with hubot - teaching by doing by making things visible. It's an extremely powerful teaching technique” - @rtomayko
Ryan Tomayko had this in mind from the very first commits to Hubot, which just presented a simple wrapper around a repository of shell scripts we use for managing and monitoring our infrastructure.

This is how I respond to “how do I do X” questions in Campfire now. If there’s not yet Hubot functionality to do a thing, we try to write it.

Communicate by doing
Placing tools in the middle of the conversation also means you get communication of your work for free. If you’re doing something in a shell or on a website, you have to do it, then tell people about it. If you do it with Hubot, that comes free.

THINGS I HAVEN’T ASKED RECENTLY
For example, here are a few things I haven’t asked recently because Hubot has told me the answer:

is that branch green?
how’s that deploy going?
are you deploying that or should i?
is anyone responding to that nagios alert?
how does load look?
did that deploy finish?
did anyone update the status page?

Free communication is especially crucial in a distributed environment.

Our Ops team is entirely remote, so Campfire is our default means of communication.

http://www.flickr.com/photos/7997249@N06/6061305639/
This is extremely helpful during outages or other situations that require tactical response. You don’t have to SAY that you’re spraying water on the fire, people SEE you doing it.

Hide the ugly
Another awesome benefit of ChatOps-ing all of the things is that you can hide ugly interfaces and design exactly the interaction you want with some simple porcelain commands.

My favorite example of this is the ugliest of the ugly, Nagios.

    [nines] hubot opened issue #4263: Nagios (229906) - fs3b/syslog - Tue Sept 25 23:40:18 PDT 2012. github/nines#4263
Hubot politely delivers Nagios alerts directly into chat.

    hubot nagios ack fs3b/syslog
    # fix stuff
    nagios check fs3b/syslog
    nagios status fs3b/syslog
    hubot nagios downtime fs3b/syslog 90
    nagios mute fs3b/syslog
    nagios unmute fs3b/syslog
Which we can interact with without any unnecessary eye bleeding. Making this easy means developers and other ops engineers actually mute or schedule downtime when they’re testing things.
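
The porcelain idea can be sketched as a thin parser that turns a chat line into a structured request, leaving the ugly interface behind it. The Ruby below is only an illustration: the pattern, the function name, and the reading of the trailing number as minutes are assumptions, and it does not talk to Nagios at all.

```ruby
# Hypothetical sketch: parse chat lines such as
# "nagios downtime fs3b/syslog 90" into a structured request.
NAGIOS_COMMAND = %r{\A(?:hubot\s+)?nagios\s+(?<action>ack|check|status|downtime|mute|unmute)\s+(?<host>[\w\-]+)/(?<service>[\w\-]+)(?:\s+(?<minutes>\d+))?\z}

# Returns a hash for a recognised command, or nil for anything else.
def parse_nagios_command(line)
  match = NAGIOS_COMMAND.match(line.strip)
  return nil unless match

  request = {
    action:  match[:action].to_sym,
    host:    match[:host],
    service: match[:service]
  }
  # Assumption: the optional trailing number is a duration in minutes.
  request[:minutes] = match[:minutes].to_i if match[:minutes]
  request
end
```

A backend can then map each `action` to whatever the monitoring system needs, so the people typing in chat never have to see it.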

Mobile FTW
Yet another awesome benefit of ChatOps is that you get mobile support for free.

Well, that is, if you have a team of awesome iOS developers that have built an actually functioning Campfire client for the iPhone. This lets you do anything Hubot can do from your phone. Which means from your couch. Or your bed. Or a beach in Hawaii. Which means you can fix a lot of things without pulling your laptop out of your bag.

ChatOps
That’s ChatOps at its finest.

And now for something completely different
While I’m showing off mobile stuff, I thought I’d slip in a demo of something else we’ve done to make Ops more mobile friendly.

We’ve hacked together support for PagerDuty alerts via Apple Push Notifications. When you swipe on the alert, you go directly to the PagerDuty mobile UI for an incident.

Which lets you ack an alert

while you’re still in bed

or on the couch.

Boom
I can’t even begin to tell you how happy this makes me, and how much less shitty it makes being on-call.

So, who better to summarize all of this than Hubot himself? I asked him what he thought about ChatOps. Here’s what he said:

ChatOps all the things.
Listen to what Hubot said. You’ll love it. Your ops team will love it. And you’ll help other developers learn how to interact with ops tools without any additional work. That’s awesome.

Work at GitHub [email protected] If you can’t ChatOps all the
things at your gig now, you could always just come work with me at GitHub. Shoot me an email if you’re interested.

Thanks!
That’s all I have. Thanks for listening! Any questions?

Tomorrow @ 8:30 PM, Zeke’s, 3rd & Brannan
While I still have everyone’s attention, I wanted to mention the GitHub Drinkup we’re throwing for PuppetConf again. It’s tomorrow night at 8:30pm at Zeke’s, which is on the corner of 3rd and Brannan. Everyone’s invited. I’ll see you there. Thanks again!
