Puppet at GitHub / ChatOps

Jesse Newland
September 27, 2012

How we use Puppet at GitHub, and how Hubot runs our infrastructure for us.

Transcript

  1. 1.
  2. 3.

    Puppet at GitHub

    And today I’m going to be talking about Puppet at GitHub. Really, I’m telling a story in two parts.
  3. 4.

    All of the amazing Puppet OSS projects @rodjek has written but doesn’t want to talk about

    First, I’ll be talking about all of the amazing Puppet open source projects Tim Sharpe has written but doesn’t want to talk about, and how we use them at GitHub.
  4. 5.

    And then, I want to introduce you to the star of the GitHub Ops team, Hubot, and tell you a little bit about something we’ve been calling ChatOps.
  5. 6.

    The Setup

    But, before I get into all of that, I'm actually going to talk about an upcoming talk, one by a coworker of mine at GitHub. Will Farrington is going to be speaking tomorrow at 2:45pm about The Setup, our Puppet-powered GitHubber laptop management solution. It's amazing. It's one of the coolest uses of Puppet I've ever seen, and it's going to completely change the way you think about your development environment. But I’m not going to be talking about any of that today. So, yeah, go to Will's talk tomorrow. You won't be disappointed.
  6. 8.

    THE REST of Puppet at GitHub

    The rest of Puppet at GitHub. For the scope of this talk, I’m going to be talking about the Puppet infrastructure that runs github.com.
  7. 9.

    4 years, >100k LOC

    We’ve been managing GitHub’s infrastructure with Puppet for 4 years, since the move to Rackspace. There’s a ton of code, and we’re developing at a rapid pace.
  8. 11.

    Single Master

    We use a single puppetmaster running lots of unicorns. Nothing fancy. It works for now. However, we will need to scale this tier up or out in about 6 months if the trends look right. We’ll probably switch to two load-balanced puppetmasters around that time.
  9. 12.

    cron FTW

    # cat /etc/cron.d/puppet
    13 * * * * root /usr/bin/

    We don’t run the agent, but rather run Puppet from cron every hour, in combination with runs triggered via Hubot (more on that later).
  10. 14.

    $ cat manifests/nodes/janky.rscloud.pp
    node /^janky\d+\.rscloud\.github\.com$/ {
      github::role::janky { 'janky':
        public_address => dns_lookup($fqdn),
        nginx_hostname => $fqdn,
      }
    }

    ([a-z0-9\-_]+)(\d+)([a-z]?)\.(.*)\.github.com

    Instead, we give nodes DNS names that adhere to a naming convention that maps them to a pre-defined role.
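
A minimal sketch of how a name that follows this convention could be taken apart, written in Ruby to match the rest of the toolchain. The capture-group meanings (role, index, suffix, site) are assumptions read off the pattern on the slide, not something the talk spells out, and `parse_node_name` is a hypothetical helper. The pattern here is anchored and its first group is lazy so that a multi-digit index such as the one in `janky12` stays in one piece; the pattern on the slide is unanchored and greedy.

```ruby
# Adapted from the slide's pattern: anchored, dot in "github.com" escaped,
# and the first group made lazy (all three changes are assumptions).
NODE_NAME = /\A([a-z0-9\-_]+?)(\d+)([a-z]?)\.(.*)\.github\.com\z/

# Hypothetical helper: returns the parts encoded in a node's FQDN,
# or nil when the name does not follow the convention.
def parse_node_name(fqdn)
  match = NODE_NAME.match(fqdn)
  return nil unless match

  {
    :role   => match[1],       # e.g. "fs" or "janky"
    :index  => match[2].to_i,  # numeric part of the host name
    :suffix => match[3],       # optional trailing letter, "" if absent
    :site   => match[4]        # everything between host and github.com
  }
end
```

Calling `parse_node_name('fs1a.rs.github.com')` yields role `fs`, index `1`, suffix `a`, and site `rs`; a name outside the convention yields `nil`.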
  11. 15.

    Where the magic happens

    $ head modules/github/manifests/role/janky.pp
    define github::role::janky($public_address, $nginx_hostname='', $god=true ) {
      github::core { 'janky': }
      include github::app::janky
      github::nginx { 'janky': }
    }

    Role definitions are where the magic happens. We try to DRY common functionality into our core module and into other simple classes or defines, so that role definitions read like a nice summary of what makes this role different from others.
  12. 16.

    Heavy use of augeas

    augeas { 'my.cnf/avoid_cardinality_skew':
      context => '/files/etc/mysql/my.cnf/mysqld/',
      changes => [
        'set innodb_stats_auto_update 0',
        'set innodb_stats_on_metadata 0',
        'set innodb_stats_on_metadata 64'
      ],
      require => Percona::Server[$::fqdn],
    }

    We generally try to avoid templates for configuration files in favor of using Augeas. It lets us manage the small pieces of configuration we care about and use the OS defaults for the things we don't.
  13. 17.

    BORING

    But I don’t want to just show all of you Puppet code for thirty minutes. That's boring.
  14. 18.

    What’s interesting about Puppet at GitHub?

    I’d rather talk about what's interesting about how we use Puppet at GitHub. And what I think is the most interesting is that we focus heavily on ensuring the Puppet development workflow is easily accessible to everyone at GitHub.
  15. 19.

    Making Puppet Less Scary

    We’re doing our best to make Puppet less scary for people who aren’t familiar with it, so they can help the Ops team grow and evolve our infrastructure. We’re doing some things right here, but there’s still a lot of work to do.
  16. 20.

    I’ve been thinking about this a lot recently, as we’ve just had two large infrastructure projects shipped by people who were completely or relatively new to Puppet. First, Derek Greentree shipped a Cassandra cluster...
  17. 22.

    this is good

    This is an awesome trend, and I want it to continue. So I thought I’d talk a bit today about what we’re doing to try to enable even more of this.
  18. 23.

    Flow just like a (GitHub) Ruby project

    For us, an important part of making Puppet development accessible for other developers at GitHub is making the development flow on our Puppet codebase as similar as possible to that of any other GitHub Ruby project. That means sticking with some common conventions.
  19. 25.

    $ cat Gemfile
    source :rubygems
    gem 'puppet', '2.7.18'
    gem 'facter', '1.6.10'
    gem 'rspec-puppet', '0.1.2'
    gem 'rake', '0.8.7'
    gem 'puppet-lint', '0.2.1'
    gem 'ruby-augeas', '0.3.0'
    gem 'json', '1.5.1'
    gem 'fog', '1.3.1'
    gem 'librarian-puppet', '0.9.4'
    gem 'parallel_tests'

    So Ruby deps are managed by Bundler.
  20. 26.

    $ cat Puppetfile
    forge "http://forge.puppetlabs.com"
    mod 'puppetlabs/apt'
    ...

    And Puppet deps are managed by librarian-puppet, a Bundler-like library that manages the Puppet modules your infrastructure depends on and installs them directly from GitHub repositories. I’m of the opinion that the unit of open source currency is no longer a tarball downloaded from something named *forge. It’s a GitHub repo. All of the developers at GitHub feel the same way, so Tim wrote librarian-puppet.
  21. 27.

    rodjek / librarian-puppet

    For those of you keeping score at home, that’s the first of Tim Sharpe’s open source projects that I’ve mentioned. Hi Tim!
  22. 28.

    Making puppet flow like other projects at GitHub means ensuring

    we have good editor support for the language
  23. 31.

    TESTS!

    Tests are super important. A solid and easy-to-use test harness helps build developer confidence in a new language.
  24. 32.

    Safety net

    And tests are a crucial safety net for helping people cut their teeth on Puppet if they haven’t ever touched it before.
  25. 33.

    rspec-puppet

    should contain_github__firewall_rule('internal_network')
    should contain_ssmtp__relay_to('smtp').with_relay_host('smtp')
    should contain_file('/etc/logstash/logstash.conf')
    should include_class('github::ksplice')
    should contain_networking__bond('bond0').with(
      :gateway       => '172.22.0.2',
      :arp_ip_target => '172.22.0.2',
      :up_commands   => nil
    )

    We use rspec-puppet heavily. If you haven’t used rspec-puppet yet, go check it out right now. It’s amazing. There are no fewer than three talks about it at PuppetConf, so I’m not going to talk about HOW to use it today, just touch a little bit on how WE use it.
  26. 35.

    role specs are king

    describe 'github::role::fe' do
      let(:title) { 'fe' }
      let(:node) { 'fe1.rs.github.com' }
      let(:params) { {
        :public_address  => '207.97.227.242/27',
        :private_address => '172.22.1.59/22',
        :git_weight      => '16'
      } }
      let(:facts) { {
        :ipaddress       => '172.22.1.59',
        :operatingsystem => 'Debian',
        :datacenter      => 'rackspace-iad2',
      } }
      it do
        should contain_github__core('fe')
        ...
      end
    end

    We try our best to adequately test our individual Puppet modules, but our central and most frequently touched specs exercise our role system. There’s one spec for each role, which describes its intended functionality. These specs focus on the critical functionality of each role, and help a great deal to build confidence that we’re not introducing regressions when adding or refactoring functionality or working in other roles.
  27. 36.

    .git/hooks/pre-commit

    $ git commit -am "lolbadchange"
    modules/github/manifests/role/fe.pp:err: Could not parse for environment production: Syntax error at 'allow_outbound_syslog'; expected '}' at /Users/jnewland/github/puppet/modules/github/manifests/role/fe.pp:31
    modules/github/manifests/role/fe.pp - WARNING: => is not properly aligned on line 626

    For an even faster feedback loop than running specs, all Puppet dev environments automatically get set up with a pre-commit hook that checks for syntax errors and ensures your changes conform to the Puppet style guide. This has proved amazingly useful for Puppet novices and experts alike: novices find it helps them understand language conventions quickly and guides them towards solutions, and experts use it to catch typos and help them not look like novices.
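
The hook itself is not shown in the talk. What follows is a rough sketch of what such a check could look like, assuming the `puppet parser validate` and `puppet-lint` commands are on the PATH. All function names are hypothetical, and the command runner is injectable so the logic can be exercised without either tool installed.

```ruby
# Keep only Puppet manifests from a list of changed paths.
def puppet_manifests(paths)
  paths.select { |path| path.end_with?('.pp') }
end

# Run every check against every manifest and return the commands that failed.
# `runner` takes an argv array and returns true on success; the default
# shells out with Kernel#system (nil or false counts as a failure).
def failed_checks(manifests, runner = ->(argv) { system(*argv) })
  checks = [
    %w[puppet parser validate], # syntax check
    %w[puppet-lint]             # style guide check
  ]
  manifests.flat_map do |manifest|
    checks.map { |check| check + [manifest] }
          .reject { |argv| runner.call(argv) }
  end
end

# Returns the exit status a pre-commit hook should use: 0 lets the commit
# through, 1 blocks it. Failed commands are reported on stderr.
def pre_commit_status(staged_paths, runner = ->(argv) { system(*argv) })
  failures = failed_checks(puppet_manifests(staged_paths), runner)
  failures.each { |argv| warn "check failed: #{argv.join(' ')}" }
  failures.empty? ? 0 : 1
end
```

Wired into `.git/hooks/pre-commit`, a script like this would collect the staged paths with `git diff --cached --name-only --diff-filter=ACM` and exit with `pre_commit_status(paths)`. Whether lint warnings should block a commit or only be printed is a policy choice this sketch does not settle.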
  28. 38.

    specs run on each push
    auto deploy on CI pass

    rspec-puppet and puppet-lint are automatically run by CI on every commit on every branch pushed to our Puppet repo. Once master passes CI, Puppet is automatically deployed.
  29. 39.

    As you can see, Hubot automates a lot of the process of rolling out Puppet. That example covered pushing changes to master, but what about a Pull-Request based workflow?
  30. 40.

    Say we have a pull request for a branch we

    want to merge, and that we’ve reviewed the code and it all looks good.
  31. 41.
  32. 42.

    This, combined with heaven, our Capistrano-powered deployment API that we interact with via Hubot, enables us to experiment with unmerged Puppet branches in a powerful way.
  33. 44.

    hubot ci status puppet/git-gh13
    deploy:apply puppet/git-gh13 staging/fs1
    deploy:noop puppet/git-gh13 prod/fs1
    # merge pull request
    hubot deploy:apply puppet to prod/fs
    graph me -1h @collectd.load(fs*)
    log me hooks github/github

    You might ask Hubot to confirm its build status.
  34. 46.

    Then roll the branch out to a staging box to make sure everything applies cleanly there.
  35. 47.

    ** [out :: REDACTED ] Bootstrapping...
    ** [out :: REDACTED ] Gem environment up-to-date.
    ** [out :: REDACTED ] Running librarian-puppet...
    ** [out :: REDACTED ] Generating puppet environments...
    ** [out :: REDACTED ] Cleaning up deleted branches...
    ** [out :: REDACTED ] Done!
    ** [out :: REDACTED ] Sending 'restart' command
    ** [out :: REDACTED ] The following watches were affected:
    ** [out :: REDACTED ] puppetmaster_unicorn
    ** [out :: fs1a.stg.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs1a.stg.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ...

    Yup, looks good.
  36. 48.

    Then, if you wanted an extra layer of confidence, you could noop the branch against a production node.
  37. 49.

    ** [out :: REDACTED ] Bootstrapping...
    ** [out :: REDACTED ] Gem environment up-to-date.
    ** [out :: REDACTED ] Running librarian-puppet...
    ** [out :: REDACTED ] Generating puppet environments...
    ** [out :: REDACTED ] Cleaning up deleted branches...
    ** [out :: REDACTED ] Done!
    ** [out :: REDACTED ] Sending 'restart' command
    ** [out :: REDACTED ] The following watches were affected:
    ** [out :: REDACTED ] puppetmaster_unicorn
    ** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: would have changed from '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ...

    Yup, looks good.
  38. 50.

    Next, you’d merge the pull request. If you stopped here, the code would gradually roll out to all affected nodes over the next hour.
  39. 51.

    If you wanted the rollout to happen faster than that, you could force a Puppet run on the affected class of nodes.
  40. 52.

    ** [out :: REDACTED ] Bootstrapping...
    ** [out :: REDACTED ] Gem environment up-to-date.
    ** [out :: REDACTED ] Running librarian-puppet...
    ** [out :: REDACTED ] Generating puppet environments...
    ** [out :: REDACTED ] Cleaning up deleted branches...
    ** [out :: REDACTED ] Done!
    ** [out :: REDACTED ] Sending 'restart' command
    ** [out :: REDACTED ] The following watches were affected:
    ** [out :: REDACTED ] puppetmaster_unicorn
    ** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs7b.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090'
    ** [out :: fs1a.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ** [out :: fs7b.rs.github.com] notice: /Stage[main]/Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13'
    ...

    Yup, that looks good.
  41. 53.

    Then you’d probably want to check out load to make sure nothing went crazy.
  42. 55.

    ...and maybe check some logs or other related metrics to confirm your change didn’t break something.
  43. 57.

    ChatOps

    How we interact with Puppet via Hubot is a great example of a core principle of how we do ops at GitHub. We’ve been calling it ChatOps recently.
  44. 58.

    Essentially, ChatOps is the result of Hubot becoming sentient, and decreeing, among other things, that we now address him as “Supreme Leader” and communicate with our infrastructure through his secure channels alone. We occasionally observe him speaking in tongues that sound eerily like YouTube comments.
  45. 60.

    heaven janky shell graphme Hubot

    We use Hubot day in, day out to interact with other simple tools we’ve written over JSON APIs.
  46. 61.

    hubot heaven janky shell graphme ALL OF THE APIS

    Hubot interacts nicely with tons of external APIs too. If you have a JSON API, making your service work with Hubot is a piece of cake.
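
To make the "chat command in, JSON request out" idea concrete, here is a small sketch of the translation step. It is not Hubot or heaven code: Hubot scripts are written in CoffeeScript, and Ruby is used here only to stay consistent with the rest of the Puppet toolchain. The command grammar is modelled on the `deploy:apply puppet to prod/fs` line from the slides, while the function name, the JSON field names, and the default of `master` when no branch is given are all assumptions made for illustration.

```ruby
require 'json'

# Hypothetical grammar: deploy:<task> <app>[/<branch>] to <environment>/<hosts>
DEPLOY_COMMAND = %r{\Adeploy:(apply|noop) ([\w-]+)(?:/([\w-]+))? to (\w+)/(\w+)\z}

# Turn a chat command into a JSON request body for a deployment API,
# or return nil when the text is not a deploy command.
# Every field name below is invented for this sketch.
def deploy_request(command)
  match = DEPLOY_COMMAND.match(command)
  return nil unless match

  payload = {
    'task'        => match[1],             # "apply" or "noop"
    'app'         => match[2],             # e.g. "puppet"
    'branch'      => match[3] || 'master', # assumed default
    'environment' => match[4],             # e.g. "staging" or "prod"
    'hosts'       => match[5]              # e.g. "fs1" or "fs"
  }
  JSON.generate(payload)
end
```

The chat side then only has to recognise the text and POST the resulting body, which is part of why a service with a JSON API is cheap to expose in chat.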
  47. 62.

    Why is this stupid chat bot so important to Ops?

    But why do we obsess about Hubot so much? It’s just a chat bot, right? There are some distinct upsides to this approach that we’ve noticed as our use of Hubot in Ops has grown.
  48. 63.

    Remember the flow I just showed you for rolling out Puppet changes to our infrastructure?
  49. 64.

    Everyone sees all of that happen on their first day

    Everyone sees all of this happen from the minute they join GitHub. It’s right there, in the Ops room, right in the middle of the conversation in Campfire.
  50. 73.

    hubot log me smoke fe1

    grab smoke logs for that frontend and realize that you did, in fact, break it
  51. 84.

    hubot whois 4.9.23.22

    Once the outage has been resolved, you might see how to grab whois information for an IP that exhibited suspicious activity in the logs you saw
  52. 85.

    hubot khanify spammers

    and how to hit meme generator to make a joke when you realize that IP is a spammer
  53. 86.

    hubot play in the air tonight

    then someone would queue up the song that popped into their head when they thought about drums and gorillas at the same time
  54. 87.

    hubot tweet@github PuppetConf Drinkup Friday night at 8:30 at Zeke’s (3rd & Brannan)

    and then finish it all off with a tweet about the Drinkup we’re throwing Friday night
  55. 88.

    ChatOps

    ChatOps means building tools that make it easier to operate your infrastructure via Hubot than via Terminal or Chrome.
  56. 90.
  57. 92.

    This was always my main motivation with hubot - teaching by doing by making things visible. It's an extremely powerful teaching technique - @rtomayko

    Ryan Tomayko had this in mind from the very first commits to Hubot, which just presented a simple wrapper around a repository of shell scripts we use for managing and monitoring our infrastructure.
  58. 93.

    This is how I respond to “how do I do X” questions in Campfire now. If there’s not yet Hubot functionality to do a thing, we try to write it.
  59. 94.

    Communicate by doing

    Placing tools in the middle of the conversation also means you get communication of your work for free. If you’re doing something in a shell or on a website, you have to do it, then tell people about it. If you do it with Hubot, that comes free.
  60. 95.

    THINGS I HAVEN’T ASKED RECENTLY

    For example, here are a few things I haven’t asked recently because Hubot has told me the answer.
  65. 102.

    THINGS I HAVEN’T ASKED RECENTLY

    how’s that deploy going?
    are you deploying that or should i?
    is anyone responding to that nagios alert?
    is that branch green?
    how does load look?
    did that deploy finish?
    did anyone update the status page?
  66. 104.
  67. 105.

    http://www.flickr.com/photos/7997249@N06/6061305639/

    This is extremely helpful during outages or other situations that require tactical response. You don’t have to SAY that you’re spraying water on the fire, people SEE you doing it.
  68. 106.

    Hide the ugly

    Another awesome benefit of ChatOps-ing all of the things is that you can hide ugly interfaces and design exactly the interaction you want with some simple porcelain commands.
  69. 108.

    [nines] hubot opened issue #4263: Nagios (229906) - fs3b/syslog - Tue Sept 25 23:40:18 PDT 2012. github/nines#4263

    Hubot politely delivers Nagios alerts directly into chat.
  70. 109.

    hubot nagios ack fs3b/syslog
    # fix stuff
    nagios check fs3b/syslog
    nagios status fs3b/syslog
    hubot nagios downtime fs3b/syslog 90
    nagios mute fs3b/syslog
    nagios unmute fs3b/syslog

    Which we can interact with without any unnecessary eye bleeding. Making this easy means developers and other ops engineers actually mute or schedule downtime when they’re testing things.
  71. 110.
  72. 111.

    Well, that is, if you have a team of awesome iOS developers that have built an actually functioning Campfire client for the iPhone. This lets you do anything Hubot can do from your phone. Which means from your couch. Or your bed. Or a beach in Hawaii. Which means you can fix a lot of things without pulling your laptop out of your bag.
  73. 113.

    And now for something completely different

    While I’m showing off mobile stuff, I thought I’d slip in a demo of something else we’ve done to make Ops more mobile friendly.
  74. 114.

    We’ve hacked together support for PagerDuty alerts via Apple Push

    Notifications. When you swipe on the alert, you go directly to the PagerDuty mobile UI for an incident
  75. 118.

    Boom

    I can’t even begin to tell you how happy this makes me, and how much less shitty it makes being on-call.
  76. 119.

    So, who better to summarize all of this than Hubot

    himself. I asked him what he thought about ChatOps. Here’s what he said:
  77. 120.

    ChatOps all the things.

    Listen to what Hubot said. You’ll love it. Your ops team will love it. And you’ll help other developers learn how to interact with ops tools without any additional work. That’s awesome.
  78. 121.

    Work at GitHub
    jesse@github.com

    If you can’t ChatOps all the things at your gig now, you could always just come work with me at GitHub. Shoot me an email if you’re interested.
  79. 123.

    Tomorrow @ 8:30 PM
    Zeke’s
    3rd & Brannan

    While I still have everyone’s attention, I wanted to mention the GitHub Drinkup we’re throwing for PuppetConf again. It’s tomorrow night at 8:30pm at Zeke’s, which is on the corner of 3rd and Brannan. Everyone’s invited. I’ll see you there. Thanks again!