Puppet at GitHub / ChatOps

Jesse Newland
September 27, 2012

How we use Puppet at GitHub, and how Hubot runs our infrastructure for us.

Transcript

  1. None
  2. Jesse Newland jnewland

    Hey errbody, my name is Jesse Newland. I do ops at GitHub.
  3. Puppet at GitHub

    And today I’m going to be talking about Puppet at GitHub. Really, I’m telling a story in two parts.
  4. All of the amazing Puppet OSS projects @rodjek has written but doesn’t want to talk about

    First, I’ll be talking about all of the amazing Puppet open source projects Tim Sharpe has written but doesn’t want to talk about, and how we use them at GitHub.
  5. And then, I want to introduce you to the

    star of the GitHub Ops team, Hubot, and tell you a little bit about something we’ve been calling ChatOps
  6. The Setup

    But, before I get into all of that, I'm actually going to talk about an upcoming talk, one by a coworker of mine at GitHub. Will Farrington is going to be speaking tomorrow at 2:45pm about The Setup, our Puppet-powered GitHubber laptop management solution. It's amazing. It's one of the coolest uses of Puppet I've ever seen, and it's going to completely change the way you think about your development environment. But I’m not going to be talking about any of that today. So, yeah, go to Will's talk tomorrow. You won't be disappointed.
  7. Puppet at GitHub

    So I guess you could say that I’m talking about
  8. THE REST of Puppet at GitHub

    the rest of Puppet at GitHub. For the scope of this talk, I’m going to be talking about the Puppet infrastructure that runs github.com.
  9. 4 years, >100k LOC We’ve been managing GitHub’s infrastructure with

    Puppet for 4 years, since the move to Rackspace. There’s a ton of code, and we’re developing at a rapid pace.
  10. Simple But we are obsessed with keeping our Puppet deployment

    simple
  11. Single Master We use a single puppetmaster running lots of

    unicorns. Nothing fancy. It works for now. However, we will need to scale this tier up or out in about 6 months if the trends look right. We’ll probably switch to two load balanced puppetmasters around that time.
  12. # cat /etc/cron.d/puppet 13 * * * * root /usr/bin/

    cron FTW We don’t run the agent, but rather run puppet on cron every hour in combination with runs triggered via Hubot (more on that later)
  13. No ENC We don’t use an external node classifier

  14. $ cat manifests/nodes/janky.rscloud.pp node /^janky\d+\.rscloud\.github\.com$/ { github::role::janky { 'janky': public_address

    => dns_lookup($fqdn), nginx_hostname => $fqdn, } } ([a-z0-9\-_]+)(\d+)([a-z]?)\.(.*)\.github.com Instead, we give nodes DNS names that adhere to a naming convention that maps them to a pre-defined role
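
The naming convention above can be sketched in Ruby. This is only an illustration, not code from GitHub's Puppet repository: `HOSTNAME_PATTERN`, `parse_hostname`, and the field names are hypothetical, and the pattern is a lightly adapted form of the one on the slide (anchored, with the dots in `github.com` escaped, and a lazy first group so that a multi-digit index such as `12` is captured whole rather than split between the role and the index).

```ruby
# Adapted from the slide's pattern; the anchors and the lazy `+?` are assumptions.
HOSTNAME_PATTERN = /\A([a-z0-9\-_]+?)(\d+)([a-z]?)\.(.*)\.github\.com\z/

# Split a node's FQDN into the pieces the naming convention encodes.
# Returns nil when the name does not follow the convention.
def parse_hostname(fqdn)
  match = HOSTNAME_PATTERN.match(fqdn)
  return nil unless match

  {
    role: match[1],         # e.g. "fs" or "janky"
    index: match[2].to_i,   # numeric part of the host name
    suffix: match[3],       # optional trailing letter, "" when absent
    site: match[4]          # everything between the host and github.com
  }
end

p parse_hostname('fs1a.rs.github.com')
p parse_hostname('janky12.rscloud.github.com')
```

With the slide's greedy first group, a name like `janky12` would be read as role `janky1`, index `2`, which is why the sketch uses a lazy quantifier.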
  15. $ head modules/github/manifests/role/janky.pp define github::role::janky($public_address, $nginx_hostname='', $god=true ) { github::core

    { 'janky': } include github::app::janky github::nginx { 'janky': } } Where the magic happens Role definitions are where the magic happens. We try to DRY common functionality into our core module and into other simple classes or defines so that role definitions read like a nice summary of what makes this role different from others
  16. augeas { 'my.cnf/avoid_cardinality_skew': context => '/files/etc/mysql/my.cnf/mysqld/', changes => [ 'set

    innodb_stats_auto_update 0', 'set innodb_stats_on_metadata 0', 'set innodb_stats_on_metadata 64' ], require => Percona::Server[$::fqdn], } Heavy use of augeas We generally try to avoid templates for configuration files in favor of using Augeas. It lets us manage the small pieces of configuration we care about and use the OS defaults for the things we don't.
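
To make the "manage only the pieces we care about" idea concrete, here is a plain-Ruby sketch of the same principle. It is not Augeas and not how Puppet's `augeas` type works internally; `set_options` is a hypothetical helper that rewrites a few keys in one section of an INI-style file and leaves every other line untouched, which is the property that a full-file template gives up.

```ruby
# Set `changes` (key => value) inside one [section] of INI-style text,
# leaving every other line exactly as it was.
def set_options(config_text, section, changes)
  pending = changes.dup
  in_section = false
  output = []

  config_text.each_line(chomp: true) do |line|
    header = line[/\A\[(.+)\]\z/, 1]
    if header
      # Leaving the target section: append any keys it did not already contain.
      if in_section
        output.concat(pending.map { |key, value| "#{key} = #{value}" })
        pending = {}
      end
      in_section = (header == section)
    elsif in_section
      key = line[/\A\s*([\w-]+)\s*=/, 1]
      line = "#{key} = #{pending.delete(key)}" if key && pending.key?(key)
    end
    output << line
  end

  # The target section was the last one in the file.
  output.concat(pending.map { |key, value| "#{key} = #{value}" }) if in_section
  output.join("\n") + "\n"
end

my_cnf = <<~CNF
  [client]
  port = 3306
  [mysqld]
  innodb_stats_on_metadata = 1
  datadir = /var/lib/mysql
CNF

changes = { 'innodb_stats_on_metadata' => '0', 'innodb_stats_auto_update' => '0' }
puts set_options(my_cnf, 'mysqld', changes)
```

The sketch does nothing when the section is missing. A real tool would need to decide whether to create it.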
  17. BORING But I don’t want to just show all of

    you Puppet code for thirty minutes. That's boring
  18. What’s interesting about Puppet at GitHub? I’d rather talk about

    what's interesting about how we use Puppet at GitHub. And what I think is the most interesting is that we focus heavily on ensuring the Puppet development workflow is easily accessible to everyone at GitHub.
  19. Making Puppet Less Scary We’re doing our best to make

    puppet less scary for people that aren’t familiar with it, so they can help the Ops team grow and evolve our infrastructure. We’re doing some things right here, but there’s still a lot of work to do.
  20. I’ve been thinking about this a lot recently as we’ve

    just had two large infrastructure projects shipped by people that were completely or relatively new to Puppet. First, Derek Greentree shipped a Cassandra cluster...
  21. And Adam Roben shipped Puppet manifests for our Windows build

    and CI servers.
  22. this is good This is an awesome trend, and I

    want it to continue. So I thought I’d talk a bit today about what we’re doing to try to enable even more of this.
  23. Flow just like a (GitHub) Ruby project For us, an

    important part of making Puppet development accessible for other developers at GitHub is making the development flow on our puppet codebase as similar as possible to that of any other GitHub Ruby project. That means sticking with some common conventions
  24. $ ./script/bootstrap Setup Like making it as easy to set up

    as any other project at GitHub
  25. $ cat Gemfile source :rubygems gem 'puppet', '2.7.18' gem 'facter',

    '1.6.10' gem 'rspec-puppet', '0.1.2' gem 'rake', '0.8.7' gem 'puppet-lint', '0.2.1' gem 'ruby-augeas', '0.3.0' gem 'json', '1.5.1' gem 'fog', '1.3.1' gem 'librarian-puppet', '0.9.4' gem 'parallel_tests' So ruby deps are managed by Bundler
  26. $ cat Puppetfile forge "http://forge.puppetlabs.com" mod 'puppetlabs/apt' ... And puppet

    deps are managed by librarian-puppet, a Bundler-like library that manages the Puppet modules your infrastructure depends on and installs them directly from GitHub repositories. I’m of the opinion that the unit of open source currency is no longer a tarball downloaded from something named *forge. It’s a GitHub repo. All of the developers at GitHub feel the same way, so Tim wrote librarian-puppet.
  27. rodjek / librarian-puppet For those of you keeping score

    at home, that’s the first of Tim Sharpe’s open source projects that I’ve mentioned. Hi Tim!
  28. Making puppet flow like other projects at GitHub means ensuring

    we have good editor support for the language
  29. rodjek / vim-puppet vim-puppet, that’s two.

  30. $ ./script/cibuild Tests It means running tests is a simple

    one-step process
  31. TESTS! Tests are super important. A solid and easy to

    use test harness helps build developer confidence in a new language.
  32. Safety net And tests are a crucial safety net for helping

    people cut their teeth on Puppet if they haven’t ever touched it before.
  33. should contain_github__firewall_rule('internal_network') should contain_ssmtp__relay_to('smtp').with_relay_host('smtp') should contain_file('/etc/logstash/logstash.conf') should include_class('github::ksplice') should contain_networking__bond('bond0').with(

    :gateway => '172.22.0.2', :arp_ip_target => '172.22.0.2', :up_commands => nil ) rspec-puppet We use rspec-puppet heavily. If you haven’t used rspec-puppet yet, go check it out right now. It’s amazing. There are no fewer than three talks about it at PuppetConf, so I’m not going to talk about HOW to use it today, just touch a little bit on how WE use it.
  34. rodjek / rspec-puppet rspec-puppet, that’s three

  35. describe 'github::role::fe' do let(:title) { 'fe' } let(:node) { 'fe1.rs.github.com'

    } let(:params) { { :public_address => '207.97.227.242/27', :private_address => '172.22.1.59/22', :git_weight => '16' } } let(:facts) { { :ipaddress => '172.22.1.59', :operatingsystem => 'Debian', :datacenter => 'rackspace-iad2', } } it do should contain_github__core('fe') ... end end role specs are king We try our best to adequately test our individual puppet modules, but our central and most frequently touched specs exercise our role system. There’s one spec for each role which describes its intended functionality. These specs focus on critical functionality of each role, and help a great deal to build confidence that we’re not introducing regressions when adding or refactoring functionality or working in other roles.
  36. $ git commit -am "lolbadchange" modules/github/manifests/role/fe.pp:err: Could not parse for

    environment production: Syntax error at 'allow_outbound_syslog'; expected '}' at /Users/jnewland/github/puppet/modules/github/ manifests/role/fe.pp:31 modules/github/manifests/role/fe.pp - WARNING: => is not properly aligned on line 626 .git/hooks/pre-commit For an even faster feedback loop than running specs, all Puppet dev environments automatically get set up with a pre-commit hook that checks for syntax errors and ensures your changes conform to the Puppet style guide. This has proved amazingly useful for Puppet novices and experts alike: novices find it helps them understand language conventions quickly and guides them towards solutions, and experts use it to catch typos and avoid looking like novices.
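
The talk doesn't show the hook itself, so the following is only a sketch of what such a pre-commit check might look like in Ruby. The helper names (`puppet_manifests`, `checks_for`, `failed_checks`) are hypothetical. The two commands it builds, `puppet parser validate` and `puppet-lint`, are the real CLIs for syntax and style checking. Whether GitHub's hook blocks the commit on lint warnings is not stated, so that part is an assumption.

```ruby
# Hypothetical pre-commit check; not GitHub's actual hook.

# Keep only the Puppet manifests from a list of staged paths.
def puppet_manifests(paths)
  paths.select { |path| path.end_with?('.pp') }
end

# The commands to run against a single manifest: syntax first, then style.
def checks_for(path)
  [
    ['puppet', 'parser', 'validate', path],
    ['puppet-lint', path]
  ]
end

# Run every check and return the commands that failed.
# `runner` is injectable so the logic can be exercised without Puppet installed.
def failed_checks(paths, runner = ->(command) { system(*command) })
  puppet_manifests(paths)
    .flat_map { |path| checks_for(path) }
    .reject { |command| runner.call(command) }
end

# In .git/hooks/pre-commit one might wire it up like this:
#
#   staged = `git diff --cached --name-only --diff-filter=ACM`.split("\n")
#   failures = failed_checks(staged)
#   failures.each { |command| warn "FAILED: #{command.join(' ')}" }
#   exit 1 unless failures.empty?
```

Keeping the command runner injectable means the hook's decision logic can be tested quickly, in the same spirit as the fast feedback loop the hook is meant to provide.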
  37. rodjek / puppet-lint puppet-lint, that’s four, btw.

  38. specs run on each push auto deploy on CI pass

    rspec-puppet and puppet-lint are automatically run by CI on every commit on every branch pushed to our Puppet repo. Once master passes CI, puppet is automatically deployed
  39. As you can see, Hubot automates a lot of the

    process of rolling out Puppet. That example covered pushing changes to master, but what about a pull request-based workflow?
  40. Say we have a pull request for a branch we

    want to merge, and that we’ve reviewed the code and it all looks good.
  41. branches == environments

    On each deploy, we turn all git branches into Puppet environments.
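
A minimal sketch of the branch-to-environment step, assuming each environment is a directory named after its branch. The function names are hypothetical, and the rule for turning a branch name such as `git-gh13` into an environment name (replacing anything other than letters, digits, and underscores) is an assumption made here. The talk does not say how GitHub handles it.

```ruby
# Hypothetical mapping from a git branch name to a Puppet environment name.
def environment_name(branch)
  branch.gsub(/[^A-Za-z0-9_]/, '_')
end

# Compare the current branches with the environment directories that already
# exist, and work out which to create and which to clean up.
def plan_environments(branches, existing_dirs)
  wanted = branches.map { |branch| environment_name(branch) }
  {
    create: wanted - existing_dirs,
    remove: existing_dirs - wanted
  }
end

p plan_environments(%w[master git-gh13], %w[master old_branch])
```

The `remove` list corresponds to the cleanup of deleted branches that shows up in the deploy output later in the talk.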
  42. This combined with heaven, our capistrano-powered deployment API we interact

    with via Hubot, enables us to experiment with unmerged Puppet branches in a powerful way
  43. So, to safely merge this pull request...

  44. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github You might ask Hubot to confirm its build status
  45. Build #108816 (5fe75932f26ea62cb5fc5e3d0cb302cc2461d11e) of puppet/git-gh13 was successful(421s) github/ puppet@567ea48...5fe7593 Yup,

    looks good.
  46. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github Then roll the branch out to a staging box to make sure everything applies cleanly there.
  47. ** [out :: REDACTED ] Bootstrapping... ** [out :: REDACTED

    ] Gem environment up-to-date. ** [out :: REDACTED ] Running librarian-puppet... ** [out :: REDACTED ] Generating puppet environments... ** [out :: REDACTED ] Cleaning up deleted branches... ** [out :: REDACTED ] Done! ** [out :: REDACTED ] Sending 'restart' command ** [out :: REDACTED ] The following watches were affected: ** [out :: REDACTED ] puppetmaster_unicorn ** [out :: fs1a.stg.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090' ** [out :: fs1a.stg.github.com] notice: /Stage[main] Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13' ... Yup, looks good.
  48. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github Then, if you wanted an extra layer of confidence, you could noop the branch against a production node
  49. ** [out :: REDACTED ] Bootstrapping... ** [out :: REDACTED

    ] Gem environment up-to-date. ** [out :: REDACTED ] Running librarian-puppet... ** [out :: REDACTED ] Generating puppet environments... ** [out :: REDACTED ] Cleaning up deleted branches... ** [out :: REDACTED ] Done! ** [out :: REDACTED ] Sending 'restart' command ** [out :: REDACTED ] The following watches were affected: ** [out :: REDACTED ] puppetmaster_unicorn ** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090' ** [out :: fs1a.rs.github.com] notice: /Stage[main]/ Github::Common_packages/Package[git]/ensure: would have changed from '1:1.7.10-1+github12' to '1:1.7.10-1+github13' ... Yup, looks good
  50. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github Next, you’d merge the pull request. If you stopped here, the code would gradually roll out to all affected nodes over the next hour.
  51. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github If you wanted the rollout to happen faster than that, you could force a puppet run on the affected class of nodes
  52. ** [out :: REDACTED ] Bootstrapping... ** [out :: REDACTED

    ] Gem environment up-to-date. ** [out :: REDACTED ] Running librarian-puppet... ** [out :: REDACTED ] Generating puppet environments... ** [out :: REDACTED ] Cleaning up deleted branches... ** [out :: REDACTED ] Done! ** [out :: REDACTED ] Sending 'restart' command ** [out :: REDACTED ] The following watches were affected: ** [out :: REDACTED ] puppetmaster_unicorn ** [out :: fs1a.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090' ** [out :: fs7b.rs.github.com] info: Applying configuration version '8fb1a2716d5f950b836e511471a2bdac3ed27090' ** [out :: fs1a.rs.github.com] notice: /Stage[main]/ Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13' ** [out :: fs7b.rs.github.com] notice: /Stage[main]/ Github::Common_packages/Package[git]/ensure: ensure changed '1:1.7.10-1+github12' to '1:1.7.10-1+github13' ... Yup, that looks good.
  53. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github Then you’d probably want to check out load to make sure nothing went crazy
  54. Yup, looks good

  55. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github ...and maybe check some logs or other related metrics to confirm your change didn’t break something
  56. Yup, looks good
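
The deploy commands in the walkthrough above share a small grammar: an optional `:apply` or `:noop` mode, an app with an optional branch, an optional `to`, and an environment/target pair. Here is a hedged Ruby sketch of a parser for that grammar. `DEPLOY_PATTERN` and `parse_deploy` are hypothetical names, this is not Hubot's or heaven's actual implementation, and defaulting the branch to `master` is inferred from the way the talk describes `deploy github to prod/fe1`.

```ruby
# Hypothetical grammar for the deploy commands shown in the talk.
DEPLOY_PATTERN = %r{
  \A(?:hubot\s+)?deploy(?::(?<mode>apply|noop))?\s+
  (?<app>[^\s/]+)(?:/(?<branch>\S+))?\s+
  (?:to\s+)?(?<env>[^\s/]+)/(?<target>\S+)\z
}x

# Parse a chat line into its parts, or return nil if it is not a deploy command.
def parse_deploy(command)
  match = DEPLOY_PATTERN.match(command.strip)
  return nil unless match

  {
    mode: match[:mode] || 'deploy',        # plain "deploy" has no explicit mode
    app: match[:app],
    branch: match[:branch] || 'master',    # assumption: no branch means master
    env: match[:env],
    target: match[:target]
  }
end

p parse_deploy('deploy:apply puppet/git-gh13 staging/fs1')
p parse_deploy('deploy:noop puppet/git-gh13 prod/fs1')
p parse_deploy('hubot deploy:apply puppet to prod/fs')
```

Because the same command shape covers staging, noop, and production runs, the only things that change between steps are the mode and the target.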

  57. ChatOps How we interact with Puppet via Hubot is a

    great example of a core principle of how we do ops at GitHub. We’ve been calling it ChatOps recently.
  58. Essentially, ChatOps is the result of Hubot becoming sentient, and

    decreeing, among other things, that we now address him as “Supreme Leader” and communicate with our infrastructure through his secure channels alone. We occasionally observe him speaking in tongues that sound eerily like YouTube comments.
  59. Hubot Actually, that’s not it at all. Hubot is

    the star of our Ops team.
  60. heaven janky shell graphme Hubot We use Hubot day in,

    day out to interact with other simple tools we’ve written over JSON APIs.
  61. hubot heaven janky shell graphme ALL OF THE APIS Hubot

    interacts nicely with tons of external APIs too. If you have a JSON API, making your service work with Hubot is a piece of cake.
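
As a rough illustration of why a JSON API makes chat integration easy, here is a Ruby sketch of the two halves of a chat command: recognising the message and turning a JSON response into a reply. Real Hubot scripts are not written in Ruby, and nothing here is Hubot's API. The function names and the JSON field names (`number`, `sha`, `branch`, `green`, `duration`) are all hypothetical. The reply format loosely follows the build status message shown earlier in the talk.

```ruby
require 'json'

# Return the branch named in a "ci status <branch>" message, or nil.
def ci_status_branch(message)
  match = /\Ahubot ci status (\S+)\z/.match(message.strip)
  match && match[1]
end

# Turn a (hypothetical) CI JSON payload into a one-line chat reply.
def format_build_status(json_body)
  build = JSON.parse(json_body)
  outcome = build['green'] ? 'was successful' : 'failed'
  "Build ##{build['number']} (#{build['sha']}) of #{build['branch']} " \
    "#{outcome} (#{build['duration']}s)"
end

payload = '{"number": 108816, "sha": "5fe7593", "branch": "puppet/git-gh13", ' \
          '"green": true, "duration": 421}'
puts ci_status_branch('hubot ci status puppet/git-gh13')
puts format_build_status(payload)
```

Everything between those two functions is a single HTTP request, which is what makes wiring a new service into chat cheap.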
  62. Why is this stupid chat bot so important to Ops?

    But why do we obsess about Hubot so much? It’s just a chat bot, right? There are some distinct upsides to this approach that we’ve noticed as our use of Hubot in Ops has grown.
  63. hubot ci status puppet/git-gh13 deploy:apply puppet/git-gh13 staging/fs1 deploy:noop puppet/git-gh13 prod/fs1

    # merge pull request hubot deploy:apply puppet to prod/fs graph me -1h @collectd.load(fs*) log me hooks github/github Remember the flow I just showed you for rolling out puppet changes to our infrastructure?
  64. Everyone sees all of that happen on their first day

    Everyone sees all of this happen from the minute they join GitHub. It’s right there, in the Ops room, right in the middle of the conversation in Campfire.
  65. You don’t just see how to roll out puppet, you

    see how to...
  66. hubot ci status github/smoke-perf check the status of a branch’s last

    build
  67. hubot deploy github/smoke-perf to prod/fe1 deploy any branch of

    any github app to any server
  68. hubot graph me -10min @app-perf get graphs of the app’s

    recent performance
  69. hubot procs unicorn check the status of unicorns across all

    frontends
  70. hubot resque critical check the status of the resque critical

    queue
  71. hubot graph me -10min @collectd.load(fe*) check load on the frontends

  72. hubot conns fe1 check current connections to a frontend that

    you suspect has a problem
  73. hubot log me smoke fe1 grab smoke logs for that

    frontend and realize that you did, in fact, break it
  74. hubot lbctl disable fe1 take it out of the load

    balancer
  75. hubot status yellow Bad deploy. Reverting now. update the status

    blog
  76. hubot who’s on call determine who is currently on call

    so you can apologize to them
  77. hubot pingdom checks check pingdom to make sure you haven’t

    broken everything
  78. hubot upset me chill yourself out really quick

  79. hubot deploy github to prod/fe1 revert back to master on

    the busted frontend
  80. hubot log me smoke fe1 verify things have returned to

    normal
  81. hubot air drum me get pumped up because you fixed

    it
  82. hubot lbctl enable fe1 bring the fixed frontend back into

    the rotation
  83. hubot status green All systems go. clear alerts on the

    status page
  84. hubot whois 4.9.23.22 Once the outage has been resolved, you

    might see how to grab whois information for an IP that exhibited suspicious activity in the logs you saw
  85. hubot khanify spammers and how to hit meme generator to

    make a joke when you realize that IP is a spammer
  86. hubot play in the air tonight then someone would queue

    up the song that popped into their head when they thought about drums and gorillas at the same time
  87. hubot tweet@github PuppetConf Drinkup Friday night at 8:30 at Zeke’s

    (3rd & Brannan) and then finish it all off with a tweet about the Drinkup we’re throwing Friday night
  88. ChatOps ChatOps means building tools that make it easier to

    operate your infrastructure via Hubot than via Terminal or Chrome
  89. By placing tools directly in the middle of the conversation

    Because...
  90. Everyone is pairing all of the time This is the

    core concept behind ChatOps.
  91. Teaching by doing Teaching by doing is awesome

  92. “This was always my main motivation with hubot - teaching by doing by making things visible. It's an extremely powerful teaching technique” - @rtomayko

    Ryan Tomayko had this in mind from the very first commits to Hubot, which just presented a simple wrapper around a repository of shell scripts we use for managing and monitoring our infrastructure.
  93. This is how I respond to “how do I do

    X” questions in Campfire now. If there’s not yet Hubot functionality to do a thing, we try to write it.
  94. Communicate by doing Placing tools in the middle of the

    conversation also means you get communication of your work for free. If you’re doing something in a shell or on a website, you have to do it, then tell people about it. If you do it with hubot, that comes free.
  95. THINGS I HAVEN’T ASKED RECENTLY For example, here are a

    few things I haven’t asked recently because Hubot has told me the answer
  96. THINGS I HAVEN’T ASKED RECENTLY how’s that deploy going?

  97. THINGS I HAVEN’T ASKED RECENTLY how’s that deploy going? are

    you deploying that or should i?
  98. THINGS I HAVEN’T ASKED RECENTLY how’s that deploy going? are

    you deploying that or should i? is anyone responding to that nagios alert?
  99. THINGS I HAVEN’T ASKED RECENTLY how’s that deploy going? are

    you deploying that or should i? is anyone responding to that nagios alert? is that branch green?
  100. THINGS I HAVEN’T ASKED RECENTLY is that branch green? how’s

    that deploy going? are you deploying that or should i? is anyone responding to that nagios alert? how does load look?
  101. THINGS I HAVEN’T ASKED RECENTLY is that branch green? how’s

    that deploy going? are you deploying that or should i? is anyone responding to that nagios alert? how does load look? did anyone update the status page?
  102. THINGS I HAVEN’T ASKED RECENTLY how’s that deploy going? are

    you deploying that or should i? is anyone responding to that nagios alert? is that branch green? how does load look? did that deploy finish? did anyone update the status page?
  103. Free communication is especially crucial in a distributed environment.

  104. Our Ops team is entirely remote, so Campfire is our

    default means of communication.
  105. http://www.flickr.com/photos/7997249@N06/6061305639/ This is extremely helpful during outages or other situations

    that require tactical response. You don’t have to SAY that you’re spraying water on the fire, people SEE you doing it.
  106. Hide the ugly Another awesome benefit of ChatOps-ing all of

    the things is that you can hide ugly interfaces and design exactly the interaction you want with some simple porcelain commands
  107. My favorite example of this is the ugliest of the ugly,

    Nagios.
  108. [nines] hubot opened issue #4263: Nagios (229906) - fs3b/syslog -

    Tue Sept 25 23:40:18 PDT 2012. github/nines#4263 Hubot politely delivers nagios alerts directly into chat
  109. hubot nagios ack fs3b/syslog # fix stuff nagios check fs3b/syslog

    nagios status fs3b/syslog hubot nagios downtime fs3b/syslog 90 nagios mute fs3b/syslog nagios unmute fs3b/syslog Which we can interact with without any unnecessary eye bleeding. Making this easy means developers and other ops engineers actually mute or schedule downtime when they’re testing things.
  110. Mobile FTW Yet another awesome benefit of ChatOps is that

    you get mobile support for free
  111. Well, that is, if you have a team of awesome

    iOS developers that have built an actually functioning Campfire client for the iPhone. This lets you do anything Hubot can do from your phone. Which means from your couch. Or your bed. Or a beach in Hawaii. Which means you can fix a lot of things without pulling your laptop out of your bag.
  112. ChatOps That’s ChatOps at its finest.

  113. And now for something completely different While I’m showing off

    mobile stuff, I thought I’d slip in a demo of something else we’ve done to make Ops more mobile friendly.
  114. We’ve hacked together support for PagerDuty alerts via Apple Push

    Notifications. When you swipe on the alert, you go directly to the PagerDuty mobile UI for an incident
  115. Which lets you ack an alert

  116. while you’re still in bed

  117. or on the couch.

  118. Boom I can’t even begin to tell you how happy

    this makes me, and how much less shitty it makes being on-call.
  119. So, who better to summarize all of this than Hubot

    himself. I asked him what he thought about ChatOps. Here’s what he said:
  120. ChatOps all the things. Listen to what Hubot said. You’ll

    love it. Your ops team will love it. And you’ll help other developers learn how to interact with ops tools without any additional work. That’s awesome.
  121. Work at GitHub jesse@github.com If you can’t ChatOps all the

    things at your gig now, you could always just come work with me at GitHub. Shoot me an email if you’re interested.
  122. Thanks! That’s all I have. Thanks for listening! Any questions?

  123. Tomorrow @ 8:30 PM Zeke’s 3rd & Brannan While I

    still have everyone’s attention, I wanted to mention the GitHub Drinkup we’re throwing for PuppetConf again. It’s tomorrow night at 8:30pm at Zeke’s, which is on the corner of 3rd and Brannan. Everyone’s invited. I’ll see you there. Thanks again!