Slide 1

Slide 1 text

stack smashing railsconf 2012 http://speakerdeck.com/u/czarneckid/

Slide 2

Slide 2 text

david czarnecki

Slide 3

Slide 3 text

twitter @czarneckid

Slide 4

Slide 4 text

github/czarneckid

Slide 5

Slide 5 text

work @agoragames

Slide 6

Slide 6 text

github/agoragames

Slide 7

Slide 7 text

infrastructure insanity

Slide 8

Slide 8 text

CEO priority

Slide 9

Slide 9 text

1 month tour of duty

Slide 10

Slide 10 text

simplify (allthethings)

Slide 11

Slide 11 text

document (allthethings)

Slide 12

Slide 12 text

network overview

Slide 13

Slide 13 text

15 applications intertwined

Slide 14

Slide 14 text

SSO, Profile Service, Community, Pro Circuit, Live Experience, Store, Photo Tool, Carousel Tool, League Tool, Entitlements, Redemption, Starcraft Arena, MLG.tv, Progamer, Pro Stats

Slide 15

Slide 15 text

app * capacity

Slide 16

Slide 16 text

15 * 2 = 30 VMs

Slide 17

Slide 17 text

VM profile: 1 GB RAM, 40 GB disk
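
The sizing arithmetic on the last few slides, written out as a back-of-the-envelope sketch. The constant names are made up for illustration; the figures (15 applications, 2 VMs per application, 1 GB RAM and 40 GB disk per VM) come from the slides.

```ruby
# Back-of-the-envelope VM sizing using the figures from the slides.
APPLICATIONS   = 15  # applications in the network
VMS_PER_APP    = 2   # "capacity" per application
RAM_GB_PER_VM  = 1
DISK_GB_PER_VM = 40

total_vms  = APPLICATIONS * VMS_PER_APP
total_ram  = total_vms * RAM_GB_PER_VM
total_disk = total_vms * DISK_GB_PER_VM

puts "#{total_vms} VMs, #{total_ram} GB RAM, #{total_disk} GB disk"
```

Any new application or extra capacity multiplies straight through, which is why the VM count grows so quickly.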

Slide 18

Slide 18 text

MLG traffic #s: 4MM views, 1.2MM uniques, 35MM page views

Slide 19

Slide 19 text

quickly need a lot of VMs

Slide 20

Slide 20 text

more servers, more problems

Slide 21

Slide 21 text

we <3 hardware: 4 processors, 6 cores/processor, 64 GB RAM, 146 GB disks

Slide 22

Slide 22 text

chef recipes

Slide 23

Slide 23 text

mostly stock

Slide 24

Slide 24 text

application migration

Slide 25

Slide 25 text

start internal

Slide 26

Slide 26 text

end external

Slide 27

Slide 27 text

iteration breeds abstraction

Slide 28

Slide 28 text

application upgrading

Slide 29

Slide 29 text

embrace the pipeline

Slide 30

Slide 30 text

git checkout -b rails32

Slide 31

Slide 31 text

gem "rails", "3.2.0"

Slide 32

Slide 32 text

asset pipeline is OPTIONAL

Slide 33

Slide 33 text

Gemfile

group :assets do
  gem 'sass-rails', '~> 3.2.3'
  gem 'coffee-rails', '~> 3.2.1'
  gem 'compass', '= 0.12.alpha.4'
  gem 'uglifier', '~> 1.0.3'
end

gem 'jquery-rails'

Slide 34

Slide 34 text

application.rb

if defined?(Bundler)
  # If you precompile assets before deploying to production, use this line
  Bundler.require(*Rails.groups(:assets => %w(development test)))
  # If you want your assets lazily compiled in production, use this line
  # Bundler.require(:default, :assets, Rails.env)
end

...

# Enable the asset pipeline
config.assets.enabled = true

Slide 35

Slide 35 text

production.rb

# Compress assets and add digests
config.assets.compress = true
config.assets.js_compressor = Uglifier.new(:copyright => false) if defined?(Uglifier)
config.assets.digest = true

# Precompile additional assets (application.js, application.css, and all non-JS/CSS are already added)
config.assets.precompile += %w( home.css home.js admin.css admin.js custom-application.js )
config.assets.precompile += [/plugins\/jquery\.ui\.selectmenu\.(css|js)$/, /plugins\/jquery\.gwfselect\.(css|js)$/]

Slide 36

Slide 36 text

terminal

$ mkdir app/assets
$ mv public/images/ app/assets/
$ mv public/javascripts/ app/assets/
$ mv public/stylesheets/ app/assets/

Slide 37

Slide 37 text

the unicorn

Slide 38

Slide 38 text

wicked fast

Slide 39

Slide 39 text

kernel load-balancing

Slide 40

Slide 40 text

can do rolling restarts

Slide 41

Slide 41 text

signal for capacity

Slide 42

Slide 42 text

sv-rails-run.erb

#!/bin/bash
exec 2>&1
<% unicorn_command = @options[:unicorn_command] || 'unicorn_rails' -%>
test -f /var/rails/.rvm/scripts/rvm || exit 1
exec /usr/bin/sudo -u rails -i < || exit 1
exec bundle exec <%= unicorn_command %> -c config/unicorn.rb -E <%= @options[:environment] %>
END

Slide 43

Slide 43 text

unicorn.rb

rails_env = ENV['RAILS_ENV'] || 'production'

worker_processes (rails_env == 'production' ? 4 : 1)
preload_app true

# Restart any workers that haven't responded in 30 seconds
timeout 30

# Listen on a Unix data socket
case rails_env
when 'production', 'staging' # comma-separated so staging matches too
  listen "/var/rails/application/tmp/sockets/#{rails_env}.sock", :backlog => 2048
else
  listen "#{`pwd`.strip}/tmp/sockets/#{rails_env}.sock"
end
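
A note on the case statement in unicorn.rb: Ruby's when takes a comma-separated list when several values should match, whereas an expression like 'production' || 'staging' is evaluated first and always yields 'production'. A minimal sketch of the difference follows; socket_kind and its return symbols are made-up names, not part of the unicorn config.

```ruby
# Hypothetical helper that mirrors the socket-path decision in unicorn.rb.
def socket_kind(rails_env)
  case rails_env
  when 'production', 'staging' # comma-separated values: matches either one
    :shared_socket
  else
    :local_socket
  end
end

# The || form collapses to its first operand before `when` ever sees it.
collapsed = ('production' || 'staging')

puts socket_kind('staging')
puts socket_kind('development')
puts collapsed
```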

Slide 44

Slide 44 text

service configuration

Slide 45

Slide 45 text

if a server fails ...

Slide 46

Slide 46 text

does it make a sound?

Slide 47

Slide 47 text

no, you get a phone call

Slide 48

Slide 48 text

the usual suspect

Slide 49

Slide 49 text

database.yml

production:
  adapter: mysql2
  host: machine-name
  reconnect: true
  pool: 5
  database: appname_production
  username: secret
  password: sup3rs3cr3t
  encoding: utf8

Slide 50

Slide 50 text

spot a problem?

Slide 51

Slide 51 text

database.yml

production:
  adapter: mysql2
  host: machine-name
  reconnect: true
  pool: 5
  database: appname_production
  username: secret
  password: sup3rs3cr3t
  encoding: utf8

Slide 52

Slide 52 text

databases never fail!?

Slide 53

Slide 53 text

how about an alias?

Slide 54

Slide 54 text

/etc/bind/db.int

...
mysql.yourcompany        IN CNAME machine-name
mysql-slave.yourcompany  IN CNAME another-machine-name
...

Slide 55

Slide 55 text

database.yml

production:
  adapter: mysql2
  host: mysql.yourcompany.int
  reconnect: true
  pool: 5
  database: appname_production
  username: secret
  password: sup3rs3cr3t
  encoding: utf8

Slide 56

Slide 56 text

no re-deploys for failure!

Slide 57

Slide 57 text

simply update DNS

Slide 58

Slide 58 text

do this for redis

Slide 59

Slide 59 text

and maybe memcached

Slide 60

Slide 60 text

or for any services
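
One way to keep the alias habit honest is a small check over database.yml. This is only a sketch: hosts_without_alias is a made-up helper, and the ".yourcompany.int" suffix is assumed from the placeholder names on the slides.

```ruby
require 'yaml'

# Hypothetical lint: list the environments whose database host is a raw
# machine name rather than an alias in the internal zone.
ALIAS_SUFFIX = '.yourcompany.int' # assumed from the slides' placeholder zone

def hosts_without_alias(yaml_text)
  config = YAML.load(yaml_text)
  config.select { |_env, settings| !settings['host'].to_s.end_with?(ALIAS_SUFFIX) }.keys
end

sample = <<YAML_TEXT
production:
  adapter: mysql2
  host: mysql.yourcompany.int
staging:
  adapter: mysql2
  host: machine-name
YAML_TEXT

puts hosts_without_alias(sample).inspect
```

The same idea would apply to redis or memcached settings, wherever a hostname ends up in a config file.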

Slide 61

Slide 61 text

offline processing

Slide 62

Slide 62 text

who doesn’t use resque?

Slide 63

Slide 63 text

rails recipe resque-aware

Slide 64

Slide 64 text

Individual application

...
'application' => {
  :root => '/var/rails/application/current',
  :environment => 'production',
  :queues => {
    'application_queue' => 4,
    'application_mailer' => 1,
    'application_checkin_expiration' => 1
  },
  :queue_intervals => {},
  :resque_log => '/var/rails/application/shared/log/resque.log'
},
...

Slide 65

Slide 65 text

node[:ruby][:sites].each_pair do |site, opts|
  runit_service site do
    owner 'rails'
    group 'rails'
    template_name 'rails'
    options opts
  end

  if opts[:resque_scheduler] == `hostname`.strip
    runit_service "resque-scheduler-#{site}" do
      owner 'rails'
      group 'rails'
      template_name 'resque-scheduler'
      options opts
    end
  end

  if opts[:queues]
    opts[:queues].each do |queue, workers|
      options_for_template = opts.dup
      options_for_template[:queue] = queue
      1.upto(workers) do |index|
        runit_service "resque-#{site}-#{queue}-worker-#{index}" do
          owner 'rails'
          group 'rails'
          template_name 'resque'
          options options_for_template
        end
      end
    end
  end
end
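
To see what the queue loop in the recipe produces, here is the same expansion in plain Ruby with no Chef involved. resque_service_names is a made-up helper; the naming pattern is the one the recipe uses.

```ruby
# Expand a queue => worker-count hash into runit service names, following
# the "resque-<site>-<queue>-worker-<index>" pattern used in the recipe.
def resque_service_names(site, queues)
  queues.flat_map do |queue, workers|
    (1..workers).map { |index| "resque-#{site}-#{queue}-worker-#{index}" }
  end
end

queues = { 'application_queue' => 2, 'application_mailer' => 1 }
puts resque_service_names('application', queues)
```

Predictable names matter because anything that restarts workers after a deploy has to rebuild exactly these strings.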

Slide 66

Slide 66 text

sv-resque-run.erb

#!/bin/bash
exec 2>&1
test -f /var/rails/.rvm/scripts/rvm || exit 1
exec /usr/bin/sudo -u rails -i < || exit 1
RAILS_ENV=<%= @options[:environment] %> QUEUES=<%= @options[:queue] %> INTERVAL=<%= @options[:queue_intervals][@options[:queue]] || '5' %> exec bundle exec rake environment resque:work >><%= @options[:resque_log] %> 2>&1
END

Slide 67

Slide 67 text

sexy capistrano

Slide 68

Slide 68 text

sexy == DRY

Slide 69

Slide 69 text

deploy.rb

Slide 70

Slide 70 text

deploy.rb

set :application, 'some-mlg-app'
set :application_server, 'unicorn'

require 'capistrano/agora/base'
load 'deploy' if respond_to?(:namespace)

require 'capistrano/agora/airbrake'
require 'capistrano/agora/assets'
require 'capistrano/agora/rvm'
require 'capistrano/agora/hipchat'
set :hipchat_room_name, 'Some MLG Application'
require 'capistrano/agora/logging'
require 'capistrano/agora/resque'
require 'capistrano/agora/symlinks'
require 'capistrano/agora/sv'
require 'capistrano/agora/unicorn'

set :resque_queues, {
  'some-mlg-app.retrieve_stuff' => 4,
  'some-mlg-app.update_stuff' => 1,
  'some-mlg-app.email_stuff' => 1
}

set :asset_directory, 'public/players'
before 'deploy:restart', 'deploy:assets:precompile_with_skip'

Slide 71

Slide 71 text

gem capistrano-agora

Slide 72

Slide 72 text

common functionality

Slide 73

Slide 73 text

Gemfile

...
group :deploy do
  gem 'capistrano'
  gem 'capistrano-ext'
  gem 'capistrano-agora'
  gem 'hipchat'
end
...

Slide 74

Slide 74 text

sensible for > 1 app

Slide 75

Slide 75 text

capistrano-agora/
  airbrake
  assets
  base
  helpers
  hipchat
  logging
  resque
  rvm
  sv
  symlinks
  unicorn
  version

Slide 76

Slide 76 text

base.rb

Capistrano::Configuration.instance.load do
  default_run_options[:pty] = true
  ssh_options[:forward_agent] = true

  set :scm, :git
  set :deploy_via, :remote_cache
  set :keep_releases, 7
  set :use_sudo, false

  set :branch, fetch(:branch, "master") unless exists?(:branch)
  set :gateway, "#{fetch(:user, `whoami`.strip)}@your.dmz.com" unless exists?(:gateway)
  set :repository, "git@github.com:agoragames/#{application}.git" unless exists?(:repository)
  set :deploy_to, "/var/rails/#{application}" unless exists?(:deploy_to)
  set :shared_nfs_dir, "/var/shared/rails/#{application}"
end

Slide 77

Slide 77 text

assets.rb

Capistrano::Configuration.instance.load do
  set :asset_directory, 'public/assets'
  set :assets_dependencies, %w(app/assets vendor/assets Gemfile.lock config/routes.rb)

  namespace :deploy do
    namespace :assets do
      task :precompile_with_skip, :roles => :web, :except => { :no_release => true } do
        from = source.next_revision(current_revision)
        if capture("cd #{latest_release} && #{source.local.log(previous_revision, current_revision)} #{assets_dependencies.join(' ')} | wc -l").to_i > 0
          run "cd #{fetch(:current_path)} && bundle exec rake assets:precompile RAILS_ENV=#{rails_env}"
        else
          logger.info "Skipping asset pre-compilation because there were no asset changes. Copying assets from #{previous_release}."
          run "cp -R #{previous_release}/#{asset_directory} #{latest_release}/#{asset_directory}"
        end
      end
    end
  end
end
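
The decision inside precompile_with_skip boils down to "did anything under the asset dependencies change between releases?". A minimal sketch of that check in isolation, assuming the list of changed paths has already been collected (for example from git); assets_changed? is a made-up helper, not part of capistrano-agora.

```ruby
# Paths that should trigger a precompile, copied from :assets_dependencies.
ASSET_DEPENDENCIES = %w(app/assets vendor/assets Gemfile.lock config/routes.rb)

# Hypothetical helper: true when any changed path is one of the dependencies
# or lives underneath one of the dependency directories.
def assets_changed?(changed_paths, dependencies = ASSET_DEPENDENCIES)
  changed_paths.any? do |path|
    dependencies.any? { |dep| path == dep || path.start_with?("#{dep}/") }
  end
end

puts assets_changed?(['app/models/user.rb'])
puts assets_changed?(['app/assets/stylesheets/home.css'])
puts assets_changed?(['Gemfile.lock'])
```

When the answer is false, copying the previous release's compiled assets is far cheaper than running the precompile again.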

Slide 78

Slide 78 text

resque.rb

Capistrano::Configuration.instance.load do
  namespace :resque do
    desc <<-DESC
      Restart the Resque workers for an application after a deploy or deploy:migrations
    DESC
    task :restart_workers, :roles => :app, :except => { :no_release => true } do
      if exists?(:resque_queues)
        fetch(:resque_queues).each do |queue_name, worker_count|
          1.upto(worker_count) do |worker_index|
            run "sv restart resque-#{fetch(:application)}-#{queue_name}-worker-#{worker_index}"
          end
        end
      else
        logger.info('You must define the :resque_queues variable for the resque:restart_workers task to work')
      end
    end
  end

  after "deploy", "resque:restart_workers"
  after "deploy:migrations", "resque:restart_workers"
end

Slide 79

Slide 79 text

unicorn.rb

require 'capistrano/agora/helpers'

Capistrano::Configuration.instance.load do
  namespace :unicorn do
    desc 'Increase number of unicorn workers'
    task :increase_workers, :roles => :app do
      num_workers = fetch(:num_workers, 1)
      unicorn_hosts = fetch(:unicorn_hosts, ['host1', 'host2'])
      unicorn_hosts.each do |host|
        worker_process_id = capture("cat /etc/sv/#{fetch(:application)}/supervise/pid", :hosts => host).chomp
        1.upto(num_workers.to_i) do
          run("kill -TTIN #{worker_process_id}", :hosts => host)
        end
      end
    end

    desc 'Decrease number of unicorn workers'
    task :decrease_workers, :roles => :app do
      num_workers = fetch(:num_workers, 1)
      unicorn_hosts = fetch(:unicorn_hosts, ['host1', 'host2'])
      unicorn_hosts.each do |host|
        worker_process_id = capture("cat /etc/sv/#{fetch(:application)}/supervise/pid", :hosts => host).chomp
        1.upto(num_workers.to_i) do
          run("kill -TTOU #{worker_process_id}", :hosts => host)
        end
      end
    end
  end
end
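
The tasks above lean on Unicorn's signal handling: TTIN asks the master for one more worker, TTOU for one fewer. A toy sketch of that mechanism, assuming a Unix-like system; it only adjusts a counter, whereas a real master would fork or reap worker processes, and desired_workers is a made-up name.

```ruby
# Toy "master": TTIN raises the desired worker count, TTOU lowers it.
desired_workers = 4

Signal.trap('TTIN') { desired_workers += 1 }
Signal.trap('TTOU') { desired_workers -= 1 if desired_workers > 0 }

# Signal ourselves, pausing briefly so each handler runs before the next signal.
2.times do
  Process.kill('TTIN', Process.pid)
  sleep 0.1
end
Process.kill('TTOU', Process.pid)
sleep 0.1

puts desired_workers
```

Sending one signal per step, as the Capistrano tasks do with their 1.upto loop, keeps each change to a single worker.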

Slide 80

Slide 80 text

application monitoring

Slide 81

Slide 81 text

it must be visual

Slide 82

Slide 82 text

it must be historical

Slide 83

Slide 83 text

it must be accessible

Slide 84

Slide 84 text

we are using Munin

Slide 85

Slide 85 text

spot problems

Slide 86

Slide 86 text

spot opportunity

Slide 87

Slide 87 text

No content

Slide 88

Slide 88 text

infrastructure monitoring

Slide 89

Slide 89 text

system engineering project

Slide 90

Slide 90 text

they chose cucumber

Slide 91

Slide 91 text

rackspace-validations

Slide 92

Slide 92 text

runs every 5 minutes

Slide 93

Slide 93 text

step definitions

Slide 94

Slide 94 text

step_definitions/
  command_steps.rb
  dns_steps.rb
  file_steps.rb
  ping_steps.rb

Slide 95

Slide 95 text

command_steps.rb

def retry_times(times)
  begin
    yield
  rescue
    case times -= 1
    when 0
      raise
    else
      retry
    end
  end
end

When /^I go to "(https?:\/\/[^"]*)"$/ do |uri|
  begin
    @host = uri.host
    retry_times(3) { http(uri) }
  rescue Errno::ECONNREFUSED
    @output = 'connection refused'
    @status = 127
  rescue Timeout::Error
    @output = 'execution expired'
    @status = 127
  end
end
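
A standalone look at how retry_times behaves, outside Cucumber. The helper below has the same logic as the slide's version, written with a plain conditional; the flaky block and the attempts counter are made up for the demonstration.

```ruby
# Re-run the block until it succeeds or the attempt budget is used up,
# then re-raise the last error.
def retry_times(times)
  begin
    yield
  rescue
    times -= 1
    raise if times == 0
    retry
  end
end

attempts = 0
result = retry_times(3) do
  attempts += 1
  raise 'flaky' if attempts < 3 # fail twice, succeed on the third try
  :ok
end

puts "#{result} after #{attempts} attempts"
```

So retry_times(3) means three attempts in total, not three retries after the first failure.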

Slide 96

Slide 96 text

command_steps.rb

Then /^the (?:output|response) should (?:include|contain) "([^"]*)"$/ do |string|
  string = eval("\"#{string}\"")
  assert @output.include?(string), "expected to find \"#{string}\" in the output from #{@host}, but did not"
end

Then /^the (?:output|response) should not (?:include|contain) "([^"]*)"$/ do |string|
  string = eval("\"#{string}\"")
  assert !@output.include?(string), "expected not to find \"#{string}\" in the output from #{@host}, but did"
end

Then /^the (?:output|response) should (?:include|contain) "([^"]*X[^"]*)", where X is less than (\d+)$/ do |string, value|
  string = eval("\"#{string}\"")
  regex = Regexp.new(string.sub('X', '(\d+)'))
  assert @output =~ regex
  x = $1.to_i
  assert x < value, "expected \"#{x}\" to be less than \"#{value}\" in the output from #{@host}"
end

Slide 97

Slide 97 text

dns_steps.rb

When /^I do a DNS lookup for ([\w\-\.]+)$/ do |name|
  @host = name
  begin
    @alias = Socket.gethostbyname(name).first
  rescue SocketError
    @alias = nil
  end
end

Then /^it should point (?:at|to) ([\w\-\.]+)$/ do |name|
  # Report the expected target (name) in the failure message
  assert_equal name, @alias, "expected #{@host} to CNAME to #{name}, but it didn't"
end

Slide 98

Slide 98 text

file_steps.rb

Before do
  @stats = []
end

When /^I stat (\S+)$/ do |glob|
  Dir[glob].each do |path|
    @stats << File.stat(path)
  end
end

Then /^there should be at least (\d+) files?$/ do |count|
  assert @stats.length >= count
end

Then /^the most recently modified file should be less than (\d+) (\w+)s? old$/ do |count, unit|
  assert @stats.collect { |stat| stat.mtime }.max > time_ago(count, unit)
end
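
The last step calls time_ago, which the slides do not show. One possible shape for it, purely as an assumption: the unit table, the optional now argument, and the handling of a trailing "s" are all made up here.

```ruby
# Hypothetical time_ago helper; the real one is not shown on the slides.
SECONDS_PER_UNIT = { 'second' => 1, 'minute' => 60, 'hour' => 3600, 'day' => 86_400 }

def time_ago(count, unit, now = Time.now)
  seconds = SECONDS_PER_UNIT.fetch(unit.to_s.sub(/s\z/, '')) # accept "hour" or "hours"
  now - count.to_i * seconds
end

now = Time.at(1_000_000)
puts time_ago(2, 'hours', now).to_i
```

Passing now explicitly keeps the helper easy to check without depending on the clock.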

Slide 99

Slide 99 text

ping_steps.rb

When /^I ping ([\w\-\.]+)$/ do |host|
  @host = host
  body = nil
  IO.popen(['/bin/ping', '-c', '1', '-n', host]) { |io| body = io.read }
  if $?.to_i == 0
    body =~ /^rtt min\/avg\/max\/mdev = (\d\.\d{3}+)\/(\d\.\d{3}+)\/(\d\.\d{3}+)\/(\d\.\d{3}+) ms$/
    @status = true
    @response = $2.to_f
  else
    @status = false
    @response = 0.0
  end
end

Then /^I receive a response$/ do
  assert @status, "did not receive a response from #{@host}"
end

Then /^I receive a response within (\d+)(?: ?ms| milliseconds)$/ do |ms|
  assert @status, "did not receive a response from #{@host}"
  assert @response < ms, "response time from #{@host} of #{@response} was slower than #{ms} milliseconds"
end
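
The parsing half of the ping step can be tried without touching the network. This sketch assumes the Linux iputils summary format and uses a deliberately looser pattern than the slide's, so that round-trip times of 10 ms or more also match; average_rtt and the sample text are made up.

```ruby
# Pull the average round-trip time (ms) out of ping's summary line.
# Assumes the format "rtt min/avg/max/mdev = a/b/c/d ms".
RTT_LINE = %r{^rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms$}

def average_rtt(ping_output)
  match = ping_output.match(RTT_LINE)
  match && match[2].to_f
end

sample = <<PING_OUTPUT
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 12.345/12.345/12.345/0.000 ms
PING_OUTPUT

puts average_rtt(sample)
```

Returning nil when the summary line is missing lets the caller tell "no answer" apart from a genuine 0.0 ms reading.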

Slide 100

Slide 100 text

.feature

Slide 101

Slide 101 text

features/
  applications.feature
  haproxy.feature
  memcache.feature
  mongodb.feature
  mysql.feature
  nginx.feature
  redis.feature
  varnish.feature

Slide 102

Slide 102 text

applications.feature

Slide 103

Slide 103 text

Feature: Applications

  @critical
  Scenario: Ensure the company site is available
    When I go to "http://www.yourcompany.com/"
    Then the response should either be a 200 or a 302 to "http://www.yourcompany.com/whateva"

Slide 104

Slide 104 text

mongodb.feature

Slide 105

Slide 105 text

Feature: MongoDB
  In order to support our environment
  And avoid lost data resulting from system failure or incompetence
  As a responsible system administrator
  I want to ensure that our MongoDB databases are running as expected
  And that we have good slaves of the masters (well, secondaries to our primaries...)
  And that our snapshotted backups are up to date

  @critical
  Scenario Outline: Ensure MongoD is running
    When I SSH to <host> and run `pgrep mongod`
    Then it should exit successfully

    Examples:
      | host                |
      | mongo-primary.int   |
      | mongo-secondary.int |
      | mongo-secondary.int |

Slide 106

Slide 106 text

redis.feature

Slide 107

Slide 107 text

Feature: Redis

  @critical
  Scenario Outline: Ensure a Redis host is up and running the expected version
    When I open a socket to <host>:<port> and send "INFO\r\n"
    Then the output should include "redis_version:<version>"

    Examples:
      | host              | port | version |
      | redis.int         | 6379 | 2.4.2   |
      | redis-staging.int | 6379 | 2.4.2   |
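
The Redis scenario only needs one line out of the INFO reply. A minimal sketch of pulling it out of captured text; redis_version is a made-up helper, and the sample reply is trimmed to two fields.

```ruby
# Extract the value of the redis_version field from the text of an INFO reply.
def redis_version(info_text)
  line = info_text.lines.find { |l| l.start_with?('redis_version:') }
  line && line.split(':', 2).last.strip
end

sample_info = "redis_version:2.4.2\r\nuptime_in_seconds:100\r\n"
puts redis_version(sample_info)
```

Comparing this value against the Examples table is what would catch a host that came back up on an unexpected build.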

Slide 108

Slide 108 text

continuous integration

Slide 109

Slide 109 text

we use Jenkins

Slide 110

Slide 110 text

builds all internal projects

Slide 111

Slide 111 text

builds all internal gems

Slide 112

Slide 112 text

internal Gem In A Box-secure

Slide 113

Slide 113 text

Gemfile

source 'http://rubygems.org'
source 'https://username:[email protected]'
...

Slide 114

Slide 114 text

:git => into your organization

Slide 115

Slide 115 text

shields you from infosuicide

Slide 116

Slide 116 text

build failure?

Slide 117

Slide 117 text

notification via e-mail

Slide 118

Slide 118 text

notification via HipChat

Slide 119

Slide 119 text

notify (allthethings)

Slide 120

Slide 120 text

continuous deployment

Slide 121

Slide 121 text

GitHub flow

Slide 122

Slide 122 text

master is deployable

Slide 123

Slide 123 text

features in branches

Slide 124

Slide 124 text

post-build script trigger

Slide 125

Slide 125 text

PostBuildScript

#!/bin/bash
if [ "$GIT_BRANCH" == "origin/HEAD" ] || [ "$GIT_BRANCH" == 'master' ]; then
  curl "http://continuous-integration.com/job/project-deploy/build"
else
  echo "$GIT_BRANCH, not deploying"
fi

Slide 126

Slide 126 text

execute only if build succeeds: check

Slide 127

Slide 127 text

Jenkins project-deploy

Slide 128

Slide 128 text

Build

#!/bin/bash
source "$HOME/.rvm/scripts/rvm"
[[ -s ".rvmrc" ]] && source .rvmrc
bundle install && bundle exec cap production deploy:migrations -S user=deploy -S branch=$GIT_BRANCH

Slide 129

Slide 129 text

recap (allthethings)

Slide 130

Slide 130 text

simplify

Slide 131

Slide 131 text

upgrade (small to big)

Slide 132

Slide 132 text

alias

Slide 133

Slide 133 text

DRY

Slide 134

Slide 134 text

monitor

Slide 135

Slide 135 text

validate

Slide 136

Slide 136 text

integrate

Slide 137

Slide 137 text

stack smashing
http://speakerdeck.com/u/czarneckid/
david czarnecki
twitter @czarneckid
github/czarneckid
github/agoragames