Particular Particulars about "Zero Downtime" Deploys

Burlington Ruby Conference
August 2-3, 2013

David Czarnecki

August 03, 2013

Transcript

  1. particular particulars about “zero downtime” deploys @btvrubyconf David Czarnecki Friday,

    August 2, 13
  2. @czarneckid

  3. @agoragames

  4. things that are 24x7

  5. things that are 24x7 diners

  6. things that are 24x7 video games

  7. things that are 24x7 web applications

  8. is there a 24 x 7 server?

  9. unicorn is ...

  10. unicorn is ... a fast, rack HTTP server

  11. unicorn is ... a process manager

  12. unicorn is ... a “thread safe” server

  13. unicorn is ... a “zero downtime” server

  14. github unicorn blog post https://github.com/blog/517-unicorn

  15. why try “zero downtime”?

  16. why try “zero downtime”? running artisanal VMs

  17. why try “zero downtime”? because ,|,, ,,|,* to cap deploy:web:disable and cap deploy:web:enable

  18. why try “zero downtime”? because ,|,, ,,|,* to cap deploy:web:disable and cap deploy:web:enable *also magnets

  19. why try “zero downtime”? scale up/down instantly

  20. why try “zero downtime”? continuous deployment

  21. why try “zero downtime”? web applications are 24/7

  22. our “zero downtime” story since january 30th, 2013 no downtime # commits ruby 1.9.3 updates ruby 2.0.0 updates

  23. genesis of this presentation https://gist.github.com/czarneckid/4639793

  24. zero downtime deploys with unicorn + nginx + runit + rvm + chef *

  25. zero downtime deploys with unicorn + nginx + runit + rvm + chef * *also capistrano

  26. let’s get into particulars

  27. unicorn.rb

    # Location: RAILS_ROOT/config/unicorn.rb
    rails_env = ENV['RAILS_ENV'] || 'development'

    worker_processes(rails_env == 'production' ? 6 : 1)
    preload_app true
    check_client_connection true
    timeout 30

  28. unicorn.rb

    case rails_env
    when 'production', 'staging'
      # It is *very* important that you choose a location for the unicorn PID file that is
      # outside of the RAILS_ROOT directory. We use capistrano for deployment, where we
      # deploy via remote_cache. We noticed that when we had the unicorn PID file defined
      # in a directory under RAILS_ROOT (the default PID location is RAILS_ROOT/tmp/pids/unicorn.pid),
      # that the script was not able to reclaim the old unicorn PID file after the symlink
      # for current/ gets moved to the latest deploy by capistrano.
      pid '/var/www/rails-application/shared/pids/unicorn.pid'
      listen "/var/www/rails-application/tmp/sockets/#{rails_env}.sock", :backlog => 2048
    else
      listen 3001
      listen "#{`pwd`.strip}/tmp/sockets/#{rails_env}.sock"
    end

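
The symlink behaviour described in the comment on slide 28 can be reproduced in a throwaway directory. The following is a minimal Ruby sketch, not code from the talk: the `simulate_deploy` helper, the release names, and the PID value are all assumptions made for illustration. It writes a PID file both under `current/tmp/pids` and under `shared/pids`, repoints `current/` the way a deploy would, and reports which copy is still reachable afterwards.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical layout mimicking a Capistrano deploy: releases/<n>, shared/,
# and a current symlink that points at the active release.
def simulate_deploy(root)
  old_release = File.join(root, "releases", "1")
  new_release = File.join(root, "releases", "2")
  shared_pids = File.join(root, "shared", "pids")
  current     = File.join(root, "current")

  FileUtils.mkdir_p([File.join(old_release, "tmp", "pids"),
                     File.join(new_release, "tmp", "pids"),
                     shared_pids])
  File.symlink(old_release, current)

  # The running master records its PID in both candidate locations.
  in_release = File.join(current, "tmp", "pids", "unicorn.pid")
  in_shared  = File.join(shared_pids, "unicorn.pid")
  File.write(in_release, "1234")
  File.write(in_shared, "1234")

  # The deploy repoints current/ at the new release.
  File.unlink(current)
  File.symlink(new_release, current)

  { in_release: File.exist?(in_release), in_shared: File.exist?(in_shared) }
end

Dir.mktmpdir do |root|
  result = simulate_deploy(root)
  puts "PID file reachable via current/: #{result[:in_release]}"
  puts "PID file reachable via shared/:  #{result[:in_shared]}"
end
```

After the swap, the path under `current/` resolves into the new release, where no PID file was ever written, while the path under `shared/` is unaffected. That is the reason for keeping the PID file outside RAILS_ROOT.
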
  29. unicorn.rb

    # via http://unicorn.bogomips.org/Sandbox.html
    # See section on BUNDLE_GEMFILE for Capistrano users
    # We need this since we automatically run deploy:cleanup to
    # clean up old releases.
    before_exec do |server|
      ENV["BUNDLE_GEMFILE"] = "/var/www/rails-application/current/Gemfile"
    end

  30. unicorn.rb

    before_fork do |server, worker|
      # When sent a USR2, Unicorn will suffix its pidfile with .oldbin and
      # immediately start loading up a new version of itself (loaded with a new
      # version of our app). When this new Unicorn is completely loaded
      # it will begin spawning workers. The first worker spawned will check to
      # see if an .oldbin pidfile exists. If so, this means we've just booted up
      # a new Unicorn and need to tell the old one that it can now die. To do so
      # we send it a QUIT.
      #
      # Using this method we get 0 downtime deploys.

      old_pid = '/var/www/rails-application/shared/pids/unicorn.pid.oldbin'
      # File.exist? replaces File.exists?, which was removed in Ruby 3.2.
      if File.exist?(old_pid) && server.pid != old_pid
        begin
          Process.kill("QUIT", File.read(old_pid).to_i)
        rescue Errno::ENOENT, Errno::ESRCH
          # someone else did our job for us
        end
      end
    end

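
The QUIT handoff on slide 30 can be tried outside of Unicorn. This is a hedged sketch rather than the talk's code: `quit_old_master` is a hypothetical helper, and a plain `sleep` process stands in for the old master. It assumes a Unix-like system with a `sleep` binary on the PATH. The guard against PID 0 is an addition, since an empty PID file would otherwise signal the caller's whole process group.

```ruby
require "tempfile"

# Hypothetical helper: send QUIT to the PID recorded in an .oldbin file.
# Returns true if a signal was sent, false if there was nothing to do.
def quit_old_master(oldbin_path)
  return false unless File.exist?(oldbin_path)

  pid = File.read(oldbin_path).to_i
  return false unless pid > 0 # never signal PID 0 (the whole process group)

  Process.kill("QUIT", pid)
  true
rescue Errno::ENOENT, Errno::ESRCH
  # The file or the process vanished in the meantime: someone else did the job.
  false
end

# Stand-in for the old master process.
old_master = Process.spawn("sleep", "60")

Tempfile.create("unicorn.pid.oldbin") do |f|
  f.write(old_master.to_s)
  f.flush
  puts "signal sent: #{quit_old_master(f.path)}"
end

_, status = Process.wait2(old_master)
puts "old master ended by QUIT: #{status.termsig == Signal.list['QUIT']}"
```

A real Unicorn master handles QUIT by letting its workers finish in-flight requests before exiting, which is what makes the handoff graceful; the `sleep` stand-in simply terminates.
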
  31. unicorn.rb

    after_fork do |server, worker|
      # Unicorn master loads the app then forks off workers - because of the way
      # Unix forking works, we need to make sure we aren't using any of the parent's
      # sockets, e.g. db connection
      # defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection

      # Redis and Memcached would go here but their connections are established
      # on demand, so the master never opens a socket
      # $redis = Redis.connect
    end

  32. deploy.rb

    # Location: RAILS_ROOT/config/deploy.rb

    # We noticed that the default asset precompilation happens after the current/ symlink is created. We
    # changed asset precompilation to happen before the current/ symlink is moved so that we don't have a period
    # where stylesheets, etc. for the running unicorn process are invalid.
    before 'deploy:create_symlink', 'deploy:assets:precompile'
    after 'deploy', 'deploy:cleanup'

    namespace :deploy do
      desc <<-DESC
        Send a USR2 to the unicorn process to restart for zero downtime deploys.
        runit expects 2 to tell it to send the USR2 signal to the process.
      DESC
      task :restart, :roles => :app, :except => { :no_release => true } do
        run "sv 2 #{application}"
      end
    end

  33. rails-application.conf

    # Location: cookbooks/nginx/files/default/rails-application.conf

    # We run unicorn in production listening on a UNIX socket behind NGINX.
    upstream unicorn {
      server unix:/var/www/rails-application/tmp/sockets/production.sock fail_timeout=0;
    }

    server {
      ...
      # set up the rails servers as a virtual location for use later
      location @rails {
        ...
        proxy_pass http://unicorn;
        ...
      }
      ...
    }

  34. sv-rails-application-run.erb

    # Location: cookbooks/unicorn/templates/default/sv-rails-application-run.erb
    # Original author: @brentkirby - https://gist.github.com/1039720

    #!/bin/bash
    exec 2>&1

    <% unicorn_command = @options[:unicorn_command] || 'unicorn_rails' -%>

    #
    # Since unicorn creates a new pid on restart/reload, it needs a little extra love to
    # manage with runit. Instead of managing unicorn directly, we simply trap signal calls
    # to the service and redirect them to unicorn directly.
    #
    # To make this work properly with RVM, you should create a wrapper for the app's gemset unicorn.
    #

  35. sv-rails-application-run.erb

    function is_unicorn_alive {
      set +e
      # "$1" is quoted so that an empty argument fails the -n test.
      if [ -n "$1" ] && kill -0 "$1" >/dev/null 2>&1; then
        echo "yes"
      fi
      set -e
    }

    echo "Service PID: $$"

    CUR_PID_FILE=/var/www/rails-application/shared/pids/unicorn.pid
    OLD_PID_FILE=$CUR_PID_FILE.oldbin

    if [ -e $OLD_PID_FILE ]; then
      OLD_PID=$(cat $OLD_PID_FILE)
      echo "Waiting for existing master ($OLD_PID) to exit"
      while [ -n "$(is_unicorn_alive $OLD_PID)" ]; do
        /bin/echo -n '.'
        sleep 2
      done
    fi

  36. sv-rails-application-run.erb

    if [ -e $CUR_PID_FILE ]; then
      CUR_PID=$(cat $CUR_PID_FILE)
      if [ -n "$(is_unicorn_alive $CUR_PID)" ]; then
        echo "Unicorn master already running. PID: $CUR_PID"
        RUNNING=true
      fi
    fi

    if [ ! $RUNNING ]; then
      echo "Starting unicorn"
      export rvm_user_install_flag=1
      export rvm_trust_rvmrcs=1
      export rvm_trust_rvmrcs_flag=1
      source /var/lib/rails-application/.rvm/scripts/rvm
      cd /var/www/rails-application/current
      # You need to daemonize the unicorn process, http://unicorn.bogomips.org/unicorn_rails_1.html
      bundle exec <%= unicorn_command %> -c config/unicorn.rb -E <%= @options[:environment] || 'production' %> -D
      sleep 3
      CUR_PID=$(cat $CUR_PID_FILE)
    fi

  37. sv-rails-application-run.erb

    function restart {
      echo "Initialize new master with USR2"
      kill -USR2 $CUR_PID
      # Make runit restart to pick up new unicorn pid
      sleep 2
      echo "Restarting service to capture new pid"
      exit
    }

    function graceful_shutdown {
      echo "Initializing graceful shutdown"
      kill -QUIT $CUR_PID
    }

    function unicorn_interrupted {
      echo "Unicorn process interrupted. Possibly a runit thing?"
    }

  38. sv-rails-application-run.erb

    trap restart HUP QUIT USR2 INT
    trap graceful_shutdown TERM KILL
    trap unicorn_interrupted ALRM

    echo "Waiting for current master to die. PID: ($CUR_PID)"
    while [ -n "$(is_unicorn_alive $CUR_PID)" ]; do
      /bin/echo -n '.'
      sleep 2
    done
    echo "You've killed a unicorn!"

  39. recap the salient points

  40. unicorn PID file outside RAILS_ROOT

  41. unicorn before_exec for deploy:cleanup

  42. capistrano send a usr2 signal

  43. nginx listen on a UNIX socket

  44. runit daemonize unicorn with -D

  45. other nifty unicorn tricks

  46. TTIN increment workers by 1

  47. TTOU decrement workers by 1
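
Since each TTIN or TTOU changes the worker count by exactly one, resizing a pool means sending the right number of signals to the master. The sketch below is one way to script that in Ruby. Both helpers are hypothetical, the caller has to supply the current worker count, and the pause between signals is a conservative assumption rather than a documented requirement.

```ruby
# Hypothetical helper: which signals move a master from `current` to `target` workers?
def resize_signals(current, target)
  raise ArgumentError, "worker counts must be non-negative" if current < 0 || target < 0

  signal = target > current ? "TTIN" : "TTOU"
  Array.new((target - current).abs, signal)
end

# Hypothetical helper: deliver those signals to the master whose PID is in pid_file.
def resize_workers(pid_file, current, target, pause: 0.5)
  master_pid = Integer(File.read(pid_file).strip)
  resize_signals(current, target).each do |signal|
    Process.kill(signal, master_pid)
    sleep pause # give the master time to act on each signal
  end
end

p resize_signals(6, 8) # two TTIN signals
p resize_signals(6, 4) # two TTOU signals
```

Against the configuration in this talk, a call might look like `resize_workers('/var/www/rails-application/shared/pids/unicorn.pid', 6, 8)`.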

  48. rack-statsd gem

    # Location: RAILS_ROOT/config/initializers/rack_statsd.rb

    # Rack stats, courtesy of technoweenie
    # RackStatsD::ProcessUtilization takes "domain", "git rev sha", and options hash as args.
    current_sha = `git rev-parse HEAD`[0..8]
    Rails.application.middleware.insert_before Rack::Lock, RackStatsD::ProcessUtilization, 'rails-application', current_sha

  49. rack-statsd gem

    dczarnecki@production-webserver-1:~$ ps -ef | grep unicorn
    www-data unicorn_rails master -c config/unicorn.rb -E production -D
    www-data unicorn rails-application[251486965] worker[00]: 65 reqs, 0.0 req/s, 131ms avg, 0.0% util
    www-data unicorn rails-application[251486965] worker[01]: 52 reqs, 0.0 req/s, 167ms avg, 0.0% util
    www-data unicorn rails-application[251486965] worker[02]: 54 reqs, 0.0 req/s, 70ms avg, 0.0% util
    www-data unicorn rails-application[251486965] worker[03]: 54 reqs, 0.0 req/s, 67ms avg, 0.0% util
    www-data unicorn rails-application[251486965] worker[04]: 88 reqs, 0.0 req/s, 21ms avg, 0.0% util
    www-data unicorn rails-application[251486965] worker[05]: 46 reqs, 0.0 req/s, 64ms avg, 0.0% util

  50. what about migrations?

  51. migrations LMAO, do you even NoSQL?*

  52. migrations LMAO, do you even NoSQL?* *trololololololol

  53. actual downtime large data sets are large

  54. actual downtime perform schema changes

  55. actual downtime communicate your progress

  56. actual downtime have a rollback plan

  57. zero downtime? RTFM on your backend

  58. zero downtime? postgresql concurrent indexing

  59. zero downtime “shadow” production environment

  60. zero downtime LMAO, do you even Death Star?

  61. zero downtime deploy to the “shadow”

  62. zero downtime swap “shadow” environment

  63. thank you everyone @btvrubyconf questions? @czarneckid