Slide 1

Slide 1 text

god
process and task monitoring done right
Jesse Newland
jnewland.com
jesse@railsmachine.com

Slide 3

Slide 3 text

FAILWHALE NEEDS NO INTRODUCTION

Slide 4

Slide 4 text

Like it or not, the web is 24/7/365

Slide 5

Slide 5 text

But who wants to be online 24/7/365?

Slide 6

Slide 6 text

Sometimes, you’ve just gotta take a walk

Slide 7

Slide 7 text

ZOMG WHAT NOW?

Slide 8

Slide 8 text

Process monitoring

Slide 9

Slide 9 text

sudo gem install god

Slide 10

Slide 10 text

Written by: Tom Preston-Werner

Slide 11

Slide 11 text

Follow along at home:
git clone git://github.com/jnewland/god_examples.git

Slide 12

Slide 12 text

The Basics

Slide 13

Slide 13 text

$ ruby scripts/crashy.rb
Wed Jul 09 13:53:13 -0400 2008
Wed Jul 09 13:53:14 -0400 2008
Wed Jul 09 13:53:15 -0400 2008
/Users/jnewland/src/god_examples/lib/god_test.rb:28:in `crash': Crash! (RuntimeError)
        from /Users/jnewland/src/god_examples/lib/god_test.rb:20:in `run'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:19:in `loop'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:19:in `run'
        from /Users/jnewland/src/god_examples/lib/god_test.rb:15:in `initialize'
        from scripts/crashy.rb:4:in `new'
        from scripts/crashy.rb:4
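The slides do not reproduce scripts/crashy.rb itself. A minimal sketch of a script with the same visible behavior (print a timestamp on each tick, then raise) might look like the following. The class name `CrashyLoop` and its parameters are hypothetical and are not the code from god_examples.

```ruby
# Hypothetical stand-in for a crash-prone script: prints a timestamp on
# every tick and raises after a fixed number of ticks.
class CrashyLoop
  def initialize(ticks_before_crash: 3, pause: 1, out: $stdout)
    @ticks_before_crash = ticks_before_crash
    @pause = pause
    @out = out
  end

  # Loops until the tick budget is spent, then raises RuntimeError.
  def run
    ticks = 0
    loop do
      crash if ticks >= @ticks_before_crash
      @out.puts Time.now
      ticks += 1
      sleep(@pause)
    end
  end

  private

  def crash
    raise 'Crash!'
  end
end

# Demo with no pause so it finishes immediately; the rescue keeps the
# example from terminating the interpreter.
begin
  CrashyLoop.new(ticks_before_crash: 3, pause: 0).run
rescue RuntimeError => e
  puts "crashed: #{e.message}"
end
```

Without the rescue, the uncaught exception ends the process, which is exactly the situation a process monitor has to recover from.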

Slide 14

Slide 14 text

# simple.god
# The simplest possible watch
God.watch do |w|
  w.name = 'crashy'
  w.interval = 1.seconds
  w.start = 'ruby scripts/crashy.rb'
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.running = false
    end
  end
end
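Conceptually, the watch above polls once per `interval` and runs the `start` command whenever the `process_running` check reports false. The sketch below models that loop in plain Ruby. `TinyWatch` and its injected callables are hypothetical; this is an illustration of the idea, not how god is implemented internally.

```ruby
# Toy model of a start_if / process_running watch. The "running" and
# "start" callables are injected so the example runs without real processes.
class TinyWatch
  attr_reader :name, :starts

  def initialize(name:, running:, start:)
    @name = name
    @running = running # callable returning true while the process is alive
    @start = start     # callable that launches the process
    @starts = 0
  end

  # One polling interval: start the process only if it is not running.
  def tick
    return :ok if @running.call

    @start.call
    @starts += 1
    :started
  end
end

# Simulated process that dies on every third check.
alive = false
checks = 0
running = lambda do
  checks += 1
  alive = false if (checks % 3).zero?
  alive
end
watch = TinyWatch.new(name: 'crashy', running: running, start: -> { alive = true })

6.times { puts "#{watch.name}: #{watch.tick}" }
```

Injecting the check and the start action keeps the polling logic separate from process handling, which is roughly the split the god DSL expresses with conditions and commands.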

Slide 15

Slide 15 text

$ god -h
...
Options:
  -c, --config-file CONFIG   Configuration file
  -p, --port PORT            Communications port (default 17165)
  -b, --auto-bind            Auto-bind to an unused port number
  -P, --pid FILE             Where to write the PID file
  -l, --log FILE             Where to write the log file
  -D, --no-daemonize         Don't daemonize
  -v, --version              Print the version number and exit

Slide 16

Slide 16 text

$ god -c simple.god -D
[... 20:19:33 #10897] INFO: Using pid file directory: /Users/jnewland/.god/pids
[... 20:19:34 #10897] INFO: Started on drbunix:///tmp/god.17165.sock
[... 20:19:34 #10897] INFO: crashy move 'unmonitored' to 'up'
[... 20:19:34 #10897] INFO: crashy moved 'unmonitored' to 'up'
[... 20:19:34 #10897] INFO: crashy [trigger] process is not running (ProcessRunning)
[... 20:19:34 #10897] INFO: crashy move 'up' to 'start'
[... 20:19:34 #10897] INFO: crashy start: ruby scripts/crashy.rb
[... 20:19:34 #10897] INFO: crashy moved 'up' to 'up'
[... 20:19:34 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:35 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:36 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:37 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:38 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:39 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:40 #10897] INFO: crashy [trigger] process is not running (ProcessRunning)
[... 20:19:40 #10897] INFO: crashy move 'up' to 'start'
[... 20:19:40 #10897] INFO: crashy start: ruby scripts/crashy.rb
[... 20:19:40 #10897] INFO: crashy moved 'up' to 'up'
[... 20:19:40 #10897] INFO: crashy [ok] process is running (ProcessRunning)
[... 20:19:41 #10897] INFO: crashy [ok] process is running (ProcessRunning)

Slide 21

Slide 21 text

$ god -c simple.god
$

Slide 22

Slide 22 text

$ god -c simple.god
$ ps ax | grep ruby
12512 ??   Ss 0:00.03 ruby /Users/jnewland/src/god_examples/scripts/crashy.rb
12484 s001 S  0:00.36 /usr/bin/ruby /usr/bin/god -c simple.god

Slide 23

Slide 23 text

$ god -c simple.god
$ ps ax | grep ruby
12512 ??   Ss 0:00.03 ruby /Users/jnewland/src/god_examples/scripts/crashy.rb
12484 s001 S  0:00.36 /usr/bin/ruby /usr/bin/god -c simple.god
$ god -h
...
Commands:
  start       start task or group
  restart     restart task or group
  stop        stop task or group
  monitor     monitor task or group
  unmonitor   unmonitor task or group
  remove      remove task or group from god
  load        load a config into a running god
  log         show realtime log for given task
  status      show status of each task
  quit        stop god
  terminate   stop god and all tasks
  check       run self diagnostic

Slide 24

Slide 24 text

$ god status
crashy: up
$ god restart crashy
Sending 'restart' command
The following watches were affected:
  crashy
$ god stop crashy
Sending 'stop' command
The following watches were affected:
  crashy
$ god status
crashy: unmonitored
$ god start crashy
Sending 'start' command
The following watches were affected:
  crashy
$ god status
crashy: up

Slide 25

Slide 25 text

Controlling Leaky Processes

Slide 26

Slide 26 text

# leaky.god
God.watch do |w|
  w.name = "leaky"
  w.interval = 5.seconds
  w.start = 'ruby scripts/leaky.rb'
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.running = false
    end
  end
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 2.megabytes
    end
  end
end
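One way a memory check like this could be evaluated is to ask `ps` for the resident set size of the process and compare it to the limit. The helpers below (`parse_rss_kb`, `rss_kb`, `over_limit?`) are hypothetical names for a sketch under that assumption; god's own `memory_usage` condition may gather the number differently.

```ruby
# Parses the output of `ps -o rss= -p PID`, which prints the resident set
# size in kilobytes. Returns nil when ps printed nothing (process gone).
def parse_rss_kb(ps_output)
  text = ps_output.to_s.strip
  text.empty? ? nil : Integer(text, 10)
end

# Resident memory of a pid in kilobytes, or nil if it cannot be read
# (ps missing, unexpected output, or no such process).
def rss_kb(pid)
  parse_rss_kb(`ps -o rss= -p #{Integer(pid)}`)
rescue Errno::ENOENT, ArgumentError
  nil
end

# True when the process uses more than limit_bytes of resident memory.
def over_limit?(rss_kilobytes, limit_bytes)
  !rss_kilobytes.nil? && rss_kilobytes * 1024 > limit_bytes
end

limit = 2 * 1024 * 1024 # same threshold as 2.megabytes in leaky.god
puts over_limit?(rss_kb(Process.pid), limit)
```

A 2 MB limit is only useful for a demo script that leaks on purpose; a real application would need a threshold well above its normal working set.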

Slide 27

Slide 27 text

CPU Usage

Slide 28

Slide 28 text

w.restart_if do |restart|
  restart.condition(:cpu_usage) do |c|
    c.above = 50.percent
    c.times = [3, 5]
  end
end
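The `c.times = [3, 5]` setting keeps a single CPU spike from causing a restart. A minimal sketch, assuming it means "trigger once at least 3 of the last 5 checks exceeded the threshold"; the class `TimesWindow` and the sample readings are made up for illustration.

```ruby
# Tracks the last `window` boolean results and triggers once at least
# `required` of them are true, mirroring c.times = [required, window].
class TimesWindow
  def initialize(required, window)
    @required = required
    @window = window
    @history = []
  end

  # Records one check result; returns true when the threshold is met.
  def record(hit)
    @history << (hit ? true : false)
    @history.shift while @history.size > @window
    @history.count(true) >= @required
  end
end

window = TimesWindow.new(3, 5)
cpu_samples = [60, 40, 70, 30, 80] # percent, made-up readings
cpu_samples.each do |cpu|
  puts "cpu=#{cpu}% restart=#{window.record(cpu > 50)}"
end
```

With these readings only the fifth sample reports a restart, because that is the first point at which three of the last five checks were above 50 percent.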

Slide 29

Slide 29 text

HTTP Status Codes

Slide 30

Slide 30 text

w.restart_if do |restart|
  restart.condition(:http_response_code) do |c|
    c.host = 'localhost'
    c.port = '80'
    c.path = '/heartbeat'
    c.code_is_not = %w(200 304)
  end
end
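The same health check can be approximated by hand with Ruby's standard `Net::HTTP`. In this sketch, `fetch_code` and `unhealthy?` are hypothetical helpers, and treating a failed request as unhealthy is an assumption of the example rather than a statement about god's condition.

```ruby
require 'net/http'

# Returns the response status code as a string, or nil when the request
# fails outright (connection refused, timeout, DNS failure, ...).
def fetch_code(host, port, path, timeout: 5)
  http = Net::HTTP.new(host, port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  http.start { |conn| conn.get(path).code }
rescue SystemCallError, Timeout::Error, SocketError, IOError
  nil
end

# Mirrors code_is_not: unhealthy when the code is missing or not allowed.
def unhealthy?(code, allowed = %w[200 304])
  !allowed.include?(code.to_s)
end

puts unhealthy?('200') # an allowed code
puts unhealthy?('500') # a server error
```

Against a live server the two helpers combine as `unhealthy?(fetch_code('localhost', 80, '/heartbeat'))`.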

Slide 31

Slide 31 text

Notifications

Slide 32

Slide 32 text

# email_contacts.god
God::Contacts::Email.message_settings = {
  :from => 'god@jnewland.com'
}

God::Contacts::Email.server_settings = {
  :address => "smtp.jnewland.com",
  :port => 25,
  :domain => "jnewland.com",
  :authentication => :plain,
  :user_name => "god",
  :password => ""
}

God.contact(:email) do |c|
  c.name = 'jesse'
  c.email = 'jnewland@gmail.com'
end

Slide 33

Slide 33 text

# http://github.com/mojombo/god/tree/master/lib/god/contacts/jabber.rb
require 'jabber'

God::Contacts::Jabber.settings = {
  :jabber_id => 'bot@jnewland.com',
  :password => ' '
}

God.contact(:jabber) do |c|
  c.name = 'jesse'
  c.jabber_id = 'jnewland@gmail.com'
end

Slide 34

Slide 34 text

w.restart_if do |restart|
  restart.condition(:cpu_usage) do |c|
    c.above = 50.percent
    c.times = [3, 5]
    c.notify = "jesse"
  end
end

Slide 35

Slide 35 text

Monitoring Mongrels

Slide 36

Slide 36 text

Putting it all together
• Process Running
• Memory Usage
• CPU Usage
• HTTP Response Code
• Notifications
• Capistrano?
• Web Interface?

Slide 37

Slide 37 text

# rails/config/god/app.god
RAILS_ROOT = ENV['RAILS_ROOT'] ||= "/var/www/apps/test/current"
RUBY = `which ruby`.chomp
MONGREL_RAILS = `which mongrel_rails`.chomp
RAILS_ENV = ENV['RAILS_ENV'] ||= 'production'
MONGRELS = 2
MONGREL_START_PORT = 3000
USER = GROUP = 'deploy'

# one watch per Mongrel, on consecutive ports
0.upto(MONGRELS - 1) do |n|
  port = MONGREL_START_PORT + n
  God.watch do |w|
    w.group = 'mongrels'
    w.name = "mongrel_#{port}"
    w.uid = USER
    w.gid = GROUP
    w.interval = 30.seconds
    w.start = "#{RUBY} #{MONGREL_RAILS} start --environment #{RAILS_ENV} --chdir #{RAILS_ROOT} --port #{port}"
    w.start_grace = 90.seconds
    w.restart_grace = 90.seconds
    w.log = File.join(RAILS_ROOT, "log/mongrel_#{port}.log")
    # process running
    # memory usage
    # cpu usage
    # http response code
  end
end
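The four comment placeholders on the slide stand for the conditions introduced earlier in the talk. A sketch of how they might be filled in, reusing the blocks from slides 14, 26, 28, and 30, follows. The thresholds and the `/heartbeat` path are copied from those earlier examples rather than tuned for Mongrel, so treat them as placeholders. The fragment belongs inside the `God.watch` block, uses its `port` loop variable, and needs the god gem to run.

```ruby
# Inside the God.watch block above, replacing the four placeholder comments.

# process running
w.start_if do |start|
  start.condition(:process_running) do |c|
    c.running = false
  end
end

w.restart_if do |restart|
  # memory usage (placeholder threshold; pick one that fits the app)
  restart.condition(:memory_usage) do |c|
    c.above = 2.megabytes
  end

  # cpu usage
  restart.condition(:cpu_usage) do |c|
    c.above = 50.percent
    c.times = [3, 5]
  end

  # http response code, checked against this Mongrel's own port
  restart.condition(:http_response_code) do |c|
    c.host = 'localhost'
    c.port = port
    c.path = '/heartbeat'
    c.code_is_not = %w(200 304)
  end
end
```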

Slide 38

Slide 38 text

Pulse Controller

class PulseController < ApplicationController
  session :off

  def pulse
    if (ActiveRecord::Base.connection.execute("select 1").num_rows rescue 0) == 1
      render :text => "OK #{Time.now.utc.to_s(:db)}"
    else
      render :text => 'ERROR', :status => :internal_server_error
    end
  end
end

Slide 39

Slide 39 text

Capistrano

Slide 40

Slide 40 text

# rails/config/deploy.rb
role :app, "test.jnewland.com"

require 'san_juan'
san_juan.role :app, %w(mongrels)

# overwrite the default start, stop, and restart tasks to use god
namespace :deploy do
  desc "Use god to restart the app"
  task :restart do
    god.all.reload
    god.app.mongrels.restart
  end

  desc "Use god to start the app"
  task :start do
    god.all.start
  end

  desc "Use god to stop the app"
  task :stop do
    god.all.terminate
  end
end

Slide 41

Slide 41 text

$ cap -T
...
cap god:all:quit               # Quit god, but not the processes it's monitoring
cap god:all:reload             # Reloading God Config
cap god:all:start              # Start god
cap god:all:start_interactive  # Start god interactively
cap god:all:status             # Describe the status of the running tasks on ...
cap god:all:terminate          # Terminate god and all monitored processes
cap god:app:mongrels:log       # Log mongrels
cap god:app:mongrels:remove    # Remove mongrels
cap god:app:mongrels:restart   # Restart mongrels
cap god:app:mongrels:start     # Start mongrels
cap god:app:mongrels:stop      # Stop mongrels
cap god:app:mongrels:unmonitor # Unmonitor mongrels
cap god:app:quit               # Quit god, but not the processes it's monitoring
cap god:app:reload             # Reload the god config file
cap god:app:start              # Start god
cap god:app:start_interactive  # Start god interactively
cap god:app:status             # Describe the status of the running tasks
cap god:app:terminate          # Terminate god and all monitored processes
...

Slide 42

Slide 42 text

http://github.com/jnewland/san_juan

Slide 43

Slide 43 text

ZOMG WHAT NOW?

Slide 44

Slide 44 text

# rails/config/god/app.god
...
require 'god_web'
GodWeb.watch(:port => 3003)
...

Slide 47

Slide 47 text

http://github.com/jnewland/god_web

Slide 48

Slide 48 text

Advanced Features

Slide 49

Slide 49 text

Lambda Conditions

# jabber_bot.god
w.restart_if do |restart|
  restart.condition(:lambda) do |c|
    c.interval = 15.seconds
    c.lambda = lambda do
      require 'xmpp4r-simple'
      im = Jabber::Simple.new(
        'god@jnewland.com',
        PASSWORDS['god@jnewland.com']
      )
      im.deliver('bot@jnewland.com', 'ping')
      sleep(5)
      # restart if the bot sent nothing back
      return true unless im.received_messages?
      chat = im.received_messages.find { |msg| msg.type == :chat }
      # restart if there was no chat reply or the reply was not a pong
      return true unless chat && chat.body =~ /pong/
      false
    end
  end
end
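The `return true` statements work because `return` inside a lambda exits only the lambda, and the returned value becomes the result of the check. A minimal sketch of that behavior in plain Ruby, with a hypothetical `PONG_MISSING` lambda standing in for the Jabber round trip:

```ruby
# Hypothetical stand-in for the Jabber check: takes the message bodies that
# came back and returns true (restart) when no "pong" is among them.
PONG_MISSING = lambda do |bodies|
  return true if bodies.empty?                       # nothing received at all
  return true unless bodies.any? { |b| b =~ /pong/ } # replies, but no pong
  false                                              # a pong arrived
end

puts PONG_MISSING.call([])        # no reply
puts PONG_MISSING.call(['pong'])  # healthy reply
puts PONG_MISSING.call(['hello']) # reply without a pong
```

The same body written with `proc` or `Proc.new` would behave differently, since `return` there tries to return from the enclosing method instead.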

Slide 50

Slide 50 text

Behaviors

# custom_behavior.god
module God
  module Behaviors
    class Speak < Behavior
      def before_start
        `say "Starting now"`
        'announced start'
      end

      def before_stop
        `say "Stopping now"`
        'announced stop'
      end
    end
  end
end

God.watch do |w|
  ...
  w.behavior(:speak)
  ...
end

Slide 51

Slide 51 text

mongrel_cluster

# mongrel_cluster.god
require 'lib/god_mongrel_cluster'

Dir.glob('/etc/mongrel_cluster/*.conf').each do |mongrel_cluster|
  cluster = GodMongrelCluster.new(mongrel_cluster)
  cluster.watch
end
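The slides do not show `lib/god_mongrel_cluster`. Assuming each file in /etc/mongrel_cluster is a mongrel_cluster-style YAML config with `port` and `servers` keys, a helper along these lines could derive one watch name per Mongrel. `cluster_watch_names`, its defaults, and the sample config are hypothetical and are not the GodMongrelCluster API.

```ruby
require 'yaml'

# Given the text of a mongrel_cluster-style YAML config, returns one watch
# name per Mongrel, numbered by port. Defaults are assumptions of this sketch.
def cluster_watch_names(yaml_text, prefix: 'mongrel')
  config = YAML.safe_load(yaml_text) || {}
  first_port = Integer(config.fetch('port', 3000))
  servers = Integer(config.fetch('servers', 1))
  (0...servers).map { |n| "#{prefix}_#{first_port + n}" }
end

sample = <<~CONF
  port: "8000"
  servers: 3
  environment: production
CONF

puts cluster_watch_names(sample).inspect
```

Each name could then feed a `God.watch` block like the one in app.god, with the port taken from the same calculation.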

Slide 52

Slide 52 text

Questions?

Slide 53

Slide 53 text

Hooray Flickr! (And Creative Commons)
http://www.flickr.com/photos/stuckincustoms/522313332/
http://www.flickr.com/photos/91499534@N00/2335651912/
http://www.flickr.com/photos/code_martial/1411893703/
http://www.flickr.com/photos/extranoise/163847669/
http://www.flickr.com/photos/vanz/2480741207/
http://www.flickr.com/photos/smartjunco/281071006/
http://www.flickr.com/photos/davesag/8312984/
http://www.flickr.com/photos/gaetanlee/298178764/
http://www.flickr.com/photos/vrogy/511644410/
http://www.flickr.com/photos/jeffsmallwood/299208539/
http://www.flickr.com/photos/cjdaniel/2240123159/
http://www.flickr.com/photos/bobbygreg/139080175/
http://www.flickr.com/photos/lordelo/12958772/

Slide 54

Slide 54 text

http://creativecommons.org/licenses/by-sa/2.0/deed.en