Slide 1

Slide 1 text

Sensu: The Monitoring Router
Peter Burkholder @pburkholder
DevOpsDC MeetUp, 8 May 2012
Sunday, May 13, 12

Slide 2

Slide 2 text

Monitoring
• #monitoringsucks
• Or is it idiosyncratic?

Slide 3

Slide 3 text

What’s Happening?

Slide 4

Slide 4 text

What’s Happened?

Slide 5

Slide 5 text

The Problem

Slide 6

Slide 6 text

Careverge

Slide 7

Slide 7 text

Careverge (diagram): Prod, Exp, Dev, QA; O O X

Slide 8

Slide 8 text

Nagios (diagram)
• Hosts: API-1 (API, NRPE), LAMP-1 (httpd, NRPE)
• Nagios server checks: check_api 8443, check_nrpe -c disk

Slide 9

Slide 9 text

Nagios primitives
• Services
• Hosts
• ServiceGroups
• HostGroups
• Dependencies, Commands, Contacts, ...

Slide 10

Slide 10 text

Puppet + Nagios
• Node comes up as Puppet client w/ ‘role’
• Puppet stashes facts in storeconfig DB
• Nagios puppet run
• ‘exported resources’ => ‘hosts.cfg’
• host is member of hostgroups: generic, role
• services are monitored across hostgroups

Slide 11

Slide 11 text

fail

Slide 12

Slide 12 text

fail
• storeconfig
• new nodes ... Nagios server lag
• old nodes ... No API to del from DB
• new roles => new hostgroup => fail

Slide 13

Slide 13 text

Sensu

Slide 14

Slide 14 text

Architecture
• RabbitMQ AMQP message bus
• sensu-server (Ruby) + Redis k/v store
• sensu-client
• sensu-api
• sensu-dashboard

Slide 15

Slide 15 text

sensu-mq
• RabbitMQ
• Sonian scales to 500-1000 nodes with 1 EC2 instance

Slide 16

Slide 16 text

sensu-server
• sensu-server (Ruby) and Redis (C)
• JSON configuration
  • /etc/sensu/config.json (main config)
  • /etc/sensu/conf.d/ (JSON snippets)
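The split between one main config file and a directory of JSON snippets can be pictured as a merge of hashes. What follows is a minimal sketch, not Sensu's actual loader: `deep_merge` and `load_config` are hypothetical helper names, and the rule that later snippets override earlier values is an assumption.

```ruby
require 'json'

# Recursively merge two hashes; values from `other` win on conflicts.
# (Hypothetical helper, not part of Sensu.)
def deep_merge(base, other)
  base.merge(other) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val)
    else
      new_val
    end
  end
end

# Load the main config, then fold in every snippet under the snippet
# directory in filename order. The ordering rule is an assumption.
def load_config(main_file, snippet_dir)
  config = JSON.parse(File.read(main_file))
  Dir.glob(File.join(snippet_dir, '*.json')).sort.each do |snippet|
    config = deep_merge(config, JSON.parse(File.read(snippet)))
  end
  config
end
```

Under these assumptions, `load_config('/etc/sensu/config.json', '/etc/sensu/conf.d')` would return a single hash holding the connection settings plus whatever checks and handlers the snippets define.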

Slide 17

Slide 17 text

sensu-server

    {
      "rabbitmq": {
        "host": "<%= rabbitmq_host %>",
        "port": <%= rabbitmq_port %>
      },
      "redis": {
        "host": "<%= redis_host %>",
        "port": <%= redis_port %>
      },
      "api": {
        "host": "<%= api_host %>",
        "port": <%= api_port %>
      }
    }

Slide 18

Slide 18 text

sensu-client

    {
      "rabbitmq": {
        "host": "<%= rabbitmq_host %>",
        "port": <%= rabbitmq_port %>
      },
      "api": {
        "host": "<%= api_host %>",
        "port": <%= api_port %>
      },
      "client": {
        "name": "<%= sensu_hostname %>",
        "address": "<%= ipaddress %>",
        "subscriptions": ["generic", "cvapi"]
      }
    }
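The `<%= ... %>` placeholders in these configs are ERB tags. As a rough illustration of how such a template turns into JSON, the sketch below renders a trimmed copy of the client template with Ruby's standard `ERB` class and parses the result. The variable values are made up, `render_client_config` is a hypothetical name, and in a Puppet setup the values would come from Puppet rather than from a hash like this.

```ruby
require 'erb'
require 'json'

# A trimmed copy of the client template from the slide.
CLIENT_TEMPLATE = <<~TEMPLATE
  {
    "rabbitmq": { "host": "<%= rabbitmq_host %>", "port": <%= rabbitmq_port %> },
    "client": {
      "name": "<%= sensu_hostname %>",
      "address": "<%= ipaddress %>",
      "subscriptions": ["generic", "cvapi"]
    }
  }
TEMPLATE

# Render the template with the given variables and parse the result,
# so a malformed template fails here rather than when Sensu starts.
def render_client_config(vars)
  rendered = ERB.new(CLIENT_TEMPLATE).result_with_hash(vars)
  JSON.parse(rendered)
end

# Example values are invented for illustration.
config = render_client_config(
  rabbitmq_host: 'mq.example.com',
  rabbitmq_port: 5672,
  sensu_hostname: 'api-1',
  ipaddress: '10.0.0.5'
)
puts config['client']['name']
```

Parsing the rendered text is a cheap way to catch template mistakes such as a stray trailing comma before the file reaches a node.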

Slide 19

Slide 19 text

One config.json to rule them all

Slide 20

Slide 20 text

Sensu (diagram): API-1 (API, sensu client), LAMP-1 (httpd, sensu client), sensu-server, RabbitMQ

Slide 21

Slide 21 text

checks

    {
      "checks": {
        "careverge_api": {
          "handlers": ["irc", "mailer"],
          "notification": "Careverge API is not responding appropriately",
          "command": "/etc/sensu/plugins/local/check_cvapi.sh -S",
          "subscribers": ["cvapi"],
          "interval": 30,
          "refresh": 600
        }
      }
    }

Slide 22

Slide 22 text

How it works
• server publishes ‘check-api’ to ‘cvapi’
• some clients subscribe to ‘cvapi’
• run check
• publish result
• server processes results, passes to handlers
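The publish/subscribe flow on this slide can be sketched with an in-memory stand-in for the message bus. Everything here is illustrative: `Bus` and `Client` are hypothetical classes, the `results` topic name is an assumption, and command execution is stubbed with a lambda instead of running a real plugin.

```ruby
# In-memory stand-in for the RabbitMQ bus (hypothetical; the real
# transport is an AMQP broker).
class Bus
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  def subscribe(topic, &callback)
    @subscribers[topic] << callback
  end

  def publish(topic, message)
    @subscribers[topic].each { |callback| callback.call(message) }
  end
end

# A client listens on each of its subscriptions, runs the requested
# check, and publishes the result back onto the bus.
class Client
  def initialize(name, subscriptions, bus, runner)
    subscriptions.each do |subscription|
      bus.subscribe(subscription) do |request|
        status, output = runner.call(request[:command])
        bus.publish('results', { client: name, check: request[:name],
                                 status: status, output: output })
      end
    end
  end
end

bus = Bus.new
handled = []
# Server side: every result is passed on to the handlers (here, an array).
bus.subscribe('results') { |result| handled << result }

# The runner stubs out command execution and returns [exit status, output].
runner = ->(_command) { [0, 'OK'] }
Client.new('api-1', %w[generic cvapi], bus, runner)
Client.new('lamp-1', %w[generic], bus, runner)

# Server side: publish the check request to the 'cvapi' subscription.
bus.publish('cvapi', { name: 'careverge_api',
                       command: '/etc/sensu/plugins/local/check_cvapi.sh -S' })

handled.each { |r| puts "#{r[:client]} #{r[:check]} status=#{r[:status]}" }
```

Only `api-1` reports a result, because `lamp-1` never subscribed to `cvapi`. The server needs no list of hosts; a new node only has to declare its subscriptions.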

Slide 23

Slide 23 text

Works almost too well

Slide 24

Slide 24 text

Notification Handlers
• subclassed from Sensu::Handler
• distributed as .rb scripts with .json config
• community:
  • mail, IRC, HipChat, Campfire, PagerDuty, Twitter
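To give a feel for the shape of a handler, here is a minimal sketch. So that it runs without the Sensu gems, it defines its own stand-in base class, which is not the real `Sensu::Handler`. The `LogLine` class, the `handle` method returning a string, and the exact event fields used are assumptions made for illustration.

```ruby
require 'json'

# Stand-in for the base class so the sketch runs without the Sensu
# gems. It is NOT the real Sensu::Handler; the only assumption is that
# a handler receives an event as JSON and implements `handle`.
module Sensu
  class Handler
    def initialize(event_json)
      @event = JSON.parse(event_json)
    end
  end
end

# Hypothetical handler that formats a one-line notification.
class LogLine < Sensu::Handler
  def handle
    client = @event['client']['name']
    check = @event['check']
    "#{client}/#{check['name']}: #{check['output']} (status #{check['status']})"
  end
end

# Example event; the field layout is assumed for this sketch.
event = {
  'client' => { 'name' => 'api-1' },
  'check' => { 'name' => 'careverge_api', 'status' => 2,
               'output' => 'Careverge API is not responding appropriately' }
}
puts LogLine.new(JSON.generate(event)).handle
```

A real handler would send the formatted text to mail, IRC, or another medium instead of returning it.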

Slide 25

Slide 25 text

API
• thin/sinatra on port 4567
• GET/PUT/POST/DELETE k/v in Redis and
• make check requests
• Very handy, for, say...

Slide 26

Slide 26 text

Dropping a Node
• sensu-client publishes keep-alive
• On orderly termination:

    require 'json'
    require 'net/http'
    require 'uri'

    config = JSON.parse(File.read(config_file))
    client_name = config['client']['name']
    api_host = config['api']['host']
    api_port = config['api']['port']
    # Ask sensu-api to delete this client's registration
    uri = URI.parse("http://#{api_host}:#{api_port}/client/#{client_name}")
    http = Net::HTTP.new(uri.host, uri.port)
    http.request(Net::HTTP::Delete.new(uri.path))
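The reason to deregister on shutdown is that a client which simply stops sending keep-alives looks the same as a failed node. The sketch below shows one way a server could flag such clients. `stale_clients`, the data shape, and the 180-second threshold are assumptions for illustration, not Sensu's implementation.

```ruby
# Return the names of clients whose last keep-alive is older than
# `threshold` seconds. Both the data shape and the 180-second default
# are assumptions for this sketch.
def stale_clients(last_keepalive, now, threshold = 180)
  last_keepalive.select { |_name, seen_at| now - seen_at > threshold }.keys
end

# Example data: epoch seconds of the most recent keep-alive per client.
now = 1_336_502_402
last_keepalive = { 'api-1' => now - 20, 'lamp-1' => now - 600 }
puts stale_clients(last_keepalive, now).inspect
```

Deleting the client through the API on orderly termination removes it from this kind of bookkeeping, so a deliberate shutdown does not raise an alert.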

Slide 27

Slide 27 text

sensu-dashboard

Slide 28

Slide 28 text

sensu-dashboard

Slide 29

Slide 29 text

So Far...
• Components: RabbitMQ, Redis, Ruby
• sensu-server:
  • publishes check requests
  • pushes results to handlers
• sensu-client: performs checks, pushes results
• sensu-api, sensu-dashboard
• JSON configuration
• Plugins, Handlers, Keep-Alives

Slide 30

Slide 30 text

• What’s Happening?
• What’s Happened?

Slide 31

Slide 31 text

Metric Handlers
• E.g. ‘vmstat_metrics’ plugin returns:

    stats.sensu-server.swap.in 0 1336502402
    stats.sensu-server.swap.out 0 1336502402
    stats.sensu-server.memory.cache 1408388 1336502402
    stats.sensu-server.memory.swap_used 0 1336502402
    stats.sensu-server.memory.free 5492292 1336502402

• Define a check as ‘type: metric’
• Add to a subscription
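The plugin output on this slide follows the "path value timestamp" shape of Graphite's plaintext format, one sample per line. A minimal sketch of producing and parsing such lines is below. The helper names are hypothetical and this is not the `vmstat_metrics` plugin itself.

```ruby
# Format one sample as a plaintext line: "path value timestamp".
def metric_line(path, value, timestamp)
  "#{path} #{value} #{timestamp}"
end

# Build lines for a set of samples that share a common prefix.
def metric_lines(prefix, samples, timestamp)
  samples.map { |name, value| metric_line("#{prefix}.#{name}", value, timestamp) }
end

# Parse a line back into its three fields.
def parse_metric_line(line)
  path, value, timestamp = line.split
  { path: path, value: Float(value), timestamp: Integer(timestamp) }
end

samples = { 'swap.in' => 0, 'memory.free' => 5_492_292 }
puts metric_lines('stats.sensu-server', samples, 1_336_502_402)
```

Because each line carries its own timestamp, the output can be forwarded unchanged by whatever handler receives it.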

Slide 32

Slide 32 text

Metric Handlers
• ‘type: metric’ is always passed to handler
• On server, use a ‘graphite’ handler
• Feeds to Graphite over TCP or AMQP

Slide 33

Slide 33 text

But wait, there’s more...
• Metrics integration (Graphite, Librato)
• Application Integration (port 2030)
• Standalone Checks
• Parameter Passing
• Scheduling Downtime
• Sensu and Puppet/Chef

Slide 34

Slide 34 text

What’s Happening?

Slide 35

Slide 35 text

What’s Happened?

Slide 36

Slide 36 text

What’s Happening
• Sensu is great at adapting to changes in your operating environment
• Notifies effectively across various media
• Lacks:
  • Tactical dashboard
  • Notification Hours, Contact Groups

Slide 37

Slide 37 text

What’s Happened
• Metrics integration with Graphite, Librato, Geckoboard
• Applications can fire-and-forget to UDP port 2030
• Lacks:
  • Uptime History
  • Notification History
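"Fire-and-forget" here means an application sends a single UDP datagram and does not wait for a reply. The sketch below shows that pattern with Ruby's standard `UDPSocket`. The `report_result` name, the payload keys (`name`, `output`, `status`), and the default host are assumptions; the port number is the one given on the slide.

```ruby
require 'json'
require 'socket'

# Send a check result as a single UDP datagram and do not wait for a
# reply. The payload keys and the default host are assumptions; the
# default port follows the slide.
def report_result(name, output, status, host: '127.0.0.1', port: 2030)
  payload = JSON.generate({ 'name' => name, 'output' => output, 'status' => status })
  socket = UDPSocket.new
  socket.send(payload, 0, host, port)
ensure
  socket&.close
end
```

An application could call `report_result('app_event', 'queue depth high', 1)` from an error path. Since UDP gives no delivery guarantee, this suits events where an occasional lost message is acceptable.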

Slide 38

Slide 38 text

Bear in Mind
• Not even a toddler (open-sourced Nov 2011)
• Active Community
• Traction

Slide 39

Slide 39 text

For more:
• GitHub repo and wiki: http://github.com/sensu
• Joe Miller’s excellent blog series: http://joemiller.me/category/sensu/
• IRC Channel: irc://irc.freenode.net/#sensu
• My interview with Sean Porter on Sensu: http://bit.ly/zGZhjg

Slide 40

Slide 40 text

fini