Sensu: A cloud & CM-ready monitoring framework

Presentation to DevOpsDC from May 2012

pburkholder

May 24, 2012

Transcript

  1. [Diagram: Nagios server; hosts API-1 (API, NRPE) and LAMP-1 (httpd, NRPE); checks check_api (port 8443) and check_nrpe -c disk]
    Sunday, May 13, 12
  2. Nagios primitives
    • Services
    • Hosts
    • ServiceGroups
    • HostGroups
    • Dependencies, Commands, Contacts, ...
  3. Puppet + Nagios
    • Node comes up as Puppet client w/ ‘role’
    • Puppet stashes facts in storeconfig DB
    • Nagios puppet run
    • ‘exported resources’ => ‘hosts.cfg’
    • host is member of hostgroups: generic, role
    • services are monitored across hostgroups
  4. fail
    • storeconfig
    • new nodes ... Nagios server lag
    • old nodes ... No API to del from DB
    • new roles => new hostgroup => fail
  5. Architecture
    • RabbitMQ AMQP message bus
    • sensu-server (Ruby) + Redis k/v store
    • sensu-client
    • sensu-api
    • sensu-dashboard
  6. sensu-server
    • sensu-server (Ruby) and Redis (C)
    • JSON configuration
    • /etc/sensu/config.json (main config)
    • /etc/sensu/conf.d/ (JSON snippets)
  7. sensu-server
    {
      "rabbitmq": {
        "host": "<%= rabbitmq_host %>",
        "port": <%= rabbitmq_port %>
      },
      "redis": {
        "host": "<%= redis_host %>",
        "port": <%= redis_port %>
      },
      "api": {
        "host": "<%= api_host %>",
        "port": <%= api_port %>
      }
    }
  8. sensu-client
    {
      "rabbitmq": {
        "host": "<%= rabbitmq_host %>",
        "port": <%= rabbitmq_port %>
      },
      "api": {
        "host": "<%= api_host %>",
        "port": <%= api_port %>
      },
      "client": {
        "name": "<%= sensu_hostname %>",
        "address": "<%= ipaddress %>",
        "subscriptions": ["generic", "cvapi"]
      }
    }
  9. checks
    {
      "checks": {
        "careverge_api": {
          "handlers": ["irc", "mailer"],
          "notification": "Careverge API is not responding appropriately",
          "command": "/etc/sensu/plugins/local/check_cvapi.sh -S",
          "subscribers": ["cvapi"],
          "interval": 30,
          "refresh": 600
        }
      }
    }
  10. How it works
    • server publishes ‘check-api’ to ‘cvapi’
    • some clients subscribe to ‘cvapi’
    • run check
    • publish result
    • server processes results, passes to handlers
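The flow on this slide can be sketched as a toy, in-memory model. This is not Sensu code: `ToyBus`, `make_client`, and `process_results` are hypothetical names, the bus stands in for RabbitMQ, and the sketch assumes that only non-zero check results are passed on to handlers.

```ruby
# In-memory stand-in for the message bus; real Sensu uses RabbitMQ (AMQP).
class ToyBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a client callback for a subscription name such as "cvapi".
  def subscribe(subscription, &callback)
    @subscribers[subscription] << callback
  end

  # Deliver a check request to every client on the subscription and
  # collect the result each client publishes back.
  def publish(subscription, check_request)
    @subscribers[subscription].map { |callback| callback.call(check_request) }
  end
end

# Hypothetical client: "runs" the requested check and returns a result hash.
def make_client(name, exit_status)
  lambda do |check_request|
    { client: name, check: check_request[:name], status: exit_status }
  end
end

# Server side (assumption): only non-zero results are passed to handlers.
def process_results(results, handlers)
  results.reject { |result| result[:status].zero? }.flat_map do |result|
    handlers.map do |handler|
      "#{handler}: #{result[:check]} failed on #{result[:client]}"
    end
  end
end

bus = ToyBus.new
bus.subscribe("cvapi", &make_client("api-1", 0))
bus.subscribe("cvapi", &make_client("api-2", 2))
bus.subscribe("generic", &make_client("lamp-1", 0))

# Only the two "cvapi" subscribers receive the request.
results = bus.publish("cvapi", { name: "check-api" })
notifications = process_results(results, %w[irc mailer])
puts notifications
```

The point of the model is that the server never needs a host list: a new node only has to subscribe to ‘cvapi’ to start receiving check requests.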
  11. Notification Handlers
    • subclassed from Sensu::Handler
    • distributed as .rb scripts with .json config
    • community:
      • mail, irc, hipchat, campfire, pagerDuty, twitter
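To give a feel for what a notification handler does, here is a minimal stand-alone sketch. It deliberately does not subclass `Sensu::Handler`, so that it runs without the Sensu gems; `ToyMailHandler` is a hypothetical name, and the event layout (`client.name`, `check.name`, `check.status`, `check.output`) is an assumption about the event JSON, not something shown in the deck.

```ruby
require 'json'

# Stand-in for a handler script. A real one would subclass Sensu::Handler;
# this plain class only illustrates turning an event into a message.
class ToyMailHandler
  def initialize(event_json)
    # Assumed event layout: {"client": {...}, "check": {...}}
    @event = JSON.parse(event_json)
  end

  def subject
    client = @event['client']['name']
    check = @event['check']['name']
    status = @event['check']['status']
    "[sensu] #{client}/#{check}: status #{status}"
  end

  # Returns the message text instead of sending mail, to stay side-effect free.
  def handle
    "#{subject}\n#{@event['check']['output']}"
  end
end

sample_event = JSON.generate(
  'client' => { 'name' => 'api-1' },
  'check' => {
    'name' => 'careverge_api',
    'status' => 2,
    'output' => 'Careverge API is not responding appropriately'
  }
)

message = ToyMailHandler.new(sample_event).handle
puts message
```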
  12. API
    • thin/sinatra on port 4567
    • GET/PUT/POST/DELETE k/v in Redis
    • make check requests
    • Very handy, for, say...
  13. Dropping a Node
    • sensu-client publishes keep-alive
    • On orderly termination:
        require 'json'
        require 'net/http'
        require 'uri'

        # config_file: path to the client's JSON config
        config = JSON.parse(File.read(config_file))
        client_name = config['client']['name']
        api_host = config['api']['host']
        api_port = config['api']['port']  # the API does not listen on port 80
        uri = URI.parse("http://#{api_host}:#{api_port}/client/#{client_name}")
        http = Net::HTTP.new(uri.host, uri.port)
        http.request(Net::HTTP::Delete.new(uri.path))
  14. So Far...
    • Components: RabbitMQ, Redis, Ruby
    • sensu-server:
      • publishes check requests
      • pushes results to handlers
    • sensu-client: performs checks, pushes results
    • sensu-api, sensu-dashboard
    • JSON configuration
    • Plugins, Handlers, Keep-Alives
  15. Metric Handlers
    • E.g. ‘vmstat_metrics’ plugin returns:
        stats.sensu-server.swap.in 0 1336502402
        stats.sensu-server.swap.out 0 1336502402
        stats.sensu-server.memory.cache 1408388 1336502402
        stats.sensu-server.memory.swap_used 0 1336502402
        stats.sensu-server.memory.free 5492292 1336502402
    • Define a check as a ‘type: metric’
    • Add to a subscription
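The plugin output on this slide is one metric per line in the form `path value timestamp`. As a rough sketch of how a consumer might read it, the following parses such lines into structs; `Metric` and `parse_metrics` are hypothetical names, not part of Sensu or Graphite.

```ruby
# One parsed metric line: dotted path, numeric value, UTC time.
Metric = Struct.new(:path, :value, :timestamp)

# Parse "path value epoch_seconds" lines, skipping blank lines.
def parse_metrics(output)
  output.each_line.map(&:strip).reject(&:empty?).map do |line|
    path, value, timestamp = line.split
    Metric.new(path, Float(value), Time.at(Integer(timestamp)).utc)
  end
end

sample = <<~OUTPUT
  stats.sensu-server.swap.in 0 1336502402
  stats.sensu-server.memory.free 5492292 1336502402
OUTPUT

metrics = parse_metrics(sample)
metrics.each { |m| puts "#{m.path} = #{m.value} at #{m.timestamp}" }
```

Using `Float()` and `Integer()` rather than `to_f` and `to_i` makes a malformed line raise an error instead of silently becoming zero.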
  16. Metric Handlers
    • ‘type: metric’ is always passed to handler
    • On server, use a ‘graphite’ handler
    • Feeds to Graphite over TCP or AMQP
  17. But wait, there’s more...
    • Metrics integration (Graphite, Librato)
    • Application Integration (port 2030)
    • Standalone Checks
    • Parameter Passing
    • Scheduling Downtime
    • Sensu and Puppet/Chef
  18. What’s Happening
    • Sensu is great at adapting to changes in your operating environment
    • Notifies effectively across various media
    • Lacks:
      • Tactical dashboard
      • Notification Hours, Contact Groups
  19. What’s Happened
    • Metrics integration with Graphite, Librato, Geckoboard
    • Applications can fire-and-forget to UDP port 2030
    • Lacks:
      • Uptime History
      • Notification History
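A fire-and-forget report from an application could look roughly like the sketch below. The port default comes from the slide; the JSON payload shape (`name`, `output`, `status`) and the helper names `build_payload` and `fire_and_forget` are assumptions for illustration. The demo sends to a throwaway local listener rather than a real sensu-client.

```ruby
require 'json'
require 'socket'

# Assumed payload shape: a check name, free-text output, and an exit status.
def build_payload(name, output, status)
  JSON.generate('name' => name, 'output' => output, 'status' => status)
end

# Send one datagram and return without waiting for any reply.
# Port 2030 is the port named on the slide.
def fire_and_forget(payload, host: '127.0.0.1', port: 2030)
  socket = UDPSocket.new
  socket.send(payload, 0, host, port)
ensure
  socket&.close
end

# Demo: a local listener on an ephemeral port stands in for sensu-client.
listener = UDPSocket.new
listener.bind('127.0.0.1', 0)
demo_port = listener.addr[1]

fire_and_forget(build_payload('app_heartbeat', 'worker queue drained', 0), port: demo_port)

# Wait up to two seconds so the demo cannot hang if the datagram is lost.
received = IO.select([listener], nil, nil, 2) ? listener.recvfrom(1024).first : nil
listener.close
puts received
```

UDP suits this use because the application never blocks on the monitoring system; the trade-off is that a lost datagram goes unnoticed.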
  20. Bear in Mind
    • Not even a toddler (Nov 2011 open-source)
    • Active Community
    • Traction
  21. For more:
    • GitHub repo and wiki: http://github.com/sensu
    • Joe Miller’s excellent blog series:
      • http://joemiller.me/category/sensu/
    • IRC Channel: irc://irc.freenode.net/#sensu
    • My interview with Sean Porter on Sensu:
      • http://bit.ly/zGZhjg