Sensu: A cloud & CM-ready monitoring framework

Presentation to DevOpsDC from May 2012

pburkholder

May 24, 2012

Transcript

  1. Sensu: The Monitoring Router • Peter Burkholder @pburkholder • DevOpsDC MeetUp, 8 May 2012 • Sunday, May 13, 12

  2. Monitoring • #monitoringsucks • Or is it idiosyncratic?

  3. What’s Happening?

  4. What’s Happened?

  5. The Problem

  6. Careverge

  7. Careverge [Diagram: environments Prod, Exp, Dev, QA; marks O, O, X]

  8. Nagios [Diagram: Nagios server; hosts API-1 (API, NRPE) and LAMP-1 (httpd, NRPE); checks: check_api 8443, check_nrpe -c disk]

  9. Nagios primitives • Services • Hosts • ServiceGroups • HostGroups • Dependencies, Commands, Contacts, ...

  10. Puppet + Nagios • Node comes up as Puppet client w/ ‘role’ • Puppet stashes facts in storeconfig DB • Nagios puppet run • ‘exported resources’ => ‘hosts.cfg’ • host is member of hostgroups: generic, role • services are monitored across hostgroups

  11. fail

  12. fail • storeconfig • new nodes ... Nagios server lag • old nodes ... No API to del from DB • new roles => new hostgroup => fail

  13. Sensu

  14. Architecture • RabbitMQ AMQP message bus • sensu-server (Ruby) + Redis k/v store • sensu-client • sensu-api • sensu-dashboard

  15. sensu-mq • RabbitMQ • Sonian scales to 500-1000 nodes with 1 EC2 instance

  16. sensu-server • sensu-server (Ruby) and Redis (C) • JSON configuration • /etc/sensu/config.json (main config) • /etc/sensu/conf.d/ (JSON snippets)

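One way to picture the main config plus conf.d snippets is as a series of hashes folded into a single settings hash. The following is a minimal sketch under that assumption; `deep_merge` and `load_settings` are hypothetical helper names, and the merge order (sorted filenames, later files win) is a choice made for the sketch, not a description of Sensu's own loader.

```ruby
require 'json'

# Recursively merge two hashes; values from `override` win on conflicts.
def deep_merge(base, override)
  base.merge(override) do |_key, old_val, new_val|
    if old_val.is_a?(Hash) && new_val.is_a?(Hash)
      deep_merge(old_val, new_val)
    else
      new_val
    end
  end
end

# Load the main config file, then fold in every *.json snippet from conf_dir.
# Snippets are applied in sorted filename order (an assumption of this sketch).
def load_settings(main_file, conf_dir)
  settings = JSON.parse(File.read(main_file))
  Dir.glob(File.join(conf_dir, '*.json')).sort.each do |snippet|
    settings = deep_merge(settings, JSON.parse(File.read(snippet)))
  end
  settings
end
```

With the paths from the slide, a call might look like `load_settings('/etc/sensu/config.json', '/etc/sensu/conf.d')`.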
  17. sensu-server config.json (ERB template):
        {
          "rabbitmq": { "host": "<%= rabbitmq_host %>", "port": <%= rabbitmq_port %> },
          "redis": { "host": "<%= redis_host %>", "port": <%= redis_port %> },
          "api": { "host": "<%= api_host %>", "port": <%= api_port %> }
        }

  18. sensu-client config.json (ERB template):
        {
          "rabbitmq": { "host": "<%= rabbitmq_host %>", "port": <%= rabbitmq_port %> },
          "api": { "host": "<%= api_host %>", "port": <%= api_port %> },
          "client": {
            "name": "<%= sensu_hostname %>",
            "address": "<%= ipaddress %>",
            "subscriptions": ["generic", "cvapi"]
          }
        }

  19. One config.json to rule them all

  20. [Diagram: API-1 (API, sensu client), LAMP-1 (httpd, sensu client), sensu-server, RabbitMQ]

  21. checks
        {
          "checks": {
            "careverge_api": {
              "handlers": ["irc", "mailer"],
              "notification": "Careverge API is not responding appropriately",
              "command": "/etc/sensu/plugins/local/check_cvapi.sh -S",
              "subscribers": ["cvapi"],
              "interval": 30,
              "refresh": 600
            }
          }
        }

  22. How it works • server publishes ‘check-api’ request to the ‘cvapi’ subscription • clients subscribed to ‘cvapi’ run the check and publish the result • server processes results and passes them to handlers

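The publish/subscribe flow on this slide can be imitated in a few lines of plain Ruby. This is only an in-memory sketch: `MiniBus` and `attach_client` are hypothetical names, the bus stands in for RabbitMQ, and the "check" is faked rather than executed.

```ruby
# In-memory stand-in for the message bus: maps a subscription name to callbacks.
class MiniBus
  def initialize
    @subscribers = Hash.new { |hash, key| hash[key] = [] }
  end

  def subscribe(subscription, &callback)
    @subscribers[subscription] << callback
  end

  def publish(subscription, message)
    @subscribers[subscription].each { |callback| callback.call(message) }
  end
end

# Each client listens on its subscriptions, "runs" a requested check,
# and publishes the result on a shared 'results' channel.
def attach_client(bus, name, subscriptions)
  subscriptions.each do |subscription|
    bus.subscribe(subscription) do |request|
      # A real client would execute request[:command]; the sketch fakes an OK.
      result = { client: name, check: request[:name], status: 0, output: 'OK' }
      bus.publish('results', result)
    end
  end
end

bus = MiniBus.new
handled = []

# The server consumes results and hands them to handlers (here: an array).
bus.subscribe('results') { |result| handled << result }

attach_client(bus, 'API-1', %w[generic cvapi])
attach_client(bus, 'LAMP-1', %w[generic])

# The server publishes a check request to the 'cvapi' subscription only.
bus.publish('cvapi', { name: 'careverge_api', command: 'check_cvapi.sh -S' })

handled.each { |r| puts "#{r[:client]} #{r[:check]} status=#{r[:status]}" }
```

Only API-1 reports a result, because LAMP-1 does not subscribe to ‘cvapi’; the server never needs a list of hosts.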
  23. Works almost too well

  24. Notification Handlers • subclassed from Sensu::Handler • distributed as .rb scripts with .json config • community handlers: mail, irc, hipchat, campfire, pagerDuty, twitter

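To give a feel for what a notification handler does with an event, here is a minimal sketch in plain Ruby. It deliberately does not subclass Sensu::Handler, so it runs without any Sensu gem; `notification_for` is a hypothetical helper, and the event field names (`client.name`, `check.name`, `check.status`, `check.output`, `action`) are assumed to match the event JSON Sensu hands to handlers.

```ruby
require 'json'

# Conventional exit-status meanings for check results.
SEVERITIES = { 0 => 'OK', 1 => 'WARNING', 2 => 'CRITICAL' }.freeze

# Build a one-line notification from an event hash (assumed field names).
def notification_for(event)
  severity = SEVERITIES.fetch(event['check']['status'], 'UNKNOWN')
  action = event['action'] == 'resolve' ? 'RESOLVED' : 'ALERT'
  "#{action} #{event['client']['name']}/#{event['check']['name']} " \
    "[#{severity}]: #{event['check']['output'].to_s.strip}"
end

# A made-up sample event for illustration only.
sample_event = JSON.parse(<<~EVENT)
  {
    "action": "create",
    "client": { "name": "API-1" },
    "check": {
      "name": "careverge_api",
      "status": 2,
      "output": "Careverge API is not responding appropriately\\n"
    }
  }
EVENT

puts notification_for(sample_event)
```

A real handler would send this string to mail, IRC, or another medium instead of printing it.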
  25. API • thin/sinatra on port 4567 • GET/PUT/POST/DELETE k/v in Redis and make check requests • Very handy for, say...

  26. Dropping a Node • sensu-client publishes keep-alive • On orderly termination:
        require 'json'
        require 'net/http'
        require 'uri'

        # config_file is the client's config.json, e.g. /etc/sensu/config.json
        config      = JSON.parse(File.read(config_file))
        client_name = config['client']['name']
        api_host    = config['api']['host']
        api_port    = config['api']['port']
        # Ask sensu-api to delete this client
        uri  = URI.parse("http://#{api_host}:#{api_port}/client/#{client_name}")
        http = Net::HTTP.new(uri.host, uri.port)
        http.request(Net::HTTP::Delete.new(uri.path))

  27. sensu-dashboard

  28. sensu-dashboard

  29. So Far... • Components: RabbitMQ, Redis, Ruby • sensu-server: publishes check requests, pushes results to handlers • sensu-client: performs checks, pushes results • sensu-api, sensu-dashboard • JSON configuration • Plugins, Handlers, Keep-Alives

  30. • What’s Happening? • What’s Happened?

  31. Metric Handlers • E.g. ‘vmstat_metrics’ plugin returns:
        stats.sensu-server.swap.in 0 1336502402
        stats.sensu-server.swap.out 0 1336502402
        stats.sensu-server.memory.cache 1408388 1336502402
        stats.sensu-server.memory.swap_used 0 1336502402
        stats.sensu-server.memory.free 5492292 1336502402
      • Define a check as a ‘type: metric’ • Add to a subscription

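The plugin output on this slide follows a `path value timestamp` layout, one metric per line. A minimal sketch of producing such lines from nested values follows; `metric_line` and `metric_lines` are hypothetical helpers, not part of the vmstat_metrics plugin, and the sample values are only illustrative.

```ruby
# Format one metric in the "path value timestamp" form.
def metric_line(path, value, timestamp)
  "#{path} #{value} #{timestamp.to_i}"
end

# Flatten a nested hash into dotted metric paths under a prefix.
def metric_lines(prefix, values, timestamp)
  values.flat_map do |key, value|
    path = "#{prefix}.#{key}"
    if value.is_a?(Hash)
      metric_lines(path, value, timestamp)
    else
      [metric_line(path, value, timestamp)]
    end
  end
end

sample = {
  'swap' => { 'in' => 0, 'out' => 0 },
  'memory' => { 'free' => 5_492_292 }
}

puts metric_lines('stats.sensu-server', sample, 1_336_502_402)
```

Because every line carries its own path and timestamp, a handler can forward the output without knowing anything about the plugin that produced it.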
  32. Metric Handlers • ‘type: metric’ is always passed to handler • On server, use a ‘graphite’ handler • Feeds to Graphite over TCP or AMQP

  33. But wait, there’s more... • Metrics integration (Graphite, Librato) • Application Integration (port 2030) • Standalone Checks • Parameter Passing • Scheduling Downtime • Sensu and Puppet/Chef

  34. What’s Happening?

  35. What’s Happened?

  36. What’s Happening • Sensu is great at adapting to changes in your operating environment • Notifies effectively across various media • Lacks: tactical dashboard; notification hours, contact groups

  37. What’s Happened • Metrics integration with Graphite, Librato, Geckoboard • Applications can fire-and-forget to UDP port 2030 • Lacks: uptime history; notification history

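Fire-and-forget from an application can be sketched as a single UDP datagram to the local sensu-client. In this sketch `check_result_payload` and `fire_and_forget` are hypothetical helpers and the JSON fields (`name`, `output`, `status`) are an assumption. The slides give port 2030, while Sensu's documentation lists 3030 as the client socket default, so the port is left as an explicit argument to be checked against your client.

```ruby
require 'json'
require 'socket'

# Build the JSON body for an application-generated check result.
# The field set (name, output, status) is an assumption of this sketch.
def check_result_payload(name, output, status)
  JSON.generate('name' => name, 'output' => output, 'status' => status)
end

# Send one datagram and return immediately; UDP gives no delivery guarantee,
# so the application is not blocked if no client is listening.
def fire_and_forget(payload, port, host = '127.0.0.1')
  socket = UDPSocket.new
  socket.send(payload, 0, host, port)
ensure
  socket.close if socket
end
```

A call might look like `fire_and_forget(check_result_payload('app_queue', 'queue depth 12', 0), port)`, where `port` is whatever the local client actually listens on.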
  38. Bear in Mind • Not even a toddler (open-sourced Nov 2011) • Active Community • Traction

  39. For more: • GitHub repo and wiki: http://github.com/sensu • Joe Miller’s excellent blog series: http://joemiller.me/category/sensu/ • IRC Channel: irc://irc.freenode.net/#sensu • My interview with Sean Porter on Sensu: http://bit.ly/zGZhjg

  40. fini