Chef and Sensu - Delightful Monitoring

Chef makes it easier for organizations to go fast, enabling them to continuously deliver new services and features to their customers. With Chef, you can quickly provision infrastructure and deploy applications, but can your monitoring keep up? Or will it keep you up at night?

Sean will demonstrate how Sensu 2.0 is designed to monitor Chef-driven infrastructure. Sensu 2.0 is the next release of the open source monitoring framework, rewritten in Go, with new capabilities and reduced operational overhead. Sean will show how the Sensu Chef cookbook can be used to deploy Sensu and manage service checks, metric collection, and alert notifications. He will also cover monitoring best practices that lead to delightful monitoring with Chef and Sensu.

portertech

May 24, 2018

Transcript

  1. sensu_agent "chefconf" do
       version "2.0.0.beta.1-1"
       repo "sensu/beta"
       config_home "/etc/sensu"
       config({
         "organization": "acme",
         "environment": "production",
         "backend-url": ["ws://backend:8081"],
         "subscriptions": ["mysql"]
       })
     end
  2. Service Checks
     check_mysql -H localhost -P 3360
     Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
     Exit 0 (OK)
  3. Service Checks
     check_mysql -H localhost -P 3360
     Can't connect to MySQL server on 'localhost'
     Exit 2 (CRITICAL)
  4. {
       timestamp: 1516663186,
       entity: { … },
       check: {
         command: "check_mysql -H ...",
         output: "Can't connect ... ",
         status: 2,
         …
       },
       metrics: { ... }
     }
  5. Service Checks
     check_mysql -H localhost -P 3360
     Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
     Exit 0 (OK)
  6. sensu_check "mysql" do
       command "check_mysql -H ..."
       subscriptions ["mysql"]
       interval 20
       timeout 10
       handlers [ … ]
       output_metric_format "nagios_perfdata"
       output_metric_handlers [ … ]
     end
  7. {
       timestamp: 1516663186,
       entity: { … },
       check: { … },
       metrics: {
         handlers: [ … ],
         points: [
           { name: "mysql.connections", value: 9, tags: [ … ] },
           …
         ]
       }
     }
  8. sensu_check "mysql" do
       command "check_mysql -H ..."
       runtime_assets ["mysql-plugins"]
       subscriptions ["mysql"]
       interval 20
       timeout 10
       handlers [ … ]
       output_metric_format "nagios_perfdata"
       output_metric_handlers [ … ]
     end
  9. Events API
     • REST API (Agent & Backend)
     • Entity management
     • External checks
     • Metrics
  10. {
        timestamp: 1516663186,
        entity: { … },
        check: {
          output: "Backup failed ... ",
          status: 2,
          ttl: 21600,
          …
        },
        metrics: { … }
      }
  11. {
        timestamp: 1516663186,
        entity: { … },
        check: { … },
        metrics: {
          handlers: [ … ],
          points: [
            { name: "logins", value: 2, tags: [ … ] },
            …
          ]
        }
      }
  12. StatsD
      • Agent listeners (TCP & UDP)
      • Stats aggregation
      • Gauges, counters, etc.
      • Protocol enhancements (tags)
  13. {
        timestamp: 1516663186,
        entity: { … },
        check: { … },
        metrics: {
          handlers: [ … ],
          points: [
            { name: "request_count", value: 42, tags: [ … ] },
            …
          ]
        }
      }
  14. sensu_check "mysql" do
        command "check_mysql -H ..."
        runtime_assets ["mysql-plugins"]
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers ["slack"]
        output_metric_format "nagios_perfdata"
        output_metric_handlers ["influxdb"]
      end
  15. +
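
The slides show a check with output_metric_format "nagios_perfdata" and, separately, an event whose metrics.points contains "mysql.connections" with value 9. The plain-Ruby sketch below illustrates how the performance data after the "|" in Nagios-style plugin output could be turned into points of that shape. It is not Sensu's implementation: the parse_perfdata method, its prefix parameter, the lowercasing of labels, and the unit field are all assumptions made for illustration, and quoted labels containing spaces are not handled.

```ruby
# Illustrative sketch only: parse_perfdata and its prefix argument are
# hypothetical names, not part of Sensu or the Sensu Chef cookbook.

# Matches the start of one perfdata item, "label=value[UOM]". The trailing
# ";warn;crit;min;max" fields are ignored by this sketch.
PERFDATA_ITEM = /\A(?<label>[^=]+)=(?<value>-?\d+(?:\.\d+)?)(?<uom>[a-zA-Z%]*)/

# Returns an array of hashes shaped like the metric points on the slides.
def parse_perfdata(output, prefix:)
  # Nagios-style plugins separate human-readable text from perfdata with "|".
  _text, perfdata = output.split("|", 2)
  return [] if perfdata.nil?

  points = perfdata.split.map do |item|
    match = PERFDATA_ITEM.match(item)
    next unless match

    raw = match[:value]
    {
      # Assumption: name is "<prefix>.<lowercased label>".
      name: "#{prefix}.#{match[:label].downcase}",
      value: raw.include?(".") ? raw.to_f : raw.to_i,
      unit: match[:uom] # "c" marks a counter in Nagios perfdata; "" means no unit
    }
  end
  points.compact
end

# Sample output shortened from the check_mysql slide.
output = "Uptime: 798 Threads: 1|Connections=9c;;; Open_files=6;;; Threads_running=1;;;"
parse_perfdata(output, prefix: "mysql").each do |point|
  puts "#{point[:name]} #{point[:value]}"
end
```

Output with no "|" section, such as the CRITICAL "Can't connect" result, yields an empty array, so a failing check would carry status and output but no metric points in this sketch.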