Chef and Sensu - Delightful Monitoring

Chef makes it easier for organizations to go fast, enabling them to continuously deliver new services and features to their customers. With Chef, you can quickly provision infrastructure and deploy applications, but can your monitoring keep up? Or will it keep you up at night?

Sean will demonstrate how Sensu 2.0 is designed to monitor Chef-driven infrastructure. Sensu 2.0 is the next release of the open source monitoring framework, rewritten in Go, with new capabilities and reduced operational overhead. Sean will show how the Sensu Chef cookbook can be used to deploy Sensu and manage service checks, metric collection, and alert notifications. He will go over monitoring best practices that lead to delightful monitoring with Chef and Sensu.

portertech

May 24, 2018

Transcript

  1. Chef & Sensu – Delightful Monitoring. Sean Porter, CTO for Sensu
  2. • Sean Porter • Author of Sensu • CTO for Sensu Inc. • @portertech
  3. We solve our problems with technology.

  4. We create new problems with technology.

  5. Illustration by Fredrik Skarstedt

  6. HOST APP APP APP APP

  7. HOST VM VM APP APP APP APP

  8. HOST VM VM APP APP APP APP

  9. COMPLEXITY TIME

  10. Apps can span any number of technologies.

  11. None
  12. None
  13. None
  14. Design

  15. None
  16. None
  17. None
  18. None
  19. None
  20. None
  21. None
  22. None
  23. None
  24. None
  25. { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { ... } }
  26. None
  27. None
  28. None
  29. None
  30. None
  31. Configuration: • Backend REST API • RBAC (multi-tenancy) ◦ Organization ◦ Environment
  32. None
  33. Configuration: • sensuctl (CLI tool) • Dashboard • Chef

  34. The Cookbook: the new “sensu-go” Chef cookbook.

  35. Chef Cookbook: • sensu/sensu-go-chef • Custom resources ◦ Sensu services ◦ Sensu object config
  36. sensu_backend "chefconf" do
        version "2.0.0.beta.1-1"
        repo "sensu/beta"
        config_home "/etc/sensu"
        config({ "state-dir": "/var/lib/sensu" })
      end
  37. sensu_agent "chefconf" do
        version "2.0.0.beta.1-1"
        repo "sensu/beta"
        config_home "/etc/sensu"
        config({
          "organization": "acme",
          "environment": "production",
          "backend-url": ["ws://backend:8081"],
          "subscriptions": ["mysql"]
        })
      end
  38. None
  39. sensu_ctl "default" do
        version "2.0.0.beta.1-1"
        repo "sensu/beta"
        username secrets["username"]
        password secrets["password"]
        backend_url "ws://backend:8081"
      end
  40. 3 Methods: the three methods of data collection with Sensu and example Chef resources.
  41. 1. Service Checks

  42. Service Checks: • Script • STDOUT (message and data) • Exit code (severity)
  43. Service Checks: check_mysql -H localhost -P 3360
      Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
      Exit 0 (OK)
  44. Service Checks: check_mysql -H localhost -P 3360
      Can't connect to MySQL server on 'localhost'
      Exit 2 (CRITICAL)
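
The check contract on the slides above (a script, a message on STDOUT, an exit code for severity) can be sketched in Ruby. This is a minimal illustration and not the `check_mysql` plugin itself: `run_probe` and `tcp_probe` are hypothetical names, and only the OK (0) and CRITICAL (2) codes shown on the slides are modeled.

```ruby
require "socket"

EXIT_OK = 0        # "Exit 0 (OK)" on the slides
EXIT_CRITICAL = 2  # "Exit 2 (CRITICAL)" on the slides

# Runs the probe block and returns [message, exit_code].
# The message is what a check prints to STDOUT; the code is its severity.
def run_probe(name)
  detail = yield
  ["#{name} OK: #{detail}", EXIT_OK]
rescue StandardError => e
  ["#{name} CRITICAL: #{e.message}", EXIT_CRITICAL]
end

# Hypothetical probe: succeeds if a TCP connection to host:port opens.
def tcp_probe(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) do |_socket|
    "connected to #{host}:#{port}"
  end
end

# A check script built on these helpers would end with something like:
#   message, code = run_probe("tcp") { tcp_probe("localhost", 3360) }
#   puts message
#   exit code
```

Keeping the probe separate from the message and exit-code handling makes the severity mapping easy to test without a live service.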
  45. { timestamp: 1516663186, entity: { … }, check: { command: "check_mysql -H ...", output: "Can't connect ... ", status: 2, … }, metrics: { ... } }
  46. Symptoms

  47. None
  48. sensu_check "mysql" do
        command "check_mysql -H ..."
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers [ … ]
      end
  49. Service Checks: check_mysql -H localhost -P 3360
      Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
      Exit 0 (OK)
  50. sensu_check "mysql" do
        command "check_mysql -H ..."
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers [ … ]
        output_metric_format "nagios_perfdata"
        output_metric_handlers [ … ]
      end
  51. { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { handlers: [ … ], points: [{ name: "mysql.connections", value: 9, tags: [ … ] }, … ] } }
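
To illustrate how Nagios-style perfdata in check output could become metric points like the ones in the event above, here is a rough Ruby sketch. It is not Sensu's own parser: `parse_perfdata` is a hypothetical helper, the `prefix` argument and lower-casing of names are assumptions made so the result resembles `mysql.connections`, and quoted labels containing spaces are not handled.

```ruby
# Splits check output at the first "|" and turns each perfdata item
# (label=value[unit];warn;crit;min;max) into a { name:, value: } point.
def parse_perfdata(output, prefix: nil)
  _message, perfdata = output.split("|", 2)
  return [] if perfdata.nil?

  perfdata.split.filter_map do |item|
    label, rest = item.split("=", 2)
    next if rest.nil?

    # Keep only the value field and drop a trailing unit such as "c".
    raw = rest.split(";").first.to_s
    match = raw.match(/\A-?\d+(?:\.\d+)?/)
    next if match.nil?

    name = [prefix, label.downcase].compact.join(".")
    { name: name, value: Float(match[0]) }
  end
end

# Example:
#   parse_perfdata("Uptime: 798|Connections=9c;;; Open_files=6;;;", prefix: "mysql")
```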
  52. None
  53. None
  54. sensu_asset "mysql-plugins" do
        url "https://…/mysql-plugins.tar.gz"
        sha512 "4e6f621ebe652d3b0ba5d4dea ..."
        organization "acme"
      end
  55. sensu_check "mysql" do
        command "check_mysql -H ..."
        runtime_assets ["mysql-plugins"]
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers [ … ]
        output_metric_format "nagios_perfdata"
        output_metric_handlers [ … ]
      end
  56. Service Checks: • Simple • Accessible • Shareable • Legacy

  57. 2. Events API

  58. Events API: • REST API (Agent & Backend) • Entity management • External checks • Metrics
  59. POST /events { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { ... } }
  60. { timestamp: 1516663186, entity: { name: "leviathan", class: "application", tags: [ … ], ... }, check: { … }, metrics: { … } }
  61. { timestamp: 1516663186, entity: { … }, check: { output: "Backup failed ... ", status: 2, ttl: 21600, … }, metrics: { … } }
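
An external check result like the one above could be prepared for `POST /events` roughly as follows. This is a sketch under assumptions: `build_event` and `build_event_request` are hypothetical helpers, the base URL is a placeholder to be replaced with the real agent or backend API address, and a real Sensu API will likely require more fields than the ones visible on the slides.

```ruby
require "json"
require "net/http"
require "uri"

# Builds a minimal event hash using only the fields visible on the slides.
def build_event(entity_name:, output:, status:, ttl: nil, timestamp: Time.now.to_i)
  check = { output: output, status: status }
  check[:ttl] = ttl unless ttl.nil?
  { timestamp: timestamp, entity: { name: entity_name }, check: check }
end

# Prepares (but does not send) a POST /events request carrying the event as JSON.
def build_event_request(base_url, event)
  uri = URI.join(base_url, "/events")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate(event)
  request
end

# Sending it would look something like this (base URL is a placeholder):
#   uri = URI("http://localhost:3031")
#   event = build_event(entity_name: "leviathan", output: "Backup failed", status: 2, ttl: 21600)
#   Net::HTTP.start(uri.host, uri.port) { |http| http.request(build_event_request(uri.to_s, event)) }
```

Judging by the slide, the `ttl` of 21600 seconds (six hours) is presumably there so that a backup job which stops reporting is itself noticed.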
  62. { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { handlers: [ … ], points: [{ name: "logins", value: 2, tags: [ … ] }, … ] } }
  63. 3. StatsD

  64. StatsD: • Agent listeners (TCP & UDP) • Stats aggregation • Gauges, counters, etc. • Protocol enhancements (tags)
  65. <name>:<value>|c[|@<sample rate>] UDP localhost 8125
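
The counter line format above can be produced and sent from Ruby with only the standard library. This is a minimal sketch: `statsd_counter` and `send_statsd` are hypothetical helper names, the host and port defaults mirror the slide (UDP, localhost, 8125), and the tag extensions mentioned earlier are not covered.

```ruby
require "socket"

# Formats a StatsD counter line: <name>:<value>|c[|@<sample rate>]
def statsd_counter(name, value, sample_rate: nil)
  line = "#{name}:#{value}|c"
  line += "|@#{sample_rate}" if sample_rate
  line
end

# Sends one line over UDP. UDP is fire-and-forget, so no reply is expected.
def send_statsd(line, host: "localhost", port: 8125)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
ensure
  socket&.close
end

# Example:
#   send_statsd(statsd_counter("request_count", 1))
```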

  66. { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { handlers: [ … ], points: [{ name: "request_count", value: 42, tags: [ … ] }, … ] } }
  67. 3 Methods Recap: • Service checks • Events API • StatsD
  68. Event Processing: do things with the collected data.

  69. sensu_handler "slack" do
        type "pipe"
        command "handler-slack --webhook-url http..."
        timeout 10
        filters ["is_incident", "not_silenced"]
      end
  70. sensu_handler "influxdb" do
        type "pipe"
        command "handler-influx -a http..."
        timeout 10
        filters ["has_metrics"]
      end
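
A "pipe" handler is a command that is handed the event and does something with it. As a rough sketch, assuming the event arrives as JSON on the command's STDIN, a tiny handler in Ruby might look like this. `format_alert` is a hypothetical helper and is not how `handler-slack` or `handler-influx` are implemented; it reads only fields that appear in the event examples in this deck.

```ruby
require "json"

# Reads one event as JSON from the given IO and returns a one-line summary.
def format_alert(io)
  event = JSON.parse(io.read)
  entity = event.dig("entity", "name") || "unknown entity"
  check = event["check"] || {}
  "#{entity}: status #{check["status"]}: #{check["output"].to_s.strip}"
end

# As a handler command, the script would end with something like:
#   puts format_alert($stdin)
```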
  71. sensu_check "mysql" do
        command "check_mysql -H ..."
        runtime_assets ["mysql-plugins"]
        subscriptions ["mysql"]
        interval 20
        timeout 10
        handlers ["slack"]
        output_metric_format "nagios_perfdata"
        output_metric_handlers ["influxdb"]
      end
  72. COMPLEXITY TIME

  73. None
  74. +

  75. Thank You. Sean Porter (@portertech), Co-Founder & CTO, Sensu Inc. ChefConf 2018