Data Collection & Prometheus Scraping with Sensu 2.0

portertech
February 13, 2018

Applications are complex systems. Their many moving parts, component and dependency services, may span any number of infrastructure technologies and platforms, from bare metal to serverless. As the number of services increases, teams responsible for them will naturally develop their own preferences, such as how they instrument their code or how and when they receive alerts.

Sean will demonstrate how Sensu 2.0 is designed to collect monitoring and telemetry data from these heterogeneous environments and store it in InfluxDB. Sensu 2.0 is the next release of the open source monitoring framework, rewritten in Go, with new capabilities and reduced operational overhead. Using Sensu alongside InfluxDB, Sean will go over various patterns of data collection, including scraping Prometheus metrics, and show how Sensu enables self-service data collection for service owners.

Transcript

  1. Data Collection & Prometheus Scraping with Sensu 2.0. Sean Porter, Co-Founder & CTO, Sensu Inc. InfluxDays 2018
  2. • Sean Porter • Author of Sensu • CTO for Sensu Inc. • @portertech
  3. We solve our problems with technology.

  4. We create new problems with technology.

  5. Illustration by Fredrik Skarstedt

  6. HOST APP APP APP APP

  7. None
  8. HOST VM VM APP APP APP APP

  9. None
  10. HOST VM VM APP APP APP APP

  11. COMPLEXITY TIME

  12. Apps can span any number of technologies.

  13. None
  14. None
  15. +

  16. Design

  17. None
  18. None
  19. None
  20. None
  21. None
  22. None
  23. None
  24. None
  25. None
  26. None
  27. { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { ... } }
  28. None
  29. None
  30. None
  31. None
  32. Configuration: • Backend REST API • sensuctl (CLI tool) • Dashboard
  33. Configuration: • RBAC • Organization • Environment

  34. None
  35. 3 Methods: The three methods of data collection with Sensu.

  36. 1. Service Checks

  37. Service Checks: • Script • STDOUT (message and data) • Exit code (severity)
  38. Service Checks: check_mysql -H localhost -P 3360
    Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
    Exit 0 (OK)
  39. Service Checks: check_mysql -H localhost -P 3360
    Can't connect to MySQL server on 'localhost'
    Exit 2 (CRITICAL)
  40. { timestamp: 1516663186, entity: { … }, check: { command: “check_mysql -H ...”, output: “Can’t connect ... ”, status: 2, … }, metrics: { ... } }
  41. Symptoms

  42. None
  43. Service Checks: check_mysql -H localhost -P 3360
    Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
    Exit 0 (OK)
  44. None
  45. None
  46. Service Checks: • Simple • Accessible • Shareable • Legacy
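
Slides 37 to 43 describe a service check as a script whose STDOUT carries a message plus data and whose exit code carries severity. A minimal Python sketch of splitting that kind of output into a message and metrics follows. It assumes the Nagios-style `message|label=value[unit];;;` layout shown on the slides; `parse_check_output` and `SEVERITY` are hypothetical names used for illustration, not part of Sensu, and quoted labels containing spaces are not handled.

```python
import re

# Conventional exit codes for Nagios-style service checks.
SEVERITY = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# A numeric value followed by an optional unit of measure such as "c" or "%".
_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def parse_check_output(output):
    """Split check output into (message, metrics).

    metrics maps each label to a (value, unit) tuple. Thresholds after the
    first ";" are ignored in this sketch.
    """
    message, _, perfdata = output.partition("|")
    metrics = {}
    for item in perfdata.split():
        label, _, rest = item.partition("=")
        match = _VALUE.match(rest.split(";")[0])
        if label and match:
            metrics[label] = (float(match.group(1)), match.group(2))
    return message.strip(), metrics


if __name__ == "__main__":
    sample = "Uptime: 798 Threads: 1|Connections=9c;;; Open_files=6;;;"
    message, metrics = parse_check_output(sample)
    print(SEVERITY[0], message, metrics)
```

Output with no `|` separator, such as the failed check on slide 39, yields the whole line as the message and an empty metrics dict.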

  47. 2. Events API

  48. Events API: • REST API (Agent & Backend) • Entity management • External checks • Metrics
  49. POST /events { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { ... } }
  50. { timestamp: 1516663186, entity: { name: leviathan, class: application, tags: [ … ], ... }, check: { … }, metrics: { ... } }
  51. { timestamp: 1516663186, entity: { … }, check: { output: “Backup failed ... ”, status: 2, ttl: 6h, … }, metrics: { ... } }
  52. { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { handlers: [influxdb], points: [{ name: mysql.connections, value: 9, tags: [ … ] }] } }
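
Slides 49 to 52 outline the event document posted to the Events API. The sketch below builds a payload of that shape in Python and posts it as JSON. The field names follow the slides only and have not been checked against a released Sensu API; `build_event` and `post_event` are hypothetical helpers, and the URL passed to `post_event` is a placeholder the caller must supply.

```python
import json
import time
import urllib.request


def build_event(entity_name, output, status, points, handlers=("influxdb",)):
    """Build an event dict shaped like the one on the slides.

    points is an iterable of (metric_name, value) pairs.
    """
    return {
        "timestamp": int(time.time()),
        "entity": {"name": entity_name, "class": "application"},
        "check": {"output": output, "status": status},
        "metrics": {
            "handlers": list(handlers),
            "points": [
                {"name": name, "value": value, "tags": []}
                for name, value in points
            ],
        },
    }


def post_event(url, event):
    """POST the event as JSON and return the HTTP status code.

    The endpoint URL is an assumption supplied by the caller.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A caller might build an event with `build_event("leviathan", "Backup failed", 2, [("mysql.connections", 9)])` and pass it to `post_event` along with the address of its own agent or backend.
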
  53. 3. StatsD

  54. StatsD: • Agent listeners (TCP & UDP) • Stats aggregation • Gauges, counters, etc. • Protocol enhancements (tags)
  55. <name>:<value>|c[|@<sample rate>]
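
The counter line format on slide 55 can be produced and sent over UDP in a few lines of Python. This is a sketch under assumptions: `format_counter` and `send_counter` are hypothetical helpers, and the default address `127.0.0.1:8125` uses the conventional StatsD port rather than a value confirmed for the Sensu agent listener.

```python
import socket


def format_counter(name, value, sample_rate=None):
    """Render a StatsD counter line: <name>:<value>|c[|@<sample rate>]."""
    line = f"{name}:{value}|c"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_counter(name, value, sample_rate=None, address=("127.0.0.1", 8125)):
    """Send one counter over UDP. The address is an assumed default."""
    payload = format_counter(name, value, sample_rate).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

UDP is fire-and-forget, so `send_counter` does not report whether a listener received the datagram.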

  56. 3 Methods Recap: • Service checks • Events API • StatsD
  57. Prom Scraping

  58. /metrics /metrics /metrics /metrics
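
Scraping a `/metrics` endpoint means reading the Prometheus text exposition format and turning each sample line into a data point. The Python sketch below parses the common simple cases into points with a name, value, and tags, loosely mirroring the `points` entries on slide 52. It is not how Sensu implements scraping; `parse_metrics` is a hypothetical name, tags are returned as a dict by choice of this sketch, and label values containing escaped quotes are not handled.

```python
import re

# name, optional {labels}, value, optional timestamp
_SAMPLE = re.compile(
    r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition lines into a list of points."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if not match:
            continue
        name, labels, value = match.groups()
        points.append({
            "name": name,
            "value": float(value),
            "tags": dict(_LABEL.findall(labels or "")),
        })
    return points
```

The body of an HTTP GET against a service's `/metrics` path can be passed to `parse_metrics` as a decoded string.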

  59. Demo

  60. None
  61. COMPLEXITY TIME

  62. None
  63. None
  64. +

  65. Thank You. Sean Porter (@portertech), Co-Founder & CTO, Sensu Inc. InfluxDays 2018