Data Collection & Prometheus Scraping with Sensu 2.0
by Sean Porter
https://speakerdeck.com/portertech
Slide 1
Slide 1 text
Data Collection & Prometheus Scraping with Sensu 2.0. Sean Porter, Co-Founder & CTO, Sensu Inc. InfluxDays 2018
Slide 2
Slide 2 text
● Sean Porter ● Author of Sensu ● CTO for Sensu Inc. ● @portertech
Slide 3
Slide 3 text
We solve our problems with technology.
Slide 4
Slide 4 text
We create new problems with technology.
Slide 5
Slide 5 text
Illustration by Fredrik Skarstedt
Slide 6
Slide 6 text
HOST APP APP APP APP
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
HOST VM VM APP APP APP APP
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
HOST VM VM APP APP APP APP
Slide 11
Slide 11 text
[Chart axes: COMPLEXITY, TIME]
Slide 12
Slide 12 text
Apps can span any number of technologies.
Slide 13
Slide 13 text
No content
Slide 14
Slide 14 text
No content
Slide 15
Slide 15 text
+
Slide 16
Slide 16 text
Design
Slide 17
Slide 17 text
No content
Slide 18
Slide 18 text
No content
Slide 19
Slide 19 text
No content
Slide 20
Slide 20 text
No content
Slide 21
Slide 21 text
No content
Slide 22
Slide 22 text
No content
Slide 23
Slide 23 text
No content
Slide 24
Slide 24 text
No content
Slide 25
Slide 25 text
No content
Slide 26
Slide 26 text
No content
Slide 27
Slide 27 text
{ timestamp: 1516663186, entity: { … }, check: { … }, metrics: { ... } }
Slide 28
Slide 28 text
No content
Slide 29
Slide 29 text
No content
Slide 30
Slide 30 text
No content
Slide 31
Slide 31 text
No content
Slide 32
Slide 32 text
Configuration: ● Backend REST API ● sensuctl (CLI tool) ● Dashboard
Slide 33
Slide 33 text
Configuration: ● RBAC ● Organization ● Environment
Slide 34
Slide 34 text
No content
Slide 35
Slide 35 text
3 Methods: The three methods of data collection with Sensu.
Slide 36
Slide 36 text
1. Service Checks
Slide 37
Slide 37 text
Service Checks: ● Script ● STDOUT (message and data) ● Exit code (severity)
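The service check contract on this slide (a script, a message plus data on STDOUT, an exit code for severity) can be sketched in Python. This is a minimal illustration, not a real Sensu plugin: the function name `check_threshold`, the metric name, and the thresholds are all hypothetical. The exit codes follow the Nagios-style convention that slides 38 and 39 show (0 for OK, 2 for CRITICAL), with 1 for WARNING.

```python
# Nagios-style severities; slides 38 and 39 show 0 (OK) and 2 (CRITICAL).
OK, WARNING, CRITICAL = 0, 1, 2


def check_threshold(name, value, warn, crit):
    """Return (exit_code, stdout_line) for a single numeric reading.

    The text before the "|" is the human-readable message; the text after
    it is data in the name=value form seen on slide 38.
    """
    if value >= crit:
        status, label = CRITICAL, "CRITICAL"
    elif value >= warn:
        status, label = WARNING, "WARNING"
    else:
        status, label = OK, "OK"
    return status, f"{label}: {name} is {value}|{name}={value};{warn};{crit};"


def main():
    # Hypothetical reading; a real check would query the service here.
    code, line = check_threshold("threads_connected", 1, warn=50, crit=100)
    print(line)
    return code
```

Saved as a standalone script, it would end with `sys.exit(main())` so that the process exit code carries the severity.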
Slide 38
Slide 38 text
Service Checks:
check_mysql -H localhost -P 3360
Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
Exit 0 (OK)
Slide 39
Slide 39 text
Service Checks:
check_mysql -H localhost -P 3360
Can't connect to MySQL server on 'localhost'
Exit 2 (CRITICAL)
Slide 40
Slide 40 text
{ timestamp: 1516663186, entity: { … }, check: { command: "check_mysql -H ...", output: "Can't connect ...", status: 2, … }, metrics: { ... } }
Slide 41
Slide 41 text
Symptoms
Slide 42
Slide 42 text
No content
Slide 43
Slide 43 text
Service Checks:
check_mysql -H localhost -P 3360
Uptime: 798 Threads: 1 Questions: 5 Slow queries: 0 Opens: 107 Flush tables: 1 Open tables: 26 Queries per second avg: 0.006|Connections=9c;;; Open_files=6;;; Open_tables=27;;; Qcache_free_memory=16760152;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=1c;;; Qcache_queries_in_cache=0;;; Queries=6c;;; Questions=4c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=798c;;;
Exit 0 (OK)
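The part of the check output after the "|" carries the numeric data. As a rough sketch of how that portion could be turned into metric values, the following assumes the common Nagios-style `label=value[unit];warn;crit;min` layout and ignores everything except label, value, and unit. The function name `parse_perfdata` is hypothetical, and this is not a description of how Sensu itself extracts metrics.

```python
import re

# label=value[unit];warn;crit;min -- only label, value and unit are captured.
PERFDATA = re.compile(
    r"(?P<label>[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[^;\s]*)"
)


def parse_perfdata(output):
    """Split check output at the first "|" and return {label: (value, unit)}."""
    _, sep, perf = output.partition("|")
    if not sep:
        return {}  # no data section, e.g. the CRITICAL output on slide 39
    points = {}
    for match in PERFDATA.finditer(perf):
        points[match["label"]] = (float(match["value"]), match["unit"])
    return points


# Shortened version of the output shown on the slide.
sample = "Uptime: 798 Threads: 1|Connections=9c;;; Open_files=6;;; Uptime=798c;;;"
metrics = parse_perfdata(sample)
print(metrics["Connections"])  # (9.0, 'c')
```

A unit of `c` marks a counter in this convention, which is why `Connections` and `Uptime` carry it while `Open_files` does not.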
Slide 44
Slide 44 text
No content
Slide 45
Slide 45 text
No content
Slide 46
Slide 46 text
Service Checks: ● Simple ● Accessible ● Shareable ● Legacy
Slide 47
Slide 47 text
2. Events API
Slide 48
Slide 48 text
Events API: ● REST API (Agent & Backend) ● Entity management ● External checks ● Metrics
Slide 49
Slide 49 text
POST /events { timestamp: 1516663186, entity: { … }, check: { … }, metrics: { ... } }
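A minimal sketch of preparing such a request from Python, using only the standard library. The field names (`timestamp`, `entity`, `check`) come from the slides; the base URL, the port, the absence of authentication, and the helper names `build_event` and `build_request` are assumptions for illustration, not the documented Sensu API.

```python
import json
import time
import urllib.request


def build_event(entity_name, output, status, timestamp=None):
    """Assemble an event dict using only field names shown on the slides."""
    return {
        "timestamp": int(timestamp if timestamp is not None else time.time()),
        "entity": {"name": entity_name},
        "check": {"output": output, "status": status},
    }


def build_request(base_url, event):
    """Prepare (but do not send) a JSON POST to <base_url>/events."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/events",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = build_event("leviathan", "Backup failed", 2, timestamp=1516663186)
request = build_request("http://127.0.0.1:8080", event)  # placeholder address
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch
# has no side effects.
```

Keeping request construction separate from sending makes the payload easy to inspect before anything goes over the network.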
Slide 50
Slide 50 text
{ timestamp: 1516663186, entity: { name: leviathan, class: application, tags: [ … ], ... }, check: { … }, metrics: { ... } }
Slide 51
Slide 51 text
{ timestamp: 1516663186, entity: { … }, check: { output: "Backup failed ...", status: 2, ttl: 6h, … }, metrics: { ... } }
Slide 52
Slide 52 text
{ timestamp: 1516663186, entity: { … }, check: { … }, metrics: { handlers: [influxdb], points: [{ name: mysql.connections, value: 9, tags: [ … ] }] } }
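To illustrate what an `influxdb` handler might do with a metric point like the one above, here is a rough sketch that renders a point as an InfluxDB line protocol string. Several things are assumptions: the slide elides the shape of `tags`, so a list of `{"name": ..., "value": ...}` pairs is used here; the point name becomes the measurement and the number is stored under a field called `value`; and the `host=leviathan` tag is invented for the example. It is not the actual Sensu handler.

```python
def escape(text, chars=", ="):
    """Backslash-escape characters that are special in line protocol keys."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def to_line_protocol(point, timestamp_s):
    """Render one point as: measurement[,tag=value...] value=<float> <ns>."""
    tags = "".join(
        f",{escape(t['name'])}={escape(t['value'])}"
        for t in sorted(point.get("tags", []), key=lambda t: t["name"])
    )
    # Measurement names only need commas and spaces escaped.
    measurement = escape(point["name"], ", ")
    # Line protocol timestamps default to nanoseconds; the event uses seconds.
    timestamp_ns = timestamp_s * 1_000_000_000
    return f"{measurement}{tags} value={float(point['value'])} {timestamp_ns}"


point = {
    "name": "mysql.connections",
    "value": 9,
    "tags": [{"name": "host", "value": "leviathan"}],  # hypothetical tag
}
line = to_line_protocol(point, 1516663186)
print(line)  # mysql.connections,host=leviathan value=9.0 1516663186000000000
```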
Slide 53
Slide 53 text
3. StatsD
Slide 54
Slide 54 text
StatsD: ● Agent listeners (TCP & UDP) ● Stats aggregation ● Gauges, counters, etc. ● Protocol enhancements (tags)
Slide 55
Slide 55 text
<metric name>:<value>|c[|@<sample rate>]
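In the standard StatsD protocol a counter line is the metric name, a colon, the value, `|c`, and an optional `|@` sample rate. A small sketch of building such a line and sending it over UDP follows. The helper names are hypothetical, and the host and port (8125 is the conventional StatsD port) are assumptions about where an agent listener might be reachable.

```python
import socket


def format_counter(name, value, sample_rate=None):
    """Build a StatsD counter line: name:value|c, optionally with |@rate."""
    line = f"{name}:{value}|c"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_udp(line, host="127.0.0.1", port=8125):
    """Fire-and-forget one datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


print(format_counter("mysql.questions", 1))       # mysql.questions:1|c
print(format_counter("mysql.questions", 1, 0.1))  # mysql.questions:1|c|@0.1
# send_udp(format_counter("mysql.questions", 1)) would emit the datagram.
```

The sample rate tells the aggregator that only a fraction of events were reported, so it can scale the count back up.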
Slide 56
Slide 56 text
3 Methods Recap: ● Service checks ● Events API ● StatsD
Slide 57
Slide 57 text
Prom Scraping
Slide 58
Slide 58 text
/metrics /metrics /metrics /metrics
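Each `/metrics` endpoint serves samples in the Prometheus text exposition format. As a rough sketch of what scraping involves once the response body has been fetched, the parser below turns that text into points with a name, value, and tags, loosely mirroring the point shape on slide 52 (tags as a dict is an assumption). It handles only simple cases: it ignores timestamps and does not support escaped quotes inside label values. The sample metrics are invented for the example.

```python
import re

SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"  # metric name
    r"(?:\{(?P<labels>[^}]*)\})?"           # optional {label="value",...}
    r"\s+(?P<value>\S+)"                    # sample value
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition text into a list of {name, value, tags} dicts."""
    points = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE.match(line)
        if match is None:
            continue
        labels = dict(LABEL.findall(match["labels"] or ""))
        points.append(
            {"name": match["name"], "value": float(match["value"]), "tags": labels}
        )
    return points


body = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
process_open_fds 6
"""
scraped = parse_metrics(body)
print(len(scraped))  # 2
```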
Slide 59
Slide 59 text
Demo
Slide 60
Slide 60 text
No content
Slide 61
Slide 61 text
[Chart axes: COMPLEXITY, TIME]
Slide 62
Slide 62 text
No content
Slide 63
Slide 63 text
No content
Slide 64
Slide 64 text
+
Slide 65
Slide 65 text
Thank You. Sean Porter (@portertech), Co-Founder & CTO, Sensu Inc. InfluxDays 2018