Monitoring Event Pipelines
Why you need one, and why you should stop rolling your own.
By Sean Porter (@PorterTech)
Slide 2
Who am I?
● Sean Porter
● Creator of Sensu
● Co-founder & CTO
● @PorterTech
Slide 3
Overview
● Our shared reality
● My experience in building Sensu
● What is a monitoring pipeline?
● Attributes of an effective pipeline
● Demo time
● The future of pipelines
Slide 4
OUR SHARED REALITY
Slide 5
[Chart: complexity over time]
Slide 6
[Chart: # of things over time; labels: containers, servers, VMs, functions]
Slide 8
“We gotta find a way to make [this] fit into the hole for (this), using nothing but {that}.”
Slide 9
[Chart: amount of data over time]
Slide 10
Slide 11
THE TOOLS YOU KNOW
Slide 12
INTEGRATE
NEED TO BUILD
SCALE
Slide 13
Slide 14
NEED TO MAINTAIN
Slide 15
HOLD THAT THOUGHT
Slide 16
BUILDING SENSU
Slide 17
Sensu origin story
I joined Sonian in 2010 as an “Automation Engineer”
● Early adopters
● High rate of change
○ Growing team
○ Evolving software stack
Slide 18
Slide 19
Sensu origin story
I started a project with some goals (July 2011)
● Handle ephemeral compute
● Leverage existing and familiar technologies
● Easy to drive with config management
● Easy to scale horizontally
● APIs!!!
Slide 20
Sensu origin story
● Agent based system with auto-discovery
● Message bus for communication
● Simple key-value data store for state
● Central service check scheduler (pub-sub)
● JSON configuration
● REST APIs
Slide 21
Slide 22
Slide 23
Sensu origin story
● Designed for the cloud
○ Proved to handle ephemeral compute
○ Operated securely on public net (AWS)
● Focused on composability & extensibility
○ Reusable components / building blocks
Slide 24
Sensu origin story
Slide 25
Sensu origin story
● Named it Sensu
● Sonian sponsored development!
○ Deployed to production after 2 months
○ Replaced a number of tools!
● Open sourced (MIT) on November 1st, 2011
Slide 26
Slide 27
FRAMEWORK
THE MONITORING ROUTER
BUS
Slide 28
THE MONITORING PIPELINE
Slide 29
THE MONITORING PIPELINE
OBSERVABILITY
Slide 30
WHAT IS A MONITORING PIPELINE?
Slide 31
What is a monitoring pipeline?
Unified data collection and processing for all types of monitoring events:
● Service checks
● Metrics
● Traces
● Logs
● Inventory
Slide 32
What is a monitoring pipeline?
There are two critical layers:
● Data plane
● Control plane
Slide 33
The data plane
● Data input
● Data transportation & routing
● Load balancing & failover
● The layer developers interact with (APIs)
Slide 34
The control plane
● Central management unit
○ Orchestrator
○ Configuration
○ Security (auth)
● APIs, agents, data processors, etc.
● The layer operators interact with
Slide 47
Wait… But why?
● Fewer agents
○ Fewer “edge” services to support
○ Resource utilization (i.e. fewer sidecars)
● Cost savings*
Slide 48
ATTRIBUTES OF AN EFFECTIVE PIPELINE
Slide 49
Event payload(s)
● Unified data format(s)
● Unique IDs
● Capture context at collection time
● Support additional metadata
● Support efficient debugging
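One way to picture these payload attributes is a single event type that every input is normalized into. The sketch below is illustrative only: the `Event` class, its field names, and the `collect_context` helper are hypothetical and are not the schema of Sensu or any other tool.

```python
import json
import platform
import socket
import time
import uuid
from dataclasses import asdict, dataclass, field


def collect_context() -> dict:
    """Capture context where the data is collected (hypothetical helper)."""
    return {"hostname": socket.gethostname(), "os": platform.system()}


@dataclass
class Event:
    """Hypothetical unified payload for checks, metrics, logs, and so on."""
    kind: str       # e.g. "check", "metric", "log"
    payload: dict   # type-specific body
    # Unique ID so one event can be followed through the whole pipeline.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)
    # Context is captured at collection time, not reconstructed later.
    context: dict = field(default_factory=collect_context)
    # Free-form labels that later stages may add.
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # One serialized format for every event type keeps debugging simple.
        return json.dumps(asdict(self), sort_keys=True)


event = Event(kind="check", payload={"name": "disk", "status": 2})
event.metadata["team"] = "storage"
print(event.to_json())
```

Because every event carries its own ID and collection-time context, a stage further down the pipeline can be debugged from the serialized event alone.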
Slide 54
Collection Agent
● Lightweight (think sidecars)
● Multi-platform support
● Initiates connections to backends
● Bi-directional communication
Slide 55
Collection Agent
● Auto-registration with the backend
● Auto-discovery (context)
○ Platform info
○ System details
○ Roles / responsibilities
Slide 56
Collection Agent
● Keepalive / heartbeat mechanism
● Service check execution support
○ Commonly overlooked / underappreciated
○ Leverage over a decade of investment
● Durable outbound data queue
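The keepalive idea can be sketched from the backend's side: remember when each agent last checked in and flag the ones that have gone quiet. This is a minimal sketch under assumed names; `KeepaliveMonitor`, its 120 second default TTL, and the agent names are hypothetical and do not describe how any particular product implements heartbeats.

```python
import time
from typing import Dict, List, Optional


class KeepaliveMonitor:
    """Tracks the last heartbeat per agent and reports agents that went quiet."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl_seconds = ttl_seconds
        self._last_seen: Dict[str, float] = {}  # agent name -> last keepalive time

    def record(self, agent: str, now: Optional[float] = None) -> None:
        # Called whenever an agent's keepalive arrives.
        self._last_seen[agent] = time.time() if now is None else now

    def stale_agents(self, now: Optional[float] = None) -> List[str]:
        # Agents whose last keepalive is older than the TTL, in name order.
        now = time.time() if now is None else now
        return sorted(
            agent for agent, seen in self._last_seen.items()
            if now - seen > self.ttl_seconds
        )


monitor = KeepaliveMonitor(ttl_seconds=120)
monitor.record("web-01", now=0)
monitor.record("db-01", now=100)
# At t=150, web-01 has been silent for 150s and db-01 for only 50s.
print(monitor.stale_agents(now=150))
```

A stale agent is itself a monitoring event, so its detection can be fed back into the same pipeline as any other incident.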
Slide 57
Collection Agent
● Several data inputs (APIs)
○ Industry standards*
■ Metrics (StatsD, OpenTelemetry, Graphite)
■ Trace (OpenTracing)
■ Structured log (JSON)
Slide 58
Data Transport
● Standard cryptography (TLS v1.2)
○ mTLS verification & authentication
● Standard protocol (HTTP)
● Agent initiated connections
○ Traverse complex networks
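As a rough illustration of the transport requirements, the agent side of such a connection could be configured with Python's standard `ssl` module. The function name `agent_tls_context` and its parameters are hypothetical, and the certificate paths are left to the caller; this is a sketch, not a hardened configuration.

```python
import ssl
from typing import Optional


def agent_tls_context(ca_file: Optional[str] = None,
                      cert_file: Optional[str] = None,
                      key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context for an agent-initiated connection."""
    # The default client context verifies the backend's certificate and hostname.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # Presenting a client certificate is what makes the handshake mutual (mTLS).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = agent_tls_context()
print(ctx.minimum_version.name, ctx.verify_mode.name, ctx.check_hostname)
```

The resulting context can be passed to a standard HTTP client, for example `http.client.HTTPSConnection(host, context=ctx)`. Because the agent opens the connection, only outbound access from the agent's network is needed.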
Slide 59
Data Processor
● Scale horizontally
● Little coordination with peers
● Concurrency & parallelism
○ “Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.” (Rob Pike)
Slide 60
Data Processor
● Easy to extend and integrate!
○ Simple APIs and clear specs
● Multi-tenancy to enable self-service
○ Namespaces
○ RBAC
Slide 61
Filtering
● The “secret sauce”
● Granular routing
○ Is it an incident?
○ Is it a resolution?
○ Metric data?
○ Production? Office hours?
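The routing questions above map naturally onto small predicate functions. The following is a minimal sketch, assuming events are plain dicts with hypothetical `kind`, `status`, `previous_status`, `environment`, and `timestamp` keys; the destinations returned by `route` are an invented policy for illustration.

```python
from datetime import datetime


def is_incident(event: dict) -> bool:
    # Convention borrowed from service checks: a non-zero status means trouble.
    return event.get("kind") == "check" and event.get("status", 0) != 0


def is_resolution(event: dict) -> bool:
    # Healthy now, but the previous result was not.
    return (event.get("kind") == "check"
            and event.get("status", 0) == 0
            and event.get("previous_status", 0) != 0)


def is_metric(event: dict) -> bool:
    return event.get("kind") == "metric"


def is_production(event: dict) -> bool:
    return event.get("environment") == "production"


def in_office_hours(event: dict, start: int = 9, end: int = 17) -> bool:
    # Monday to Friday, start <= hour < end, read from the event's own timestamp.
    ts = datetime.fromisoformat(event["timestamp"])
    return ts.weekday() < 5 and start <= ts.hour < end


def route(event: dict) -> str:
    """Pick a destination by combining the predicates (hypothetical policy)."""
    if is_metric(event):
        return "timeseries"
    if is_incident(event) and is_production(event):
        return "chat" if in_office_hours(event) else "page"
    if is_resolution(event):
        return "chat"
    return "drop"


# A production incident in the middle of the night on a weekend.
event = {"kind": "check", "status": 2, "environment": "production",
         "timestamp": "2024-03-02T03:15:00"}
print(route(event))
```

Keeping each question as its own predicate means a new routing rule is a new combination of existing functions rather than a change to the collection side.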
Slide 64
Monitoring Pipeline
[Diagram: events (service checks, metrics, traces, logs) pass through Filter → Transform → Action chains]
● Filters: only incidents, only logs, only production, only metrics, only once every 30 min
● Transforms: redact sensitive, Nagios => InfluxDB, annotate
● Actions: six, unlabeled
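The filter, transform, and action stages in the diagram can be composed in a few lines. This is a minimal sketch under assumed names: the `Pipeline` class, the `throttle` factory (standing in for "only once every 30 min"), and the event keys `entity`, `check`, `status`, and `timestamp` are all hypothetical. The Nagios => InfluxDB transform is not implemented here.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Event = dict
Filter = Callable[[Event], bool]
Transform = Callable[[Event], Event]
Action = Callable[[Event], None]


class Pipeline:
    """Filter -> transform -> action, applied to each incoming event."""

    def __init__(self, filters: Iterable[Filter],
                 transform: Optional[Transform], action: Action):
        self.filters = list(filters)
        self.transform = transform
        self.action = action

    def handle(self, event: Event) -> bool:
        # all() stops at the first failing filter, so filter order matters.
        if not all(f(event) for f in self.filters):
            return False
        out = self.transform(event) if self.transform else event
        self.action(out)
        return True


def only_incidents(event: Event) -> bool:
    return event.get("status", 0) != 0


def throttle(interval_seconds: float) -> Filter:
    """Filter factory: pass at most one event per (entity, check) per interval."""
    last_passed: Dict[Tuple, float] = {}

    def _filter(event: Event) -> bool:
        key = (event.get("entity"), event.get("check"))
        ts = event["timestamp"]
        if key in last_passed and ts - last_passed[key] < interval_seconds:
            return False
        last_passed[key] = ts
        return True

    return _filter


def redact_sensitive(event: Event) -> Event:
    # Work on a copy so other pipelines still see the original event.
    clean = dict(event)
    for name in ("password", "token"):
        if name in clean:
            clean[name] = "[REDACTED]"
    return clean


sent = []  # stands in for a real action such as sending a notification
alerts = Pipeline([only_incidents, throttle(30 * 60)], redact_sensitive, sent.append)

events = [
    {"entity": "web-01", "check": "http", "status": 2, "timestamp": 0, "token": "abc"},
    {"entity": "web-01", "check": "http", "status": 2, "timestamp": 600},   # throttled
    {"entity": "web-01", "check": "http", "status": 0, "timestamp": 900},   # not an incident
    {"entity": "web-01", "check": "http", "status": 2, "timestamp": 1800},  # 30 min later
]
for e in events:
    alerts.handle(e)
print(len(sent))
```

Several such pipelines can subscribe to the same event stream, each with its own filters, which is how one incoming event can fan out to different actions.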