Monitoring Event Pipelines

portertech
November 06, 2019


Why you need one, and why you should stop rolling your own.


Transcript

  1. 1. Monitoring Event Pipelines: Why you need one, and why you should stop rolling your own. By Sean Porter (@PorterTech)
  3. 3. Overview • Our shared reality • My experience in building Sensu • What is a monitoring pipeline? • Attributes of an effective pipeline • Demo time • The future of pipelines
  4. 7. The paradigm shift: Ephemeral compute! • Host-based -> role-based monitoring • Polling -> publish-subscribe & push APIs • Point-and-click -> Infrastructure as Code
  5. 8. “We gotta find a way to make [this] fit into the hole for (this), using nothing but {that}.”
  8. 17. Sensu origin story: I joined Sonian in 2010 as an “Automation Engineer” • Early adopters • High rate of change ◦ Growing team ◦ Evolving software stack
  10. 19. Sensu origin story: I started a project with some goals (July 2011) • Handle ephemeral compute • Leverage existing and familiar technologies • Easy to drive with config management • Easy to scale horizontally • APIs!!!
  11. 20. Sensu origin story • Agent-based system with auto-discovery • Message bus for communication • Simple key-value data store for state • Central service check scheduler (pub-sub) • JSON configuration • REST APIs
  14. 23. Sensu origin story • Designed for the cloud ◦ Proven to handle ephemeral compute ◦ Operated securely on public networks (AWS) • Focused on composability & extensibility ◦ Reusable components / building blocks
  15. 25. Sensu origin story • Named it Sensu • Sonian sponsored development! ◦ Deployed to production after 2 months ◦ Replaced a number of tools! • Open sourced (MIT) on November 1st, 2011
  17. 31. What is a monitoring pipeline? Unified data collection and processing for all types of monitoring events: • Service checks • Metrics • Traces • Logs • Inventory
  18. 32. What is a monitoring pipeline? There are two critical layers: • Data plane • Control plane
  19. 33. The data plane • Data input • Data transportation & routing • Load balancing & failover • The layer developers interact with (APIs)
  20. 34. The control plane • Central management unit ◦ Orchestrator ◦ Configuration ◦ Security (auth) • APIs, agents, data processors, etc. • The layer operators interact with
  22. 47. Wait… but why? • Fewer agents ◦ Fewer “edge” services to support ◦ Lower resource utilization (i.e. fewer sidecars) • Cost savings*
  23. 49. Event payload(s) • Unified data format(s) • Unique IDs • Capture context at collection time • Support additional metadata • Support efficient debugging
  24. 51.

    { metadata: { annotations: { … } }, entity: {

    name: “osmc.de” }, timestamp: now() } Event payload(s) 51
  25. 52.

    { metadata: { annotations: { … } }, check: {

    output: “the system is down” }, timestamp: now() } Event payload(s) 52
  26. 53.

    { metadata: { annotations: { … } }, metrics: {

    points: [ … ] }, timestamp: now() } Event payload(s) 53
  27. 54. Collection agent • Lightweight (think sidecars) • Multi-platform support • Initiates connections to backends • Bi-directional communication
  28. 55. Collection agent • Auto-registration with the backend • Auto-discovery (context) ◦ Platform info ◦ System details ◦ Roles / responsibilities
  29. 56. Collection agent • Keepalive / heartbeat mechanism • Service check execution support ◦ Commonly overlooked / underappreciated ◦ Leverages over a decade of investment • Durable outbound data queue
  30. 57. Collection agent • Several data inputs (APIs) ◦ Industry standards* ▪ Metrics (StatsD, OpenTelemetry, Graphite) ▪ Traces (OpenTracing) ▪ Structured logs (JSON)
  31. 58. Data transport • Standard cryptography (TLS v1.2) ◦ mTLS verification & authentication • Standard protocol (HTTP) • Agent-initiated connections ◦ Traverse complex networks
  32. 59. Data processor • Scale horizontally • Little coordination with peers • Concurrency & parallelism ◦ “Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.” — Rob Pike
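The concurrency point can be made concrete with a small Go worker pool: n goroutines each deal with many events at once, and on a multi-core machine they also run in parallel. This is an illustrative sketch of the scaling model, not Sensu's actual processor.

```go
package main

import (
	"fmt"
	"sync"
)

// processEvents fans a stream of events out to n workers over a channel.
// Each worker applies handle to the events it receives; results are
// collected on a second channel in whatever order workers finish.
func processEvents(events []string, n int, handle func(string) string) []string {
	in := make(chan string)
	out := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range in {
				out <- handle(ev)
			}
		}()
	}

	// Feed the events, then close the input so workers drain and exit.
	go func() {
		for _, ev := range events {
			in <- ev
		}
		close(in)
	}()

	// Close the output once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	var results []string
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	got := processEvents([]string{"check", "metric", "log"}, 2, func(ev string) string {
		return "processed:" + ev
	})
	fmt.Println(len(got)) // all three events processed; order not guaranteed
}
```

Because workers share nothing except the channels, adding processors (or processes) scales the pipeline horizontally with little peer coordination.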
  33. 60. Data processor • Easy to extend and integrate! ◦ Simple APIs and clear specs • Multi-tenancy to enable self-service ◦ Namespaces ◦ RBAC
  34. 61. Filtering • The “secret sauce” • Granular routing ◦ Is it an incident? ◦ Is it a resolution? ◦ Metric data? ◦ Production? Office hours?
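Each routing question is just a predicate over an event. A hedged sketch in Go with made-up field names, showing how a chain of filters gates an event before any action fires:

```go
package main

import "fmt"

// Event carries the minimum needed to answer the routing questions.
// Field names are illustrative, not a real schema.
type Event struct {
	Status    int               // 0 = OK, non-zero = failing
	Previous  int               // status of the previous occurrence
	Labels    map[string]string // e.g. {"environment": "production"}
	HasMetric bool
}

func isIncident(e Event) bool   { return e.Status != 0 }
func isResolution(e Event) bool { return e.Status == 0 && e.Previous != 0 }
func isMetric(e Event) bool     { return e.HasMetric }
func isProduction(e Event) bool { return e.Labels["environment"] == "production" }

// route answers whether an event passes a filter chain: every predicate
// must hold before the event reaches an action.
func route(e Event, filters ...func(Event) bool) bool {
	for _, f := range filters {
		if !f(e) {
			return false
		}
	}
	return true
}

func main() {
	e := Event{Status: 2, Labels: map[string]string{"environment": "production"}}
	fmt.Println(route(e, isIncident, isProduction)) // true: failing, in production
}
```

Composing small predicates like these is what makes routing granular: "incident AND production AND office hours" is just a longer chain.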
  35. 63. Actions • Alert notifications • Incident management • Metric & event storage • Inventory • Auto-remediation
  36. 64. Monitoring pipeline (diagram): events (service checks, metrics, traces, logs) flow through filters (e.g. only incidents, only logs, only metrics, only production, only once every 30 min), then transforms (e.g. redact sensitive data, Nagios => InfluxDB, annotate), and finally actions.