Monitoring Event Pipelines

portertech
November 06, 2019

Why you need one, and why you should stop rolling your own.


Transcript

  1. Monitoring Event Pipelines

    Why you need one, and why you should stop rolling your own.
    By Sean Porter (@PorterTech)
  2. Who am I?

    • Sean Porter
    • Creator of Sensu
    • Co-founder & CTO
    • @PorterTech
  3. Overview

    • Our shared reality
    • My experience in building Sensu
    • What is a monitoring pipeline?
    • Attributes of an effective pipeline
    • Demo time
    • The future of pipelines
  4. OUR SHARED REALITY

  5. Chart: COMPLEXITY over TIME

  6. Chart: # OF THINGS over TIME (Containers, Servers, VMs, Functions)

  7. The paradigm shift

    Ephemeral compute!
    • Host-based -> role-based monitoring
    • Polling -> publish-subscribe & push APIs
    • Point-and-click -> Infrastructure as Code
  8. “We gotta find a way to make [this] fit into the hole for (this), using nothing but {that}.”
  9. Chart: AMOUNT OF DATA over TIME

  11. THE TOOLS YOU KNOW

  12. INTEGRATE NEED TO BUILD SCALE

  14. NEED TO MAINTAIN

  15. HOLD THAT THOUGHT

  16. BUILDING SENSU

  17. Sensu origin story

    I joined Sonian in 2010 as an “Automation Engineer”
    • Early adopters
    • High rate of change
      ◦ Growing team
      ◦ Evolving software stack
  19. Sensu origin story

    I started a project with some goals (July 2011)
    • Handle ephemeral compute
    • Leverage existing and familiar technologies
    • Easy to drive with config management
    • Easy to scale horizontally
    • APIs!!!
  20. Sensu origin story

    • Agent based system with auto-discovery
    • Message bus for communication
    • Simple key-value data store for state
    • Central service check scheduler (pub-sub)
    • JSON configuration
    • REST APIs
  23. Sensu origin story

    • Designed for the cloud
      ◦ Proved to handle ephemeral compute
      ◦ Operated securely on public net (AWS)
    • Focused on composability & extensibility
      ◦ Reusable components / building blocks
  24. Sensu origin story

  25. Sensu origin story

    • Named it Sensu
    • Sonian sponsored development!
      ◦ Deployed to production after 2 months
      ◦ Replaced a number of tools!
    • Open sourced (MIT) on November 1st, 2011
  27. FRAMEWORK THE MONITORING ROUTER BUS

  28. THE MONITORING PIPELINE

  29. THE MONITORING PIPELINE OBSERVABILITY

  30. WHAT IS A MONITORING PIPELINE?

  31. What is a monitoring pipeline?

    Unified data collection and processing for all types of monitoring events:
    • Service checks
    • Metrics
    • Traces
    • Logs
    • Inventory
  32. What is a monitoring pipeline?

    There are two critical layers:
    • Data plane
    • Control plane
  33. The data plane

    • Data input
    • Data transportation & routing
    • Load balancing & failover
    • The layer developers interact with (APIs)
  34. The control plane

    • Central management unit
      ◦ Orchestrator
      ◦ Configuration
      ◦ Security (auth)
    • APIs, agents, data processors, etc.
    • The layer operators interact with
  35. https://www.youtube.com/watch?v=CM2Y6B1yuDg

  37. https://bravenewgeek.com/the-observability-pipeline/

  38. WAIT… BUT WHY?

  39. CHANGE YOUR MIND

  40. CHANGE DATASTORES

  41. CHANGE FORMATS

  42. CHANGE VISUALIZATION

  43. CHANGE SAMPLING

  44. CHANGE PLATFORMS

  45. MAKE CHANGE INEXPENSIVE

  46. MAKE IT FUTURE PROOF

  47. Wait… But why?

    • Fewer agents
      ◦ Less “edge” service to support
      ◦ Resource utilization (i.e. fewer sidecars)
    • Cost savings*
  48. ATTRIBUTES OF AN EFFECTIVE PIPELINE

  49. Event payload(s)

    • Unified data format(s)
    • Unique IDs
    • Capture context at collection time
    • Support additional metadata
    • Support efficient debugging
  50. Event payload(s)

  51. Event payload(s)

    { metadata: { annotations: { … } }, entity: { name: "osmc.de" }, timestamp: now() }
  52. Event payload(s)

    { metadata: { annotations: { … } }, check: { output: "the system is down" }, timestamp: now() }
  53. Event payload(s)

    { metadata: { annotations: { … } }, metrics: { points: [ … ] }, timestamp: now() }
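The three payload slides share one envelope (metadata, timestamp) and differ only in the section they carry. A minimal sketch of that idea follows. The `new_event` helper, the `id` field name, the use of a UUID, and the sample annotation and metric point are all assumptions made for illustration; the slides only show `metadata.annotations`, `entity.name`, `check.output`, `metrics.points`, and `timestamp`.

```python
import time
import uuid


def new_event(entity_name, annotations=None, check=None, metrics=None):
    """Build one event in a single shape shared by every event type.

    Hypothetical helper: the field layout mirrors the slides, while the
    "id" key and the UUID format are assumptions.
    """
    event = {
        "id": str(uuid.uuid4()),  # assumed way to satisfy "Unique IDs"
        "metadata": {"annotations": dict(annotations or {})},
        "entity": {"name": entity_name},  # context captured at collection time
        "timestamp": int(time.time()),
    }
    # Only attach the sections this event actually carries.
    if check is not None:
        event["check"] = check
    if metrics is not None:
        event["metrics"] = metrics
    return event


# A service check result and a metrics event use the same envelope.
check_event = new_event(
    "osmc.de",
    annotations={"team": "ops"},  # hypothetical annotation
    check={"output": "the system is down"},
)
metric_event = new_event(
    "osmc.de",
    metrics={"points": [{"name": "cpu.user", "value": 0.42}]},  # hypothetical point
)
```

Because every event has the same outer shape, downstream filters and transforms can inspect `entity` and `metadata` without knowing in advance whether they are handling a check result or a batch of metric points.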
  54. Collection Agent

    • Lightweight (think sidecars)
    • Multi-platform support
    • Initiates connections to backends
    • Bi-directional communication
  55. Collection Agent

    • Auto-registration with the backend
    • Auto-discovery (context)
      ◦ Platform info
      ◦ System details
      ◦ Roles / responsibilities
  56. Collection Agent

    • Keepalive / heartbeat mechanism
    • Service check execution support
      ◦ Commonly overlooked / underappreciated
      ◦ Leverage over a decade of investment
    • Durable outbound data queue
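One way to picture the keepalive / heartbeat mechanism: the backend remembers when each agent last checked in and treats an agent as stale once it has been silent longer than a timeout. The sketch below is an assumption-laden illustration, not how any particular product implements it; `KeepaliveTracker`, its method names, the entity names, and the 120-second timeout are all hypothetical.

```python
class KeepaliveTracker:
    """Track the last keepalive per entity and report the silent ones."""

    def __init__(self, timeout_seconds=120):
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # entity name -> timestamp of latest keepalive

    def record(self, entity_name, now):
        """Called whenever a keepalive arrives from an agent."""
        self.last_seen[entity_name] = now

    def stale_entities(self, now):
        """Return entities whose latest keepalive is older than the timeout."""
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


tracker = KeepaliveTracker(timeout_seconds=120)
tracker.record("web-1", now=1000)
tracker.record("web-2", now=1100)
stale = tracker.stale_entities(now=1150)  # web-1 has been silent for 150 s
```

Timestamps are passed in explicitly rather than read from the clock so the behaviour is easy to test; a real backend would also need to decide what a stale entity means for ephemeral compute, for example alerting versus deregistering.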
  57. Collection Agent

    • Several data inputs (APIs)
      ◦ Industry standards*
        ▪ Metrics (StatsD, OpenTelemetry, Graphite)
        ▪ Trace (OpenTracing)
        ▪ Structured log (JSON)
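To make the "industry standard inputs" point concrete, here is a small sketch of accepting one StatsD line (`name:value|type`, optionally followed by `|@sample_rate`) and turning it into a metric point. The function name and the shape of the returned dictionary are assumptions; the sketch only handles numeric values, so StatsD set members that are strings and the relative (+/-) gauge semantics are deliberately out of scope.

```python
def parse_statsd_line(line):
    """Parse 'name:value|type[|@rate]' into a metric point dictionary.

    Hypothetical helper for illustration; numeric values only.
    """
    name, _, rest = line.strip().partition(":")
    fields = rest.split("|")
    if not name or len(fields) < 2:
        raise ValueError(f"malformed StatsD line: {line!r}")
    point = {
        "name": name,
        "value": float(fields[0]),
        "type": fields[1],  # e.g. "c" counter, "g" gauge, "ms" timer
        "sample_rate": 1.0,
    }
    # Optional trailing sections; only the sample rate is understood here.
    for extra in fields[2:]:
        if extra.startswith("@"):
            point["sample_rate"] = float(extra[1:])
    return point


point = parse_statsd_line("api.requests:1|c|@0.1")
```

An agent that accepts formats applications already emit can replace a dedicated StatsD daemon on each host, which is one reading of the "fewer agents" argument made earlier in the talk.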
  58. Data Transport

    • Standard cryptography (TLS v1.2)
      ◦ mTLS verification & authentication
    • Standard protocol (HTTP)
    • Agent initiated connections
      ◦ Traverse complex networks
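A rough sketch of the agent side of this transport, using Python's standard `ssl` module: the agent verifies the backend's certificate, refuses anything older than TLS 1.2, and optionally presents its own certificate so the backend can authenticate it (the mTLS part). The function name and its parameters are hypothetical, and the file paths in the usage note are placeholders.

```python
import ssl


def agent_tls_context(ca_file=None, cert_file=None, key_file=None):
    """Build a client-side TLS context for agent-initiated connections.

    Hypothetical helper: ca_file, cert_file, and key_file are assumed
    deployment-specific paths.
    """
    # Verifies the backend certificate and hostname by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Presenting a client certificate is what makes the handshake mutual.
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = agent_tls_context()
```

In a real deployment the call would look more like `agent_tls_context("ca.pem", "agent.pem", "agent-key.pem")`, and the returned context would be handed to whichever HTTP or WebSocket client the agent uses. Because the agent dials out, only the backend needs to be reachable, which is how agent-initiated connections traverse NAT and firewalls.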
  59. Data Processor

    • Scale horizontally
    • Little coordination with peers
    • Concurrency & parallelism
      ◦ “Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.” (Rob Pike)
  60. Data Processor

    • Easy to extend and integrate!
      ◦ Simple APIs and clear specs
    • Multi-tenancy to enable self-service
      ◦ Namespaces
      ◦ RBAC
  61. Filtering

    • The “secret sauce”
    • Granular routing
      ◦ Is it an incident?
      ◦ Is it a resolution?
      ◦ Metric data?
      ◦ Production? Office hours?
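Each routing question on the filtering slide can be read as a small predicate over an event. The sketch below assumes field names that are not on the slides (`check.status`, `check.previous_status`, `entity.environment`), borrows the Nagios plugin convention that exit status 0 means OK, and picks 09:00 to 17:00 UTC on weekdays as "office hours"; all of these are illustrative choices.

```python
from datetime import datetime, timezone


def is_incident(event):
    """Non-zero check status means something is wrong."""
    return event.get("check", {}).get("status", 0) != 0


def is_resolution(event):
    """OK now, but the previous result was not."""
    check = event.get("check", {})
    return check.get("status", 0) == 0 and check.get("previous_status", 0) != 0


def has_metrics(event):
    return bool(event.get("metrics", {}).get("points"))


def is_production(event):
    return event.get("entity", {}).get("environment") == "production"


def in_office_hours(event, start_hour=9, end_hour=17):
    """Weekdays between start_hour and end_hour, evaluated in UTC."""
    moment = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return moment.weekday() < 5 and start_hour <= moment.hour < end_hour


def passes(event, filters):
    """An event is routed onward only if every filter accepts it."""
    return all(f(event) for f in filters)


event = {
    "entity": {"name": "web-1", "environment": "production"},
    "check": {"status": 2, "previous_status": 0},
    # A Wednesday at 10:00 UTC.
    "timestamp": int(datetime(2019, 11, 6, 10, 0, tzinfo=timezone.utc).timestamp()),
}
page_oncall = passes(event, [is_incident, is_production, in_office_hours])
```

Keeping each question as its own function means filters can be combined per handler, so the same event can page someone in production and be silently stored everywhere else.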
  62. Transformation

  63. Actions

    • Alert notifications
    • Incident management
    • Metric & event storage
    • Inventory
    • Auto-remediation
  64. Diagram: Monitoring Pipeline

    Events (Service Checks, Metrics, Trace, Log) flow through Filter, Transform, and Action stages.
    • Filters: Only incidents, Only logs, Only production, Only once every 30 min, Only metrics
    • Transforms: Redact Sensitive, Nagios => InfluxDB, Annotate, - (none)
    • Actions: one per route
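The diagram's filter, transform, action flow can be sketched end to end with two of its labelled stages: an "only once every 30 min" filter and a "Nagios => InfluxDB" transform. Everything named here (`Throttle`, `nagios_to_influx`, `run_pipeline`, the sample check output) is a hypothetical illustration. The transform is a simplification: it reads only the label and numeric value from each Nagios performance-data item, ignores units and thresholds, does not handle quoted labels, and emits InfluxDB line protocol with a nanosecond timestamp.

```python
import re

# label=value, optionally followed by a unit and ;warn;crit;min;max
PERF = re.compile(r"^(?P<label>[^=]+)=(?P<value>-?\d+(?:\.\d+)?)")


class Throttle:
    """Filter: let an entity's events through at most once per interval."""

    def __init__(self, interval_seconds=1800):
        self.interval = interval_seconds
        self.last_passed = {}  # entity name -> timestamp of last passed event

    def __call__(self, event):
        key = event["entity"]["name"]
        last = self.last_passed.get(key)
        if last is not None and event["timestamp"] - last < self.interval:
            return False
        self.last_passed[key] = event["timestamp"]
        return True


def nagios_to_influx(event):
    """Transform: Nagios perfdata after '|' -> InfluxDB line protocol lines."""
    _, _, perfdata = event["check"]["output"].partition("|")
    lines = []
    for item in perfdata.split():
        m = PERF.match(item)
        if m:
            lines.append(
                f'{m["label"]},entity={event["entity"]["name"]} '
                f'value={m["value"]} {event["timestamp"] * 10**9}'
            )
    return lines


def run_pipeline(event, filters, transform, action):
    """Apply filters, then transform, then hand the result to the action."""
    if not all(f(event) for f in filters):
        return False
    action(transform(event))
    return True


throttle = Throttle(interval_seconds=1800)
stored = []  # stands in for a metric storage action
event = {
    "entity": {"name": "web-1"},
    "check": {"output": "PING OK | rta=0.8ms;100;500;0 pl=0%;20;60;0"},
    "timestamp": 1573034400,
}
first = run_pipeline(event, [throttle], nagios_to_influx, stored.extend)
second = run_pipeline(
    {**event, "timestamp": event["timestamp"] + 600},
    [throttle], nagios_to_influx, stored.extend,
)
third = run_pipeline(
    {**event, "timestamp": event["timestamp"] + 1800},
    [throttle], nagios_to_influx, stored.extend,
)
```

The event arriving ten minutes after the first is dropped by the throttle, while the one arriving thirty minutes later passes. Swapping the datastore then means replacing one transform and one action, which is the "make change inexpensive" argument from earlier in the deck.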
  65. https://bravenewgeek.com/the-observability-pipeline/

  66. DEMO TIME

  67. THE FUTURE OF PIPELINES

  68. Thank you!

    • Sean Porter
    • @PorterTech
    • https://sensu.io