Unifying Your Observability Pipeline

“If a microservice falls down in the middle of a server farm, does my pager make a sound?”

If your service is monitored automatically, the answer is “yes!” But now that you’ve been paged and roused from your slumber… what happens next? Do you stumble to your computer, bleary-eyed, hunting for the elusive problem by cross-referencing dashboards and server logs across eleven different browser tabs? Or do you have better tools for integrating your monitoring data?

Unless you’re an engineer at a massive company like Google, the answer is probably “no”: most companies don’t have the resources to build all their monitoring tools in-house. And when you assemble your monitoring stack from third-party and open-source tools, there will always be gaps between them.

Fortunately, there’s a way teams can get the best of both worlds: high-resolution visibility into your systems, without having to write your entire monitoring stack yourselves. At Stripe, we built a custom, open-source distributed tracing and monitoring pipeline that lets us inspect each step of an HTTP request and diagnose the root causes of errors, while still leveraging the open-source and third-party monitoring platforms you’re already used to. And with a monitoring pipeline that unifies metrics, logs, and traces, you can live the observability dream: the right data, in the right form, right when you need it.
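To make that concrete, here is a minimal sketch of what feeding such a pipeline looks like from application code: a statsd-style counter sent over UDP to a local agent. The address and port are assumptions (match them to your veneur.yaml), and in practice you would use a client library rather than raw UDP.

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// A local Veneur instance listening for statsd traffic; the
	// address and port are assumptions. Check your veneur.yaml.
	conn, err := net.Dial("udp", "127.0.0.1:8126")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// statsd wire format: <name>:<value>|<type>|#<tags>
	// (tags are the DogStatsD extension, which Veneur understands)
	fmt.Fprintf(conn, "api.requests:1|c|#endpoint:charge,status:200")
}
```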

Aditya Mukerjee

July 13, 2018

Transcript

  1. Unifying Your Observability Pipeline Aditya Mukerjee Systems Engineer at Stripe @chimeracoder

  2. @chimeracoder

  3. Why are we here? @chimeracoder

  4. It’s 3:07 AM @chimeracoder

  5. Dashboard Count: 1 @chimeracoder

  6. Dashboard Count: 2 @chimeracoder

  7. Dashboard Count: 3 @chimeracoder

  8. @chimeracoder

  9. Dashboard Count: 4 @chimeracoder

  10. @chimeracoder

  11. Dashboard Count: 5 @chimeracoder

  12. What tools can we use? Metrics/dashboards? No context! Logs? Hard to aggregate! Request traces? Require planning! @chimeracoder

  13. Monitoring information is only as good as developers’ ability to predict the future @chimeracoder
  14. @chimeracoder

  15. @chimeracoder

  16. @chimeracoder

  17. @chimeracoder Application

  18. What’s the difference? •If you squint, it’s hard to tell them apart •A log is a metric with “longer” information •A trace is a metric that allows “inner joins” @chimeracoder

  19. What if we could have all three, all the time? @chimeracoder
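One way to read slide 18: metrics, logs, and traces are three projections of the same underlying event. A hypothetical sketch (the Event type and its fields are illustrative, not Veneur’s actual API):

```go
package main

import (
	"fmt"
	"time"
)

// Event is a hypothetical unified record; the fields are illustrative.
type Event struct {
	Name     string  // a metric is the name, value, and tags
	Value    float64
	Tags     map[string]string
	Message  string  // a log adds "longer" free-form information
	TraceID  int64   // a trace adds IDs you can "inner join" on
	ParentID int64
	Time     time.Time
}

func main() {
	e := Event{
		Name:     "api.request.duration_ms",
		Value:    42,
		Tags:     map[string]string{"endpoint": "charge"},
		Message:  "POST /v1/charges returned 200 in 42ms",
		TraceID:  12345,
		ParentID: 12340,
		Time:     time.Now(),
	}

	// The same event, viewed three ways:
	fmt.Printf("metric: %s=%v %v\n", e.Name, e.Value, e.Tags)
	fmt.Printf("log:    %s %s\n", e.Time.Format(time.RFC3339), e.Message)
	fmt.Printf("trace:  span %d (parent %d)\n", e.TraceID, e.ParentID)
}
```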
  20. Standard Sensor Format @chimeracoder
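The actual Standard Sensor Format is a protobuf schema that ships with Veneur (github.com/stripe/veneur/ssf). Roughly, and with approximate field names, an SSF span carries metrics, log-style messages, and trace identity in a single record:

```go
package main

// SSFSample and SSFSpan sketched as Go structs for illustration; the
// real definitions are protobufs in github.com/stripe/veneur/ssf, and
// these field names are approximate.
type SSFSample struct {
	Name    string
	Value   float64
	Message string // log-style payload rides along with the sample
	Tags    map[string]string
}

type SSFSpan struct {
	Id             int64
	TraceId        int64
	ParentId       int64
	StartTimestamp int64 // nanoseconds since the epoch
	EndTimestamp   int64
	Service        string
	Tags           map[string]string
	Metrics        []*SSFSample // metrics, logs, and the span travel together
}
```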

  21. @chimeracoder

  22. @chimeracoder

  23. @chimeracoder Application

  24. Integrated Views @chimeracoder

  25. @chimeracoder

  26. @chimeracoder

  27. @chimeracoder

  28. Tradeoffs: Stacking the Deck @chimeracoder

  29. Distributed Collection @chimeracoder host1 host2 host3 Dashboard Tool

  30. Aggregation @chimeracoder host1 host2 host3 Global Aggregator Dashboard Tool

  31. Distributed Aggregation @chimeracoder host1 host2 host3 Dashboard Tool

  32. Stacking the Deck Histogram: t-digests @chimeracoder
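Why t-digests? Percentiles don’t compose: you can’t average per-host p99s and get the global p99. A toy demonstration using raw samples (a real t-digest is a compact, mergeable approximation of exactly this merge step):

```go
package main

import (
	"fmt"
	"sort"
)

// p99 returns the (nearest-rank) 99th percentile of a sample set.
func p99(samples []float64) float64 {
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	return sorted[int(0.99*float64(len(sorted)-1))]
}

func main() {
	// host1 serves fast requests; host2 serves the slow ones.
	host1 := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	host2 := []float64{100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}

	// Naive aggregation: average the per-host percentiles.
	naive := (p99(host1) + p99(host2)) / 2

	// Correct aggregation: merge the samples, then take the percentile.
	merged := append(append([]float64(nil), host1...), host2...)

	fmt.Printf("average of per-host p99s: %v\n", naive)       // 454.5: wrong
	fmt.Printf("p99 of merged samples:    %v\n", p99(merged)) // 900
}
```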

  33. @chimeracoder

  34. @chimeracoder

  35. @chimeracoder

  36. Trying out Veneur •Free and open source! http://github.com/stripe/veneur •Six-week release cycle •Drop-in support for statsd, Graphite, Datadog, SignalFx, Prometheus, and more •Native Kubernetes support •Public images on Docker Hub @chimeracoder
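A sketch of the “drop-in” claim: an existing DogStatsD client can simply be pointed at the local Veneur. The import path reflects Datadog’s open-source Go client, and the address is an assumption; adjust both to your setup.

```go
package main

import (
	"log"

	"github.com/DataDog/datadog-go/statsd"
)

func main() {
	// Point the existing Datadog statsd client at Veneur instead of
	// a Datadog agent; the address is an assumption, match veneur.yaml.
	client, err := statsd.New("127.0.0.1:8126")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Counters, gauges, timers, etc. flow through Veneur unchanged.
	if err := client.Incr("deploys.count", []string{"service:api"}, 1); err != nil {
		log.Fatal(err)
	}
}
```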
  37. Let’s build the world we want to see @chimeracoder

  38. Thank you! https://github.com/stripe/veneur #veneur on Freenode Aditya Mukerjee @chimeracoder

  39. References @chimeracoder