Unified Observability of Distributed Systems

“If a microservice falls down in the middle of a server farm, does my pager make a sound?”

If your service is automatically monitored, then the answer is “yes!”. But now that you’ve been paged and roused from your slumber… what happens next? Do you stumble to your computer, bleary-eyed, trying to find the elusive problem by cross-referencing dashboards and server logs across eleven different browser tabs? Or do you have better tools that you can use to integrate your monitoring data?

Unless you’re an engineer at a massive company like Google, the answer is probably “no” - most companies don’t have the resources to build all their own monitoring tools in-house. When using third-party and open-source tools for monitoring, there will always be gaps in between.

Fortunately, there’s a way to get the best of both worlds: high-resolution visibility into your systems without having to write your entire monitoring stack yourself. We built an open-source distributed tracing and monitoring pipeline that allows us to inspect each step of an HTTP request and diagnose the root causes of errors, leveraging the same open-source and third-party monitoring platforms you’re already used to. And with a monitoring pipeline that unifies metrics, logs, and traces, you can live the observability dream: the right data, in the right form, right when you need it.
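To make the idea of unifying metrics, logs, and traces a little more concrete, here is a minimal sketch in Python. It assumes a single hypothetical `Span` record from which all three views are derived. The record layout and the helper names (`as_metric`, `as_log`, `children_of`) are invented for illustration and are not the format or API of the pipeline described in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical unified record; not the schema of any real tool."""
    trace_id: int             # shared by every span in one request
    span_id: int              # unique within the trace
    parent_id: Optional[int]  # None for the root span
    name: str
    duration_ms: float
    tags: Dict[str, str] = field(default_factory=dict)
    message: str = ""         # the "longer" free-form information


def as_metric(span: Span) -> Tuple[str, float, Dict[str, str]]:
    """Metric view: a name, one number, and tags."""
    return (span.name + ".duration_ms", span.duration_ms, dict(span.tags))


def as_log(span: Span) -> str:
    """Log view: the same record rendered with its free-form message."""
    return f"trace={span.trace_id} span={span.span_id} {span.name}: {span.message}"


def children_of(spans: List[Span], parent: Span) -> List[Span]:
    """Trace view: join spans to their parent on (trace_id, parent_id)."""
    return [
        s for s in spans
        if s.trace_id == parent.trace_id and s.parent_id == parent.span_id
    ]
```

Because every view is computed from the same record, a dashboard number, a log line, and a trace can all be tied back to one request through its `trace_id`.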

Aditya Mukerjee

May 10, 2018

Transcript

  1. Unified Observability of Distributed Systems. Aditya Mukerjee, Systems Engineer at Stripe @chimeracoder
  2. @chimeracoder

  3. Why are we here? @chimeracoder

  4. It’s 3:07 AM @chimeracoder

  5. Dashboard Count: 1 @chimeracoder

  6. Dashboard Count: 2 @chimeracoder

  7. Dashboard Count: 3 @chimeracoder

  8. @chimeracoder

  9. Dashboard Count: 4 @chimeracoder

  10. @chimeracoder

  11. @chimeracoder

  12. What tools can we use? Metrics/dashboards? No context! Logs? Hard to aggregate! Request traces? Require planning! @chimeracoder
  13. Monitoring information is only as good as developers’ ability to predict the future @chimeracoder
  14. @chimeracoder

  15. @chimeracoder

  16. @chimeracoder

  17. @chimeracoder Application

  18. What’s the difference? • If you squint, it’s hard to tell them apart • A log is a metric with “longer” information • A trace is a metric that allows “inner joins” @chimeracoder
  19. What if we could have all three, all the time? @chimeracoder
  20. Standard Sensor Format @chimeracoder

  21. @chimeracoder

  22. @chimeracoder

  23. @chimeracoder Application

  24. Integrated Views @chimeracoder

  25. @chimeracoder

  26. @chimeracoder

  27. @chimeracoder

  28. Flexibility and Data Migrations @chimeracoder

  29. @chimeracoder Application B A C

  30. “Because we’re in control of our pipeline, we could add a new data backend or migrate vendors without having to touch our application code at all.” “Having ownership over our pipeline gave us trust in our data. It made us confident that we hadn’t overlooked any parts of the migration process.” @chimeracoder
  31. Tradeoffs: Stacking the Deck @chimeracoder

  32. Distributed Collection @chimeracoder host1 host2 host3 Dashboard Tool

  33. Aggregation @chimeracoder host1 host2 host3 Global Aggregator Dashboard Tool

  34. Distributed Aggregation @chimeracoder host1 host2 host3 Dashboard Tool

  35. Stacking the Deck Histogram: t-digests @chimeracoder

  36. Trying out Veneur • Free and open source! http://github.com/stripe/veneur • Six-week release cycle • Drop-in support for statsd, Graphite, Datadog, SignalFx, Prometheus, and more • Native Kubernetes support • Public images on Docker Hub @chimeracoder
  37. @chimeracoder

  38. @chimeracoder

  39. @chimeracoder

  40. Veneur in 2017 • High availability • Host-local metrics • Global aggregate metrics • Sketching data structures • … and more! Veneur in 2018 • Automatic cardinality detection • Expanded cross-dashboard integration • Unified client instrumentation • … help us decide the rest! @chimeracoder
  41. Let’s build the world we want to see @chimeracoder

  42. Thank you! https://github.com/stripe/veneur #veneur on Freenode Aditya Mukerjee @chimeracoder @chimeracoder

  43. References @chimeracoder
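
Slides 32 to 35 move from per-host collection, through a single global aggregator, to distributed aggregation with mergeable histograms. The sketch below illustrates only the general idea, under stated assumptions: each host summarizes its own samples, and the summaries are merged to answer a global percentile question. It uses a plain fixed-bucket histogram as a stand-in for a t-digest, which is a different and more accurate structure. The class name, the bucket bounds, and the method names are all hypothetical and are not taken from Veneur.

```python
from bisect import bisect_left
from typing import List, Sequence

# Hypothetical bucket upper bounds in milliseconds (an assumption for this sketch).
BOUNDS: List[float] = [1, 5, 10, 50, 100, 500, 1000]


class BucketHistogram:
    """Fixed-bucket histogram; a simplified stand-in for a t-digest."""

    def __init__(self, bounds: Sequence[float] = BOUNDS) -> None:
        self.bounds = list(bounds)
        # One counter per bound, plus a final overflow bucket.
        self.counts = [0] * (len(self.bounds) + 1)

    def add(self, value: float) -> None:
        """Count the value in the first bucket whose upper bound is >= value."""
        self.counts[bisect_left(self.bounds, value)] += 1

    def merge(self, other: "BucketHistogram") -> None:
        """Fold another host's summary into this one by adding counts."""
        if self.bounds != other.bounds:
            raise ValueError("histograms must share bucket bounds to merge")
        for i, count in enumerate(other.counts):
            self.counts[i] += count

    def total(self) -> int:
        return sum(self.counts)

    def quantile_upper_bound(self, q: float) -> float:
        """Upper bound of the first bucket covering at least fraction q of samples."""
        target = q * self.total()
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running > 0 and running >= target:
                return self.bounds[i] if i < len(self.bounds) else float("inf")
        return float("inf")  # empty histogram
```

In this sketch each host would call `add` locally and ship only its counts, and an aggregator would `merge` them before computing percentiles. Merging counts gives the same answer as if every raw sample had been sent to one place, which averaging per-host percentiles does not.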