Unified Observability of Distributed Systems


“If a microservice falls down in the middle of a server farm, does my pager make a sound?”

If your service is automatically monitored, then the answer is “yes!”. But now that you’ve been paged and roused from your slumber… what happens next? Do you stumble to your computer, bleary-eyed, trying to find the elusive problem by cross-referencing dashboards and server logs across eleven different browser tabs? Or do you have better tools that you can use to integrate your monitoring data?

Unless you’re an engineer at a massive company like Google, the answer is probably “no”: most companies don’t have the resources to build all their own monitoring tools in-house. And when you rely on third-party and open-source tools instead, there will always be gaps between them.

Fortunately, there’s a way to get the best of both worlds: high-resolution visibility into your systems without having to write your entire monitoring stack yourself. We built a custom, open-source distributed tracing and monitoring pipeline that lets us inspect each step of an HTTP request and diagnose the root causes of errors, while still using the same open-source and third-party monitoring platforms you’re already used to. And with a monitoring pipeline that unifies metrics, logs, and traces, you can live the observability dream: the right data, in the right form, right when you need it.
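Inspecting each step of an HTTP request to find the root cause of an error can be sketched in a few lines of Go. This is a minimal illustration, not the pipeline described in the talk: the `Span` type and `rootCause` function are hypothetical, and real tracing systems record far more (timestamps, tags, service names). Each step of the request is a span with a link to its parent, and the heuristic reports the erroring span that has no erroring children, i.e. the deepest point where the failure was observed.

```go
package main

import "fmt"

// Span is a hypothetical, minimal record of one step of an HTTP request.
type Span struct {
	TraceID  string
	ID       string
	ParentID string // empty for the root span
	Name     string
	Err      bool
}

// rootCause returns the first erroring span that has no erroring children.
// It assumes errors propagate upward from callee to caller. It returns
// false if no span in the trace reported an error.
func rootCause(spans []Span) (Span, bool) {
	// Mark every span that has at least one erroring child.
	hasErrChild := make(map[string]bool)
	for _, s := range spans {
		if s.Err && s.ParentID != "" {
			hasErrChild[s.ParentID] = true
		}
	}
	// An erroring span with no erroring children is where the failure surfaced.
	for _, s := range spans {
		if s.Err && !hasErrChild[s.ID] {
			return s, true
		}
	}
	return Span{}, false
}

func main() {
	trace := []Span{
		{TraceID: "t1", ID: "a", Name: "GET /charges", Err: true},
		{TraceID: "t1", ID: "b", ParentID: "a", Name: "auth.check"},
		{TraceID: "t1", ID: "c", ParentID: "a", Name: "db.query", Err: true},
	}
	if s, ok := rootCause(trace); ok {
		fmt.Println("root cause:", s.Name)
	}
}
```

With several independent failing branches, this simple heuristic just returns the first one it finds; a real tool would surface all of them.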


Aditya Mukerjee

May 10, 2018

Transcript

  1. 12.

    What tools can we use? Metrics/dashboards? Logs? Request traces?
    No context! Hard to aggregate! Require planning!
    @chimeracoder
  2. 18.

    What’s the difference?
    • If you squint, it’s hard to tell them apart
    • A log is a metric with “longer” information
    • A trace is a metric that allows “inner joins”
    @chimeracoder
  3. 30.

    “Because we’re in control of our pipeline, we could add a new data backend or migrate vendors without having to touch our application code at all.”
    “Having ownership over our pipeline gave us trust in our data. It made us confident that we hadn’t overlooked any parts of the migration process.”
    @chimeracoder
  4. 36.

    Trying out Veneur
    • Free and open source! http://github.com/stripe/veneur
    • Six-week release cycle
    • Drop-in support for statsd, Graphite, Datadog, SignalFx, Prometheus, and more
    • Native Kubernetes support
    • Public images on Docker Hub
    @chimeracoder
  5. 40.

    Veneur in 2017
    • High availability
    • Host-local metrics
    • Global aggregate metrics
    • Sketching data structures
    • … and more!

    Veneur in 2018
    • Automatic cardinality detection
    • Expanded cross-dashboard integration
    • Unified client instrumentation
    • … help us decide the rest!
    @chimeracoder
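The slide comparing metrics, logs, and traces (slide 18) says that a log is a metric with “longer” information and a trace is a metric that allows “inner joins”. One way to make that concrete is a single record type where the three differ only in which fields are set. The `Event` type and `innerJoinByTrace` function below are hypothetical, chosen for illustration; they do not describe any real tool’s schema. Joining on a shared trace ID connects a slow-request metric to the downstream error log that explains it.

```go
package main

import "fmt"

// Event is a hypothetical unified record:
//   - a metric is a name, a value, and tags;
//   - a log is the same thing plus "longer" information (Message);
//   - a trace span is the same thing plus an ID that lets events be joined.
type Event struct {
	Name    string
	Value   float64
	Tags    map[string]string
	Message string // set for log-like events
	TraceID string // set for trace-like events
}

// Joined pairs two events that share a trace ID.
type Joined struct {
	Left, Right Event
}

// innerJoinByTrace keeps only pairs of events whose TraceID matches, like a
// SQL inner join on trace_id. Events without a TraceID never match.
func innerJoinByTrace(left, right []Event) []Joined {
	byTrace := make(map[string][]Event)
	for _, r := range right {
		if r.TraceID != "" {
			byTrace[r.TraceID] = append(byTrace[r.TraceID], r)
		}
	}
	var out []Joined
	for _, l := range left {
		if l.TraceID == "" {
			continue
		}
		for _, r := range byTrace[l.TraceID] {
			out = append(out, Joined{Left: l, Right: r})
		}
	}
	return out
}

func main() {
	// Request latencies (metric-like events) that also carry a trace ID.
	latencies := []Event{
		{Name: "http.request.duration_ms", Value: 2400, TraceID: "t1"},
		{Name: "http.request.duration_ms", Value: 35, TraceID: "t2"},
	}
	// An error log emitted downstream during one of those requests.
	logs := []Event{
		{Name: "db.error", Message: "connection pool exhausted", TraceID: "t1"},
	}
	for _, j := range innerJoinByTrace(latencies, logs) {
		fmt.Printf("%.0fms request: %s\n", j.Left.Value, j.Right.Message)
	}
}
```

Without the shared ID, the latency metric and the error log could only be correlated by timestamp, which is exactly the cross-referencing across browser tabs that the abstract describes.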
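The quotes on slide 30 rest on one design choice: applications emit data to a pipeline, and only the pipeline knows about backends. The sketch below assumes a hypothetical `Sink` interface and `Pipeline` type (this is not Veneur’s actual API) to show how adding a backend, for example during a vendor migration, becomes a pipeline change rather than an application change.

```go
package main

import "fmt"

// Metric is a hypothetical minimal data point flowing through the pipeline.
type Metric struct {
	Name  string
	Value float64
}

// Sink is a hypothetical backend interface. Each vendor or storage system
// gets its own implementation; application code never sees it.
type Sink interface {
	Name() string
	Flush(batch []Metric) error
}

// Pipeline fans each flushed batch out to every configured sink.
type Pipeline struct {
	sinks []Sink
}

// AddSink registers another backend.
func (p *Pipeline) AddSink(s Sink) { p.sinks = append(p.sinks, s) }

// Flush sends the batch to every sink and collects the failures, so one
// broken backend does not stop the others from receiving data.
func (p *Pipeline) Flush(batch []Metric) []error {
	var errs []error
	for _, s := range p.sinks {
		if err := s.Flush(batch); err != nil {
			errs = append(errs, fmt.Errorf("sink %s: %w", s.Name(), err))
		}
	}
	return errs
}

// memorySink is a stand-in backend that records everything it receives.
type memorySink struct {
	name     string
	received []Metric
}

func (m *memorySink) Name() string { return m.name }

func (m *memorySink) Flush(batch []Metric) error {
	m.received = append(m.received, batch...)
	return nil
}

func main() {
	oldVendor := &memorySink{name: "old-vendor"}
	newVendor := &memorySink{name: "new-vendor"}

	p := &Pipeline{}
	p.AddSink(oldVendor)
	// Migrating: add the new backend alongside the old one. The code that
	// emits metrics is unchanged.
	p.AddSink(newVendor)

	errs := p.Flush([]Metric{{Name: "api.requests", Value: 1}})
	fmt.Println(len(errs), len(oldVendor.received), len(newVendor.received))
}
```

Writing the same batches to both the old and the new backend during a migration is one way to compare their data side by side before cutting over, which fits the “trust in our data” point on the slide.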
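Slide 40 mentions both host-local and global aggregate metrics alongside sketching data structures. The reason these go together is that percentiles cannot be averaged across hosts, but a mergeable summary of the distribution can be combined exactly. The sketch below uses a hypothetical fixed-bucket `Histogram` as the simplest mergeable summary; production systems use more accurate sketches such as t-digest, so treat this as an illustration of the merge property only.

```go
package main

import "fmt"

// Histogram is a hypothetical fixed-bucket histogram. Bounds are upper bucket
// edges: Counts[i] counts values in (Bounds[i-1], Bounds[i]], and the last
// bucket also absorbs anything larger. Histograms that share Bounds can be
// merged by adding counts.
type Histogram struct {
	Bounds []float64
	Counts []int
}

func NewHistogram(bounds []float64) *Histogram {
	return &Histogram{Bounds: bounds, Counts: make([]int, len(bounds))}
}

// Add records v in the first bucket whose upper bound is >= v.
func (h *Histogram) Add(v float64) {
	for i, b := range h.Bounds {
		if v <= b {
			h.Counts[i]++
			return
		}
	}
	h.Counts[len(h.Counts)-1]++
}

// Merge adds other's counts into h. It assumes identical bounds.
func (h *Histogram) Merge(other *Histogram) {
	for i := range h.Counts {
		h.Counts[i] += other.Counts[i]
	}
}

// Quantile returns the upper bound of the bucket where the cumulative count
// first reaches q of the total: an approximation limited by bucket width.
func (h *Histogram) Quantile(q float64) float64 {
	total := 0
	for _, c := range h.Counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	target := q * float64(total)
	seen := 0
	for i, c := range h.Counts {
		seen += c
		if float64(seen) >= target {
			return h.Bounds[i]
		}
	}
	return h.Bounds[len(h.Bounds)-1]
}

func main() {
	bounds := []float64{10, 50, 100, 500, 1000} // latency buckets in ms

	hostA := NewHistogram(bounds) // busy host, mostly fast requests
	for i := 0; i < 995; i++ {
		hostA.Add(10)
	}
	for i := 0; i < 5; i++ {
		hostA.Add(500)
	}
	hostB := NewHistogram(bounds) // quiet host, slow requests
	for i := 0; i < 100; i++ {
		hostB.Add(1000)
	}

	// Wrong: averaging per-host percentiles ignores how much traffic each host saw.
	naive := (hostA.Quantile(0.99) + hostB.Quantile(0.99)) / 2

	// Right: merge the host-local histograms into a global one, then query it.
	global := NewHistogram(bounds)
	global.Merge(hostA)
	global.Merge(hostB)

	fmt.Println("average of host p99s:", naive)
	fmt.Println("p99 of merged data:", global.Quantile(0.99))
}
```

Running it prints 505 for the average of the two per-host p99s, a latency that no request in the data actually had, while the merged histogram reports 1000, which matches the p99 of the combined raw values.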