Tracing Production Services at Stripe

If a microservice falls down in the middle of a server farm, does my pager make a sound?

If your service is automatically monitored, then the answer is “yes!”. But now that you’ve been paged and roused from your slumber… what happens next? Do you stumble to your computer, bleary-eyed, trying to find the elusive problem by cross-referencing dashboards and server logs across eleven different browser tabs? Or do you have better tools that you can use?

Fortunately, there’s a quick and easy way to get high-resolution, request-level traces for inspecting your services. At Stripe, we built an open-source distributed tracing and monitoring pipeline that allows us to inspect each step of an HTTP request and diagnose the root causes of errors, no matter how obscure they may be. And with a monitoring pipeline that unifies metrics, logs, and traces, you can live the observability dream: the right data, in the right form, right when you need it.

Aditya Mukerjee

May 22, 2017

Transcript

  1. 3.
  2. 12.

    If you need to look at logs, there’s a gap in your observability tools @chimeracoder
  3. 20.

    What’s the difference?
    • If you squint, it’s hard to tell them apart
    • A log is a metric with “longer” information
    • A trace is a metric that allows “inner joins”
  4. 34.

    Veneur in 2017
    • High availability
    • Host-local metrics
    • Global aggregate metrics
    • Probabilistic data structures
    • … and more!

    Veneur in 2018
    • Automatic cardinality detection
    • Cross-dashboard integration
    • Unified client instrumentation
    • … help us decide the rest!
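
The slide comparing metrics, logs, and traces describes a trace as a metric that allows “inner joins”. The sketch below is one way to picture that idea, not Veneur's actual data model: the `Span` record, its field names, and both helper functions are hypothetical and exist only for illustration. It joins spans to their parents on a shared trace ID, then collapses the same spans into a plain per-service metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    # Hypothetical span record; field names are assumptions, not Veneur's format.
    trace_id: int
    span_id: int
    parent_id: Optional[int]
    service: str
    duration_ms: float


def children_by_parent(spans):
    """Self-join spans: match each span's (trace_id, parent_id) to a parent's (trace_id, span_id)."""
    index = {(s.trace_id, s.span_id): s for s in spans}
    pairs = []
    for s in spans:
        parent = index.get((s.trace_id, s.parent_id))
        if parent is not None:
            pairs.append((parent.service, s.service))
    return pairs


def total_duration_by_service(spans):
    """Drop the join keys and aggregate: what remains is an ordinary metric."""
    totals = defaultdict(float)
    for s in spans:
        totals[s.service] += s.duration_ms
    return dict(totals)


# Made-up example data: two requests, the first fanning out to two services.
SPANS = [
    Span(1, 10, None, "api", 120.0),
    Span(1, 11, 10, "auth", 30.0),
    Span(1, 12, 10, "db", 70.0),
    Span(2, 20, None, "api", 45.0),
]

if __name__ == "__main__":
    print(children_by_parent(SPANS))
    print(total_duration_by_service(SPANS))
```

The point of the sketch is that the join keys are the only thing separating the two views: keep them and you can reconstruct which service called which, discard them and you are left with a metric.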