Making Service Launches Boring with Distributed Tracing

Karthik Kumar
September 18, 2019

I gave a talk at the SF Site Reliability Engineering Meetup. The talk focused on how incorporating distributed tracing while developing a service can ensure a successful launch. It also included a few tracing best practices that Lightstep uses internally.

Transcript

  1. “My team has Distributed Tracing, I just don’t know when it’s useful” - Francisco (last month’s SRE meetup)
  2. Tip #1: Trace close to business value
     - Identify important transactions
     - Identify critical path of request
     - Add traces!
  3. Tip #4: Add useful tags
     - Version tags to track deployments
     - Feature flag tags to track config changes
     - Host identifier tags to correlate with machine metrics
     - Customer identifier tags to correlate with business value
  4. Summary: When is distributed tracing valuable?
     - Rolling out new services
     - Finding and fixing issues
     - Identifying optimizations
  5. Summary: What are some distributed tracing best practices?
     - Trace close to your caller
     - Trace communication layer
     - Trace dependencies
     - Add useful tags
  6. Metrics & Logs are not enough
     - Metrics: scoped to a single system; vulnerable to high-cardinality tags
     - Logs: scoped to a single system; cost scales with usage
  7. Life is a box of trade-offs

                                                  Logs   Metrics   Traces
     Cost scales gracefully                        –        ✓        ✓
     Accounts for all data (i.e., unsampled)       ✓        ✓        –
     Immune to cardinality                         ✓        –        ✓
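Tip #1 in the transcript (identify the important transaction, find its critical path, add traces) can be illustrated with a minimal, hypothetical sketch. The deck shows no code, and this is not Lightstep's or any library's API: the tiny in-process tracer (`Span`, `start_span`, `FINISHED`) and the `checkout` transaction with its `reserve_inventory` and `charge_card` steps are invented for illustration. A root span wraps the business-critical transaction, and child spans wrap each step on its critical path, so the recorded durations show which step dominates.

```python
import time
import uuid
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """A toy span: one timed operation within a trace (hypothetical type)."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: float = 0.0
    tags: dict = field(default_factory=dict)

    @property
    def duration(self) -> float:
        return self.end - self.start


# Finished spans are kept in memory here; a real tracer would export them.
FINISHED = []
# Tracks the active span so nested spans pick up their parent automatically.
_active_span = ContextVar("active_span", default=None)


@contextmanager
def start_span(name, **tags):
    parent = _active_span.get()
    span = Span(
        name=name,
        # Children inherit the trace ID; a root span starts a new trace.
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        start=time.perf_counter(),
        tags=dict(tags),
    )
    token = _active_span.set(span)
    try:
        yield span
    finally:
        span.end = time.perf_counter()
        _active_span.reset(token)
        FINISHED.append(span)


# Hypothetical business-critical transaction and the steps on its critical path.
def checkout(cart_id):
    with start_span("checkout", cart_id=cart_id):
        with start_span("reserve_inventory"):
            time.sleep(0.01)  # stand-in for real work
        with start_span("charge_card"):
            time.sleep(0.02)  # stand-in for real work


checkout("cart-42")

# Report the critical-path steps (direct children of the root span), slowest first.
root = FINISHED[-1]
steps = [s for s in FINISHED if s.parent_id == root.span_id]
for step in sorted(steps, key=lambda span: span.duration, reverse=True):
    print(f"{step.name}: {step.duration * 1000:.1f} ms")
```

Using a `ContextVar` for the active span is one way to get implicit parent/child nesting; passing the parent span explicitly would work just as well.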
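Tip #4 lists four kinds of tags. The payoff is being able to slice spans by those tags, for example comparing error rates across versions during a rollout. The following is a hypothetical sketch, not the talk's code: the tag keys (`service.version`, `host`, `customer.id`, `feature_flag.<name>`), the helpers `standard_tags` and `error_rate_by`, and the span data are all invented for illustration.

```python
import socket
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Span:
    """Toy finished span; not a real tracing library type."""
    operation: str
    duration_ms: float
    error: bool
    tags: dict = field(default_factory=dict)


def standard_tags(version, flags, customer_id, host=None):
    """Build the four kinds of tags from Tip #4 (key names are illustrative)."""
    tags = {
        "service.version": version,            # track deployments
        "host": host or socket.gethostname(),  # correlate with machine metrics
        "customer.id": customer_id,            # correlate with business value
    }
    for name, enabled in flags.items():
        tags[f"feature_flag.{name}"] = enabled  # track config changes
    return tags


def error_rate_by(spans, tag_key):
    """Fraction of spans with error=True, grouped by the value of one tag."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for span in spans:
        value = span.tags.get(tag_key)
        totals[value] += 1
        errors[value] += int(span.error)
    return {value: errors[value] / totals[value] for value in totals}


# Invented spans from rolling out a hypothetical v1.5.0 next to v1.4.0.
spans = [
    Span("charge_card", 12.0, False, standard_tags("v1.4.0", {"new_retry": False}, "cust-1", host="web-1")),
    Span("charge_card", 15.0, False, standard_tags("v1.4.0", {"new_retry": False}, "cust-2", host="web-1")),
    Span("charge_card", 40.0, True, standard_tags("v1.5.0", {"new_retry": True}, "cust-1", host="web-2")),
    Span("charge_card", 14.0, False, standard_tags("v1.5.0", {"new_retry": True}, "cust-3", host="web-2")),
]
print(error_rate_by(spans, "service.version"))         # {'v1.4.0': 0.0, 'v1.5.0': 0.5}
print(error_rate_by(spans, "feature_flag.new_retry"))  # {False: 0.0, True: 0.5}
```

In this invented data the version, flag, and host change together, so the tags cannot separate them. In real traffic, where they vary independently, the same grouping can distinguish a bad deploy from a bad flag or a bad machine.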
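The summary's "trace communication layer" and "trace dependencies" practices depend on passing trace context across process boundaries, so the caller's span and the callee's span end up in the same trace. The deck doesn't say which propagation format Lightstep used. This hypothetical sketch borrows the header layout of the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`) and hand-rolls `inject`/`extract` with the standard library. `call_inventory_service` and `inventory_handler` are invented stand-ins for an RPC client and server, and the direct function call replaces a real network hop.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version "00" - 32-hex trace ID - 16-hex parent span ID - 2-hex flags
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    sampled: bool = True


def new_root_context():
    return SpanContext(trace_id=secrets.token_hex(16), span_id=secrets.token_hex(8))


def child_of(parent):
    # Same trace, new span ID, inherited sampling decision.
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx, headers):
    """Write the context into outgoing request headers (the communication layer)."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers) -> Optional[SpanContext]:
    """Read the caller's context from incoming headers; None if absent or malformed."""
    match = TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    return SpanContext(trace_id, span_id, sampled=(int(flags, 16) & 1) == 1)


def inventory_handler(headers):
    """Hypothetical server: continue the caller's trace, or start a new one."""
    parent = extract(headers) or new_root_context()
    return child_of(parent)


def call_inventory_service(caller_ctx):
    """Hypothetical client: a span for the outbound dependency call, injected into headers."""
    client_ctx = child_of(caller_ctx)
    headers = {}
    inject(client_ctx, headers)
    server_ctx = inventory_handler(headers)  # stands in for the network hop
    return client_ctx, server_ctx


root = new_root_context()
client, server = call_inventory_service(root)
print(client.trace_id == server.trace_id == root.trace_id)  # True
```

Creating the client span on the caller's side before the request leaves is what lets a trace show time lost in the network or in a slow dependency, separately from time spent inside the callee.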
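Slide 6 says metrics are vulnerable to high-cardinality tags. The arithmetic behind this is that a metrics system typically keeps one time series per unique combination of label values, so the worst case is the product of the per-label cardinalities. The sketch below models only that series count, not any particular metrics backend. The label names, cardinalities, and traffic are invented.

```python
import math


def worst_case_series(label_cardinalities):
    """Upper bound on distinct time series for one metric:
    the product of the number of distinct values of each label."""
    return math.prod(label_cardinalities.values())


def observed_series(requests, labels):
    """Distinct label-value combinations actually seen, i.e. series created."""
    return len({tuple(r[label] for label in labels) for r in requests})


# Hypothetical label cardinalities for a request-count metric.
low = {"endpoint": 20, "status_code": 5, "host": 50}
high = {**low, "customer_id": 10_000}
print(worst_case_series(low))   # 5000
print(worst_case_series(high))  # 50000000

# Invented traffic: 1,000 requests to one endpoint from 1,000 different customers.
requests = [{"endpoint": "/checkout", "customer_id": f"cust-{i}"} for i in range(1000)]
print(observed_series(requests, ["endpoint"]))                 # 1
print(observed_series(requests, ["endpoint", "customer_id"]))  # 1000
```

A customer-ID tag on a span, by contrast, is just another field on that one record, which is consistent with the trade-off table marking traces (and logs) as immune to cardinality.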
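The trade-off table marks traces as not accounting for all data, because tracers usually sample. One common approach is head-based sampling keyed on the trace ID, which ensures every service makes the same keep/drop decision for a given trace. Counts from the kept traces can then be scaled by 1/rate to estimate totals. This sketch illustrates that general technique, not what Lightstep does. `should_sample`, the 10% rate, and the generated trace IDs are hypothetical.

```python
import random


def should_sample(trace_id, rate):
    """Keep a trace if its ID falls in the lowest `rate` fraction of the ID space.

    Deterministic, so every service that sees this trace ID decides the same way.
    Assumes a 128-bit trace ID rendered as 32 hex characters.
    """
    bucket = int(trace_id[-16:], 16)  # low 64 bits, roughly uniform for random IDs
    return bucket < rate * 2**64


rng = random.Random(7)  # fixed seed so the run is repeatable
trace_ids = [f"{rng.getrandbits(128):032x}" for _ in range(10_000)]

RATE = 0.1
kept = [t for t in trace_ids if should_sample(t, RATE)]
estimated_total = len(kept) / RATE  # scale the sampled count back up

print(f"kept {len(kept)} of {len(trace_ids)} traces; estimated total {estimated_total:.0f}")
```

The scaled total is only an estimate. A rare request that falls into a dropped trace is simply absent, and the "accounts for all data" row points at exactly this gap.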