Distributed Tracing at UBER Scale

Presented at Monitorama PDX 2017.
Video: https://vimeo.com/221070602

Yuri Shkuro

May 24, 2017

Transcript

  1. Distributed Tracing at UBER Scale
    Creating a treasure map for your monitoring data
    Yuri Shkuro, UBER Technologies
  2. ABOUT ME
    • Software Engineer on the Observability team in NYC
    • Working on the open source distributed tracing system Jaeger
    • Co-founded the OpenTracing project
    • Banking industry survivor
    • GitHub: yurishkuro
    • Twitter: @yurishkuro
  3. Why Distributed Tracing
    • Distributed transaction monitoring
    • Performance / latency optimization
    • Root cause analysis
    • Service dependency analysis
    • Distributed context propagation (“baggage”)
  4. JAEGER, Distributed Tracing
    • Open Source
    • OpenTracing inside
    • In active development
    • PRs are welcome
    • Zipkin compatible
    • github.com/uber/jaeger
  5. Context Propagation
    [Diagram: services A, B, C, D, E. A unique ID assigned at the edge service becomes the {context}, which is passed along on each call between services.]
  6. Context Propagation
    [Diagram: inside an APPLICATION / MICROSERVICE. Inbound HTTP Request (Headers: . . . Trace ID . . .) → Instrumentation → Handler with Context [Span] → Client with Context [Span] → Instrumentation → Outbound HTTP Request (Headers: . . . Trace ID . . .)]
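
The header-based flow on this slide can be sketched in a few lines. This is a minimal illustration, not Jaeger's implementation: the header name `x-example-trace`, the `trace_id:span_id` wire format, and all function names are assumptions made for the example.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical header name and "trace_id:span_id" format, for illustration only.
TRACE_HEADER = "x-example-trace"


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None


def new_id() -> str:
    """Return a random 64-bit identifier as 16 hex characters."""
    return secrets.token_hex(8)


def extract(headers: Dict[str, str]) -> Optional[SpanContext]:
    """Read the caller's context from inbound request headers, if present."""
    value = headers.get(TRACE_HEADER)
    if not value:
        return None
    trace_id, span_id = value.split(":", 1)
    return SpanContext(trace_id, span_id)


def start_span(parent: Optional[SpanContext]) -> SpanContext:
    """Start a new trace at the edge, or a child span within an existing trace."""
    if parent is None:
        return SpanContext(trace_id=new_id(), span_id=new_id())
    return SpanContext(parent.trace_id, new_id(), parent.span_id)


def inject(ctx: SpanContext, headers: Dict[str, str]) -> None:
    """Write the context into outbound request headers."""
    headers[TRACE_HEADER] = f"{ctx.trace_id}:{ctx.span_id}"


def handle_request(inbound_headers: Dict[str, str]) -> Dict[str, str]:
    """Handler span on the way in, client span on the way out."""
    server_span = start_span(extract(inbound_headers))
    client_span = start_span(server_span)
    outbound_headers: Dict[str, str] = {}
    inject(client_span, outbound_headers)
    return outbound_headers
```

Calling `handle_request({})` plays the role of the edge service, which has no inbound context and so mints a new trace ID. Feeding its result into a second `handle_request` call keeps the same trace ID while producing new span IDs.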
  7. It’s Also the Frameworks
    • Go: stdlib, gorilla, …
    • Java: jaxrs2, okhttp, ApacheHttpClient, …
    • Python: Flask, Django, Tornado, urllib2, …
    • Node.js – who knows…
  8. No Help With In-Process Propagation
    • Must be done manually
    • UBER has 2000-3000 microservices
    • Resources of the tracing team are limited
    • Developers must instrument their code!
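
One way to read "must be done manually" is that every function between the inbound handler and the outbound client has to accept the context and pass it on. The sketch below assumes a plain dict as the context object; the function names and the `log` list are hypothetical and exist only to make the hand-off visible.

```python
from typing import Dict, List, Tuple

Context = Dict[str, str]          # hypothetical context: just a trace ID here
Log = List[Tuple[str, str]]       # (trace_id, operation) records


def load_user(ctx: Context, user_id: int, log: Log) -> str:
    """Innermost call: still has the trace ID because callers passed ctx down."""
    log.append((ctx["trace_id"], "load_user"))
    return f"user-{user_id}"


def render_profile(ctx: Context, user_id: int, log: Log) -> str:
    """Intermediate call: must forward ctx explicitly or the trace breaks."""
    log.append((ctx["trace_id"], "render_profile"))
    return f"<h1>{load_user(ctx, user_id, log)}</h1>"
```

If `render_profile` dropped the `ctx` argument when calling `load_user`, nothing downstream could attach its work to the trace, which is why this step falls to the developers who own the code.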
  9. Recap: Why Distributed Tracing
    • Distributed transaction monitoring
    • Performance / latency optimization
    • Root cause analysis
    • Service dependency analysis
    • Distributed context propagation (“baggage”)
  10. Service Dependency Analysis
    • Explain to us what we just built
    • Who are my dependencies
    • Workflow analysis
    • Where is all this traffic coming from?
    • Service tiers
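
A dependency graph of this kind can be derived from parent/child links between spans. The sketch below is one simple way to do it, assuming each span records its own ID, its parent's ID, and the emitting service; the `Span` shape and `dependency_edges` name are hypothetical, and this is not a description of Jaeger's aggregation job.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple


class Span(NamedTuple):
    span_id: str
    parent_id: Optional[str]   # None for the root span of a trace
    service: str               # service that emitted this span


def dependency_edges(spans: Iterable[Span]) -> Counter:
    """Count caller -> callee edges where parent and child are different services."""
    spans = list(spans)
    service_by_span = {s.span_id: s.service for s in spans}
    edges: Counter = Counter()
    for s in spans:
        parent_service = service_by_span.get(s.parent_id)
        # Skip roots, spans with unknown parents, and in-service child spans.
        if parent_service is not None and parent_service != s.service:
            edges[(parent_service, s.service)] += 1
    return edges
```

Summing these counters over many traces answers "who are my dependencies" and "where is all this traffic coming from" in both directions: filter on the first element of the key for downstream dependencies, on the second for upstream callers.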
  11. Baggage
    • Tenancy, test or production
      – Set at the top
      – Used at the storage layer, prod or test DB
    • Authentication tokens
      – Signed user or service identity
      – Checked at multiple levels
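
The tenancy example can be illustrated as follows. This is a sketch under stated assumptions: baggage is modeled as a plain dict passed between layers in one process (in a real system it would travel with the trace context across services), and the database names, the `tenancy` key, and the default of `production` when the key is absent are all hypothetical.

```python
from typing import Dict

# Hypothetical database names, for illustration only.
DATABASES = {"production": "rides_prod", "test": "rides_test"}


def edge_handler(is_test_traffic: bool) -> str:
    """Top of the stack: the only place that decides tenancy."""
    baggage = {"tenancy": "test" if is_test_traffic else "production"}
    return business_logic(baggage)


def business_logic(baggage: Dict[str, str]) -> str:
    """Intermediate layers carry the baggage along without inspecting it."""
    return storage_layer(baggage)


def storage_layer(baggage: Dict[str, str]) -> str:
    """Bottom of the stack: picks the prod or test DB from the baggage."""
    # Assumption: requests without a tenancy value are treated as production.
    tenancy = baggage.get("tenancy", "production")
    return DATABASES[tenancy]
```

The point of the pattern is that `business_logic` needs no knowledge of tenancy at all; the value is set once at the top and read once at the bottom.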
  12. Sticks and Carrots
    • Get other teams to build features on top
      – Performance team
      – Capacity & cost accounting
      – Baggage
    • More carrots
    • Eventually they become sticks (peer pressure)
  13. Does Service X Report Traces?
    • Daily aggregation job
    • Auto-book tickets
    • Build a dashboard
    • Pass/Fail: too easy to pass
  14. Trace Quality Score
    • Inspect traces
      – See a caller, but no spans
    • Join with other data
      – Routing logs
    • Auto-book tickets (carefully, not for everyone)
      – With detailed report
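
The "see a caller, but no spans" check could look something like the sketch below. The talk does not define how the score is computed, so the metric here (the fraction of traces touching a service in which that service emitted at least one span) is an assumption, as are the `CallSpan` shape and both function names.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class CallSpan(NamedTuple):
    service: str             # service that emitted the span
    callee: Optional[str]    # downstream service this span called, if any


def silent_callees(spans: Iterable[CallSpan]) -> Set[str]:
    """Services that some caller names as a callee but that emitted no spans."""
    spans = list(spans)
    reporting = {s.service for s in spans}
    called = {s.callee for s in spans if s.callee}
    return called - reporting


def quality_score(service: str, traces: Iterable[List[CallSpan]]) -> Optional[float]:
    """Hypothetical score: share of traces touching `service` where it reported."""
    touched = emitted = 0
    for spans in traces:
        reports = any(s.service == service for s in spans)
        is_called = any(s.callee == service for s in spans)
        if reports or is_called:
            touched += 1
            emitted += int(reports)
    return emitted / touched if touched else None
```

A graded score like this is harder to satisfy than the pass/fail check, since a service that reports spans on only some code paths no longer passes outright.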
  15. Thank You
    • Jaeger
      – https://github.com/uber/jaeger
      – Blog: Evolving Distributed Tracing at UBER
      – Blog: Take OpenTracing for a HotROD Ride
    • OpenTracing: http://opentracing.io/
    • We are hiring
    • @yurishkuro