
Distributed Tracing at Netflix

Presented at a private workshop on distributed tracing; it describes the state of distributed tracing at Netflix as of July 2015.

Nitesh Kant

July 10, 2015

Transcript

  1. Distributed Tracing @Netflix, a.k.a. Salp. July 2015. Nitesh Kant (@NiteshKant), Software Engineer, Cloud Platform, Netflix.
  2. Who Am I?
     ❖ Engineer, Cloud Platform, Netflix.
     ❖ Leading Reactive IPC @Netflix.
     ❖ Core contributor, RxNetty (https://github.com/ReactiveX/RxNetty).
     ❖ Introduced distributed tracing inside Netflix (not yet OSS).
  3–7. The food chain: a simplistic hypothetical call chain, Zuul → API → Recommendation service → Subscriber service → Cache → C*. Built up over five slides, the questions tracing should answer: What services does a request touch? Who am I dependent on? Who calls me? Who overwhelmed me now?
  8–11. Request Trace: in-memory storage of data till request completion, shown step by step on the same hypothetical call chain (Zuul → API → Recommendation service → Subscriber service → Cache → C*).
  12. Request Trace, the drawback: an inflexible data model, averse to custom data. A sketch of the idea follows.
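A minimal sketch of what slides 8–12 describe: a trace accumulated in memory until the request completes, with a fixed schema and no room for custom data. All class and method names here are hypothetical, not Salp's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory request trace: annotations accumulate until the
// request completes, then the whole trace is handed off in one piece.
public final class RequestTrace {

    // Fixed fields only -- the "inflexible data model" of slide 12: there is
    // no key/value bag for services to attach custom data.
    public static final class Annotation {
        final String service;       // e.g. "api", "recommendation-service"
        final String event;         // e.g. "client send", "server send"
        final long timestampMillis;

        Annotation(String service, String event, long timestampMillis) {
            this.service = service;
            this.event = event;
            this.timestampMillis = timestampMillis;
        }
    }

    private final String traceId;
    private final List<Annotation> annotations = new ArrayList<>();

    public RequestTrace(String traceId) {
        this.traceId = traceId;
    }

    public void record(String service, String event) {
        annotations.add(new Annotation(service, event, System.currentTimeMillis()));
    }

    // Called once, when the request completes; until then nothing leaves memory.
    public List<Annotation> complete() {
        return new ArrayList<>(annotations);
    }
}
```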
  13. Zuul server-side pipeline for a user request: Tracing Filter → IPC instrumentation filter (Server) → Logging Filter, with async publish to Suro (https://github.com/Netflix/suro).
  14. Dependency request: continuing the Zuul pipeline, the IPC instrumentation filter (Client) emits a "client send" annotation as the dependency request goes out.
  15. Dependency response: the IPC instrumentation filter (Client) emits a "client receive" annotation when the dependency response arrives.
  16. Response: the IPC instrumentation filter (Server) emits a "server send" annotation as the response is returned to the caller (Zuul or any other service).
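Pulled together, slides 14–16 walk the standard annotation lifecycle on each hop. A sketch of how the two IPC filters might emit these; the Tracer interface and hook names are assumptions, not the real filter API.

```java
// Illustrative filter hooks for the annotation sequence of slides 14-16.
// The Tracer interface and hook names are assumptions, not Salp's real API.
interface Tracer {
    void emit(String annotation);
}

// Client-side IPC instrumentation filter: brackets each outgoing dependency call.
final class IpcClientInstrumentationFilter {
    private final Tracer tracer;

    IpcClientInstrumentationFilter(Tracer tracer) {
        this.tracer = tracer;
    }

    void onDependencyRequest() {
        tracer.emit("client send");     // slide 14: request leaves this process
    }

    void onDependencyResponse() {
        tracer.emit("client receive");  // slide 15: dependency response arrives
    }
}

// Server-side IPC instrumentation filter: brackets handling of an incoming request.
final class IpcServerInstrumentationFilter {
    private final Tracer tracer;

    IpcServerInstrumentationFilter(Tracer tracer) {
        this.tracer = tracer;
    }

    void onResponseSent() {
        tracer.emit("server send");     // slide 16: response goes back to the caller
    }
}
```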
  17. In Process: within thread boundaries, a thread local; across thread boundaries, a thread "variable" copied at each boundary.
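A minimal sketch of the in-process scheme on slide 17, assuming a copy-on-submit approach: the context lives in a ThreadLocal and is explicitly captured and restored when work hops to another thread. Class and method names are hypothetical.

```java
import java.util.concurrent.Executor;

// Hypothetical in-process trace context (slide 17): a thread local within a
// thread, copied explicitly whenever work crosses a thread boundary.
public final class TraceContext {
    private static final ThreadLocal<TraceContext> CURRENT = new ThreadLocal<>();

    private final String traceId;

    public TraceContext(String traceId) {
        this.traceId = traceId;
    }

    public String traceId() {
        return traceId;
    }

    public static TraceContext current() {
        return CURRENT.get();
    }

    public static void set(TraceContext ctx) {
        CURRENT.set(ctx);
    }

    // Capture the submitting thread's context now; restore it on the worker.
    public static Runnable wrap(Runnable task) {
        final TraceContext captured = CURRENT.get();  // copied at the boundary
        return () -> {
            TraceContext previous = CURRENT.get();
            CURRENT.set(captured);
            try {
                task.run();
            } finally {
                CURRENT.set(previous);
            }
        };
    }

    // Convenience: an Executor whose tasks all carry the submitter's context.
    public static Executor propagating(Executor delegate) {
        return task -> delegate.execute(wrap(task));
    }
}
```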
  18. Across processes (same hypothetical call chain):
     ❖ Proprietary library for generic over-the-network “context propagation”.
     ❖ HTTP traffic only; the context is propagated as headers.
     ❖ On receipt, the context is restored back into the thread local.
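A sketch of the header scheme on slide 18, reusing the hypothetical TraceContext above. The actual library and header names are proprietary; "X-Trace-Id" is an invented stand-in.

```java
import java.util.Map;

// Cross-process propagation over HTTP (slide 18). The real header names belong
// to Netflix's proprietary library; "X-Trace-Id" here is a made-up example.
public final class HttpContextPropagation {
    static final String TRACE_ID_HEADER = "X-Trace-Id";

    // Client side: serialize the thread-local context into outgoing headers.
    public static void inject(Map<String, String> requestHeaders) {
        TraceContext ctx = TraceContext.current();
        if (ctx != null) {
            requestHeaders.put(TRACE_ID_HEADER, ctx.traceId());
        }
    }

    // Server side: on receipt, restore the context back into the thread local.
    public static void extract(Map<String, String> requestHeaders) {
        String traceId = requestHeaders.get(TRACE_ID_HEADER);
        if (traceId != null) {
            TraceContext.set(new TraceContext(traceId));
        }
    }
}
```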
  19. Sampling: the Tracing Filter in Zuul gets the sampling rate from the Dynamic Config Service.
     ❖ Simple sample rate (% of requests) stored in the configuration service.
     ❖ Granularity: per application.
     ❖ Dynamic: checked per request.
  20. Or the sampling decision can be a dynamic criterion on request headers, URI, etc.; see the sketch below.
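A sketch of both sampling modes from slides 19–20. The IntSupplier stands in for the dynamic configuration lookup; the property, header, and URI examples are invented.

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.IntSupplier;

// Per-request sampling decision (slides 19-20). The IntSupplier stands in for
// a dynamic-config lookup, e.g. () -> config.getInt("tracing.sample.percent"),
// re-read on every request so the rate can be changed at runtime.
public final class TraceSampler {
    private final IntSupplier samplePercent;  // 0..100, set per application

    public TraceSampler(IntSupplier samplePercent) {
        this.samplePercent = samplePercent;
    }

    // Slide 19: simple "% of requests" sampling, checked per request.
    public boolean shouldTrace() {
        return ThreadLocalRandom.current().nextInt(100) < samplePercent.getAsInt();
    }

    // Slide 20: or a dynamic criterion on request headers, URI, etc.
    // ("X-Trace-Debug" and "/checkout" are invented examples.)
    public boolean shouldTrace(String uri, Map<String, String> headers) {
        if (headers.containsKey("X-Trace-Debug")) return true;
        if (uri.startsWith("/checkout")) return true;
        return shouldTrace();
    }
}
```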
  21. Data Publishing (via Suro): the Tracing Filter and the IPC instrumentation filter (Server) publish trace data asynchronously to Suro (https://github.com/Netflix/suro).
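The deck doesn't show Suro's client API, so this is a generic sketch of the async-publish idea only: the request thread enqueues and returns immediately, and a background thread drains toward the pipeline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Generic async-publish sketch (slide 21); the actual Suro client call is
// elided because the deck doesn't show its API.
public final class AsyncTracePublisher {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);

    public AsyncTracePublisher() {
        Thread drainer = new Thread(this::drain, "trace-publisher");
        drainer.setDaemon(true);  // never keep the JVM alive just for tracing
        drainer.start();
    }

    // Called from the filters: non-blocking, and drops data on overflow
    // rather than slowing the request path.
    public void publish(String serializedAnnotation) {
        queue.offer(serializedAnnotation);
    }

    private void drain() {
        try {
            while (true) {
                String msg = queue.take();
                // hand off to the Suro pipeline here
                System.out.println("publish -> suro: " + msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```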
  22. Flow graphs. Source: Druid. Granularity: aggregated per cluster. Shows call volume and HTTP response code distribution per edge.
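The aggregation itself runs in Druid, but the shape of the per-edge data on slide 22 is roughly this; a hypothetical sketch, not the actual schema.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Shape of the flow-graph data (slide 22): per caller->callee edge, aggregated
// per cluster, call volume broken down by HTTP response code.
public final class EdgeStats {
    // key: "callerCluster->calleeCluster"; value: HTTP status code -> count
    private final Map<String, Map<Integer, LongAdder>> edges = new ConcurrentHashMap<>();

    public void record(String callerCluster, String calleeCluster, int httpStatus) {
        edges.computeIfAbsent(callerCluster + "->" + calleeCluster,
                              k -> new ConcurrentHashMap<>())
             .computeIfAbsent(httpStatus, s -> new LongAdder())
             .increment();
    }

    // Total call volume for an edge = sum over its status-code counters.
    public long callVolume(String callerCluster, String calleeCluster) {
        return edges.getOrDefault(callerCluster + "->" + calleeCluster, Map.of())
                    .values().stream().mapToLong(LongAdder::sum).sum();
    }
}
```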
  23. Tech landscape (as of July 2015): a homogeneous architecture. Apps in the critical path of a customer request are all JVM based; the Node.js front-end app makes its IPC calls via a JVM sidecar. Services are HTTP/1.1 based.