
Distributed Tracing: Understand how your components work together - BuildStuffUA 2019

José Carlos Chávez

November 19, 2019

Transcript

  1. About me (Expedia Group Proprietary and Confidential)
    - Software Engineer at Expedia Group
    - Zipkin core team member and open source contributor for observability projects
    @jcchavezs - #BuildStuffUA
  2. Distributed systems
    A collection of independent components that appears to its users as a single coherent system.
    Characteristics:
    - Concurrency
    - No global clock
    - Independent failures
    Image source: https://link.medium.com/jey42ga7p1
  3. Distributed Systems
    [Diagram labels: Water heater, Gas supplier, Cold water storage tank, Shutoff valve, First floor branch, Tank valve. Exclamation on slide: "爆 $❄#☭"]
  4. Distributed Systems
    [Diagram labels: GET /media/e5k2, API Proxy, Media API, Auth service, Images service, Videos service, DB1, DB2, DB3, DB4. Errors shown: "Error 1152 ER_ABORTING_CONNECTION", "500 Internal Error" (twice)]
  5. Distributed Systems
    [Diagram labels: Water heater, Gas supplier, Cold water storage tank, Shutoff valve, First floor branch, Tank valve. Exclamation on slide: "爆 $❄#☭". Callout: "I AM HERE! First floor distributor is clogged!"]
  6. Logs & Concurrency
    [Diagram labels: GET /media/e5k2, API Proxy, Auth service, Media API, Images service, Videos service, DB1, DB2, DB3, DB4. Errors shown: "500 Internal Error" (twice), "Error 1152 ER_ABORTING_CONNECTION"]
  7. Logs & Concurrency
    [24/Oct/2017 13:50:07 +0000] "GET /media HTTP/1.1" 200 … **0/13548"
    [24/Oct/2017 13:50:07 +0000] "GET /media HTTP/1.1" 200 … **0/23948"
    [24/Oct/2017 13:50:08 +0000] "GET /media HTTP/1.1" 200 … **0/12396"
    [24/Oct/2017 13:50:07 +0000] "GET /videos HTTP/1.1" 200 … **0/23748"
    [24/Oct/2017 13:50:07 +0000] "GET /images HTTP/1.1" 200 … **0/23248"
    [24/Oct/2017 13:50:07 +0000] "GET /auth HTTP/1.1" 200 … **0/26548"
    [24/Oct/2017 13:50:07 +0000] "POST /media HTTP/1.1" 200 … **0/13148"
    [24/Oct/2017 13:50:07 +0000] "GET /media HTTP/1.1" 200 … **0/2588"
    [24/Oct/2017 13:50:07 +0000] "GET /auth HTTP/1.1" 500 … **0/3248"
    [24/Oct/2017 13:50:07 +0000] "POST /media HTTP/1.1" 200 … **0/23548"
    [24/Oct/2017 13:50:07 +0000] "GET /images HTTP/1.1" 200 … **0/22598"
    ...
    [Annotation on slide: "? ?"]
  8. Distributed Systems: Why is it hard to operate a distributed system?
    • Systems change all the time
    • Things fail in unexpected ways
    • Unknown unknowns
    • Most problems are the convergence of many different things failing at once
    • Everyone on the team is expected to respond with the same level of confidence and the same tools, regardless of experience or expertise; and the more components there are, the less each individual knows about them
  9. Distributed Systems
    [Diagram labels: Water heater, Gas supplier, Cold water storage tank, Shutoff valve, First floor branch, Tank valve. Exclamation on slide: "爆 $❄#☭". Callout: "I AM HERE! First floor distributor is clogged!"]
  10. Distributed Tracing
    [Diagram labels: timeline (Time axis) with spans for API Proxy, Media API, Auth, Videos, Images. Annotations: "[1508410442] no cache for resource, retrieving from DB", "TraceID d52d38b69b0fb15efa", "error". Callout: "I AM HERE! Aborted connection"]
  11. Distributed Tracing: The Answers
    • What services did a request/message pass through?
    • What occurred in each service for a given request/message?
    • Where did the error happen?
    • Where are the bottlenecks?
    • What is the critical path for a request?
    • Who should I page?
  12. Distributed Tracing & friends
    Logs tell you that an event happened.
    Metrics tell you how many events of this type are happening in the system.
    Tracing tells you what happened (who did what) and how its impact propagated across your system.
    Image source: https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
  13. Distributed Tracing: The Benefits
    [Diagram labels: op 1, op 2]
    • Immediate feedback
    • System insight: clarifies non-trivial interactions
    • Visibility into critical paths and dependencies
    • Understand latencies
    • Request scoped, not request's lifecycle scoped
  14. Distributed Tracing: Trace's Anatomy
    [Diagram labels: a trace drawn over a Time axis with spans "/things", "auth.Auth", "GET /videos", "mysql.Get"]
    • A trace shows an execution path through a distributed system
    • A span in the trace represents a logical unit of work (with a start and end)
    • A context includes information that should be propagated across services
    • Tags and logs (optional) add complementary information to spans
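The anatomy above can be illustrated with a small data model. This is only a sketch: the `Span` class, its field names, the timings, and the parent/child layout of the four spans are assumptions made for the example, not something the slide specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    """A logical unit of work with a start and an end (hypothetical model)."""
    trace_id: str                 # shared by every span in the same trace
    span_id: str                  # unique within the trace
    parent_id: Optional[str]      # None for the root span
    name: str
    start_us: int                 # start time in microseconds
    end_us: int                   # end time in microseconds
    tags: Dict[str, str] = field(default_factory=dict)            # optional
    logs: List[Tuple[int, str]] = field(default_factory=list)     # optional

    @property
    def duration_us(self) -> int:
        return self.end_us - self.start_us


def children_of(spans: List[Span], parent: Span) -> List[Span]:
    """Return the direct children of `parent`, ordered by start time."""
    kids = [s for s in spans if s.parent_id == parent.span_id]
    return sorted(kids, key=lambda s: s.start_us)


def render(spans: List[Span]) -> List[str]:
    """Render the trace as an indented tree, one line per span."""
    lines: List[str] = []

    def walk(span: Span, depth: int) -> None:
        lines.append(f"{'  ' * depth}{span.name} ({span.duration_us}us)")
        for child in children_of(spans, span):
            walk(child, depth + 1)

    for root in (s for s in spans if s.parent_id is None):
        walk(root, 0)
    return lines


# Span names come from the slide; IDs, timings and nesting are invented.
trace = [
    Span("t1", "a", None, "/things", 0, 900),
    Span("t1", "b", "a", "auth.Auth", 50, 200),
    Span("t1", "c", "a", "GET /videos", 250, 850),
    Span("t1", "d", "c", "mysql.Get", 300, 800, tags={"error": "true"}),
]
print("\n".join(render(trace)))
```

The shared `trace_id` ties the spans together, and `parent_id` is what lets a backend rebuild the execution path as a tree.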
  15. Distributed Tracing
    - Leg 1: inbound propagation
    - Leg 2: outbound propagation
    - Leg 3: in-process propagation
    Source: https://link.medium.com/BXTM1u5oH1
  16. Distributed Tracing: Leg 1: Inbound propagation
    When a service processes a request or consumes a message, it extracts (if possible) the context from upstream to continue the trace; otherwise it starts a new trace.
    [Diagram labels: API Proxy, Media API, GET /media/{id}, TraceID: fAf3oXL6DS, SpanID: dZ0xHIBa1A, ...]
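As a rough sketch of inbound propagation, the function below extracts a context from B3-style HTTP headers and falls back to a new trace when none is present. `TraceContext`, `extract` and `start_server_span` are hypothetical names for this example, not a real tracer's API. The sketch always creates a child span for the server side, which is a simplification; real tracers may handle the server span ID differently.

```python
import secrets
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None


def new_id() -> str:
    """Random 64-bit identifier as 16 lower-hex characters."""
    return secrets.token_hex(8)


def extract(headers: Mapping[str, str]) -> Optional[TraceContext]:
    """Read the upstream context from B3 headers, if present."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    trace_id = lowered.get("x-b3-traceid")
    span_id = lowered.get("x-b3-spanid")
    if not trace_id or not span_id:
        return None
    return TraceContext(trace_id, span_id, lowered.get("x-b3-parentspanid"))


def start_server_span(headers: Mapping[str, str]) -> TraceContext:
    """Continue the upstream trace, or start a new one if there is none."""
    upstream = extract(headers)
    if upstream is None:
        # No usable context: this service becomes the root of a new trace.
        return TraceContext(trace_id=new_id(), span_id=new_id())
    # Same trace, new span, with the upstream span as parent.
    return TraceContext(upstream.trace_id, new_id(), upstream.span_id)
```

Called with the identifiers from the slide, `start_server_span({"X-B3-TraceId": "fAf3oXL6DS", "X-B3-SpanId": "dZ0xHIBa1A"})` keeps the trace ID and records `dZ0xHIBa1A` as the parent. The slide's IDs are illustrative; real B3 identifiers are lower-hex strings.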
  17. Distributed Tracing: Leg 2: Outbound propagation
    When a service makes an outbound call to another service, it injects the context into the request (headers) or message (metadata).
    [Diagram labels: Media API, Video service, GET /videos, http/get. Context labels: "TraceID: fAf3oXL6DS, SpanID: y74fr5udj" and "TraceID: fAf3oXL6DS, ParentID: y74fr5udj, SpanID: dZ0xHIBa1A"]
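Outbound propagation might look like the sketch below: a client span is created as a child of the current span, and its context is written into the outgoing headers using the B3 header names. `TraceContext`, `start_client_span` and `inject` are hypothetical helpers for illustration, and treating `y74fr5udj` as the current span is one reading of the slide's labels.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None


def start_client_span(current: TraceContext) -> TraceContext:
    """Create a child span for the outbound call."""
    return TraceContext(current.trace_id, secrets.token_hex(8), current.span_id)


def inject(ctx: TraceContext, headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` carrying the context as B3 headers."""
    carrier = dict(headers)  # never mutate the caller's headers
    carrier["X-B3-TraceId"] = ctx.trace_id
    carrier["X-B3-SpanId"] = ctx.span_id
    if ctx.parent_id is not None:
        carrier["X-B3-ParentSpanId"] = ctx.parent_id
    return carrier


# Current (server) span in the calling service; IDs taken from the slide.
server = TraceContext(trace_id="fAf3oXL6DS", span_id="y74fr5udj")
client = start_client_span(server)
outgoing = inject(client, {"Accept": "application/json"})
print(outgoing)
```

For a message broker the same context would travel in message metadata instead of HTTP headers; only the carrier changes.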
  18. Distributed Tracing: Leg 3: In-process propagation
    When performing an operation inside the service, it uses the server context as the parent to create local spans.
    [Diagram labels: mysql.Query, redis.Get, Media API, Cache service, Images service, GET /images]
  19. Distributed Tracing
    [Diagram labels: timeline (Time axis) with spans for API Proxy, Media API, Auth, Videos, Images. Annotations: "[1508410442] no cache for resource, retrieving from DB", "TraceID d52d38b69b0fb15efa", "error". Callout: "I AM HERE! Aborted connection"]
  20. Distributed Tracing: Are they all benefits?
    Overhead for users:
    • Observability tools are meant to be unintrusive
    • Sampling reduces overhead
    • (Don't) trace every single operation
    Overhead for developers:
    • Not all libraries are ready to plug in instrumentation
    • Instrumentation can be delegated to common frameworks
    • Getting sampling right is hard
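To make "sampling reduces overhead" concrete, here is a minimal sketch of a rate-based sampler. It derives the decision from the trace ID, so the same trace gets the same answer wherever it is evaluated. This is an illustrative scheme under that assumption, not a description of how any particular tracer samples; `is_sampled` and `new_trace_id` are hypothetical names.

```python
import random


def is_sampled(trace_id: str, rate: float) -> bool:
    """Keep roughly `rate` of all traces, decided from the trace ID alone.

    `trace_id` is assumed to be a lower-hex string; only its low 64 bits
    are used, so 128-bit IDs work as well.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    low_64_bits = int(trace_id[-16:], 16)
    return low_64_bits < rate * 2**64


def new_trace_id(rng: random.Random) -> str:
    """Random 64-bit trace ID as 16 lower-hex characters."""
    return f"{rng.getrandbits(64):016x}"


rng = random.Random(42)
ids = [new_trace_id(rng) for _ in range(10_000)]
kept = sum(is_sampled(t, 0.1) for t in ids)
print(f"kept {kept} of {len(ids)} traces")
```

A fixed rate is the simple part. Choosing a rate that keeps rare errors visible while staying cheap on high-traffic endpoints is where "right sampling is hard" applies.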
  21. Zipkin
    Based on B3 and inspired by Google Dapper (2010). It was open sourced by Twitter (2012).
    • Mature tracing model that emerged from users' needs
    • Used by large companies like LINE, Netflix, SoundCloud and Yelp, but also by small ones
    • Strong community:
      ◦ @zipkinproject
      ◦ gitter.im/openzipkin
  22. Zipkin
    [Architecture diagram, reconstructed from its labels:]
    - Service (instrumented): collect spans
    - Transport: http/kafka/grpc
    - Collector: receive spans, deserialize and schedule for storage
    - Storage DB: store spans (Cassandra/MySQL/ElasticSearch)
    - API: retrieve data
    - UI: visualize
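On the HTTP transport, an instrumented service reports spans to the collector as a JSON list. The sketch below builds a span in Zipkin's v2 JSON format and prepares a POST to `/api/v2/spans`. The helper names, the `http://localhost:9411` base URL (Zipkin's default port) and the sample span values are assumptions for illustration; the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def to_zipkin_v2(trace_id: str, span_id: str, parent_id: Optional[str],
                 name: str, kind: str, start_us: int, duration_us: int,
                 service: str, tags: Optional[Dict[str, str]] = None) -> dict:
    """Build one span in Zipkin v2 JSON shape (times in microseconds)."""
    span = {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "kind": kind,                      # e.g. "SERVER" or "CLIENT"
        "timestamp": start_us,
        "duration": duration_us,
        "localEndpoint": {"serviceName": service},
        "tags": dict(tags or {}),
    }
    if parent_id is not None:
        span["parentId"] = parent_id
    return span


def build_request(spans: List[dict],
                  base_url: str = "http://localhost:9411") -> urllib.request.Request:
    """Prepare (but do not send) the POST that reports spans over HTTP."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/v2/spans",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sample values are invented for the example.
example = to_zipkin_v2(
    trace_id="d52d38b69b0fb15e", span_id="1a2b3c4d5e6f7a8b", parent_id=None,
    name="get /media", kind="SERVER", start_us=1508410442000000,
    duration_us=13548, service="media-api", tags={"http.method": "GET"},
)
request = build_request([example])
print(request.get_method(), request.full_url)
```

Sending would be a matter of passing `request` to `urllib.request.urlopen` against a running collector. Real instrumentation usually batches spans and reports them asynchronously so that reporting stays off the request path.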
  23. Summary
    • Distributed systems are complex and will remain so.
    • Distributed tracing helps you understand latencies, critical paths and errors within a request or message flow.
    • Distributed tracing provides contextual insights within a request; it is complementary to other observability tools.