Distributed Tracing: Understand how your components work together - BuildStuffUA 2019

Distributed Tracing: Understand how your components work together - BuildStuffUA 2019

José Carlos Chávez

November 19, 2019

Transcript

  1.
    Distributed Tracing: Understand how your components work together
    BuildStuff Ukraine 2019
    Photo by Samuel Sianipar

  2.
    Expedia Group Proprietary and Confidential
    About me
    - Software Engineer at Expedia Group
    - Zipkin core team member and open source contributor for observability projects
    @jcchavezs - #BuildStuffUA

  3.
    Distributed Systems

  4.
    Distributed systems
    A distributed system is a collection of independent components that appears to its users as a single coherent system.
    Characteristics:
    - Concurrency
    - No global clock
    - Independent failures
    Image source: https://link.medium.com/jey42ga7p1

  5.
    [Figure: a household plumbing diagram with a gas supplier, water heater, cold water storage tank, tank valve, shutoff valve and first floor branch, plus the exclamation "$❄#☭"]

  6.
    [Figure: a request GET /media/e5k2 passes through an API Proxy, Auth service, Media API, Images service and Videos service, backed by databases DB1-DB4; a database "Error 1152 ER_ABORTING_CONNECTION" appears alongside two "500 Internal Error" responses]

  7.
    [Figure: the plumbing diagram (gas supplier, water heater, cold water storage tank, tank valve, shutoff valve, first floor branch, "$❄#☭"), now with an "I AM HERE!" marker and the note "First floor distributor is clogged!"]

  8.
    We do have that; it is called logs!

  9.
    Logs & Concurrency
    [Figure: the GET /media/e5k2 request flowing through the API Proxy, Auth service, Media API, Images service, Videos service and DB1-DB4, with the "Error 1152 ER_ABORTING_CONNECTION" and two "500 Internal Error" responses]

  10.
    Logs & Concurrency
    [24/Oct/2017 13:50:07 +0000] “GET /media HTTP/1.1” 200 … **0/13548”
    [24/Oct/2017 13:50:07 +0000] “GET /media HTTP/1.1” 200 … **0/23948”
    [24/Oct/2017 13:50:08 +0000] “GET /media HTTP/1.1” 200 … **0/12396”
    [24/Oct/2017 13:50:07 +0000] “GET /videos HTTP/1.1” 200 … **0/23748”
    [24/Oct/2017 13:50:07 +0000] “GET /images HTTP/1.1” 200 … **0/23248”
    [24/Oct/2017 13:50:07 +0000] “GET /auth HTTP/1.1” 200 … **0/26548”
    [24/Oct/2017 13:50:07 +0000] “POST /media HTTP/1.1” 200 … **0/13148”
    [24/Oct/2017 13:50:07 +0000] “GET /media HTTP/1.1” 200 … **0/2588”
    [24/Oct/2017 13:50:07 +0000] “GET /auth HTTP/1.1” 500 … **0/3248”
    [24/Oct/2017 13:50:07 +0000] “POST /media HTTP/1.1” 200 … **0/23548”
    [24/Oct/2017 13:50:07 +0000] “GET /images HTTP/1.1” 200 … **0/22598”
    ...
    ?
    ?

  11.
    Why is it hard to operate a distributed system?
    ● Systems change all the time
    ● Things fail in unexpected ways
    ● Unknown unknowns
    ● Most problems are the convergence of many different things failing at once
    ● Everyone on the team is expected to respond with the same confidence and tools regardless of experience or expertise, yet the more components there are, the less each individual knows about them

  12.
    [Figure: the plumbing diagram (gas supplier, water heater, cold water storage tank, tank valve, shutoff valve, first floor branch, "$❄#☭"), with an "I AM HERE!" marker and the note "First floor distributor is clogged!"]

  13.
    Distributed Tracing to unclog your pipes

  14.
    Distributed Tracing
    [Figure: a trace timeline for TraceID d52d38b69b0fb15efa with spans for API Proxy, Media API, Auth, Videos and Images along a time axis; one span is flagged "error" with "Aborted connection", a log entry reads "[1508410442] no cache for resource, retrieving from DB", and an "I AM HERE!" marker points at the failing span]

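The log entry in this trace view is useful because it is tied to a trace ID. A minimal Python sketch of that idea, assuming a hypothetical log format in which every line starts with the trace ID of the request that produced it (real setups usually carry it in a structured field instead): grouping by that ID untangles interleaved lines like those on slide 10. The IDs and messages are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical log lines in the form "<trace_id> <message>", interleaved across requests.
LOG_LINES = [
    "a1f0 GET /media 200",
    "b2c3 GET /auth 500",
    "a1f0 GET /videos 200",
    "b2c3 no cache for resource, retrieving from DB",
    "a1f0 GET /images 200",
]


def group_by_trace(lines: List[str]) -> Dict[str, List[str]]:
    """Group interleaved log lines by the trace ID at the start of each line."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        trace_id, _, message = line.partition(" ")
        groups[trace_id].append(message)
    return dict(groups)


if __name__ == "__main__":
    for trace_id, messages in group_by_trace(LOG_LINES).items():
        print(trace_id, messages)
```

Without the shared ID, the 500 on `/auth` could belong to any of the concurrent requests; with it, the failure and the cache-miss message land in the same group.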

  15.
    The Answers
    ● What services did a request/message pass through?
    ● What occurred in each service for a given request/message?
    ● Where did the error happen?
    ● Where are the bottlenecks?
    ● What is the critical path for a request?
    ● Who should I page?

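Several of these questions can be answered mechanically once a trace's spans are collected. A minimal Python sketch, assuming a hypothetical flat list of span records; the service names, IDs and timings below are invented for illustration, and "self time" is only a rough bottleneck signal.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Span:
    # Hypothetical, minimal span record; real tracers carry more fields.
    span_id: str
    parent_id: Optional[str]
    service: str
    name: str
    start_us: int
    duration_us: int
    error: bool = False


def services_visited(spans: List[Span]) -> List[str]:
    """What services did the request pass through, in order of first appearance?"""
    seen: List[str] = []
    for span in sorted(spans, key=lambda s: s.start_us):
        if span.service not in seen:
            seen.append(span.service)
    return seen


def error_spans(spans: List[Span]) -> List[Span]:
    """Where did the error happen?"""
    return [s for s in spans if s.error]


def self_time_us(spans: List[Span]) -> Dict[str, int]:
    """Time spent in each span outside its direct children.

    Concurrent children can push this below zero, so it is clamped at 0.
    """
    child_total: Dict[str, int] = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0) + s.duration_us
    return {s.span_id: max(0, s.duration_us - child_total.get(s.span_id, 0)) for s in spans}


TRACE = [
    Span("s1", None, "api-proxy", "GET /media/e5k2", 0, 900),
    Span("s2", "s1", "media-api", "GET /media/{id}", 50, 800),
    Span("s3", "s2", "images-service", "GET /images", 100, 200),
    Span("s4", "s2", "videos-service", "GET /videos", 320, 500, error=True),
]

if __name__ == "__main__":
    print(services_visited(TRACE))
    print([s.name for s in error_spans(TRACE)])
    times = self_time_us(TRACE)
    print(max(times, key=times.get))  # span with the most self time
```

"Who should I page?" then maps to whoever owns the service of the erroring span; how ownership is recorded is outside this sketch.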

  16.
    Distributed Tracing
    Source: https://twitter.com/rakyll/status/971231712049971200

  17.
    Distributed Tracing & friends
    Logs tell you that an event happened.
    Metrics tell you how many events of this type are happening in the system.
    Tracing tells you what happened (who did what) and how its impact propagated across your system.
    Image source: https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html

  18.
    The Benefits
    [Figure: two operations, op 1 and op 2]
    ● Immediate feedback
    ● System insight: clarifies non-trivial interactions
    ● Visibility into critical paths and dependencies
    ● Understand latencies
    ● Request-scoped, not request-lifecycle-scoped

  19.
    Trace’s Anatomy
    [Figure: a trace (the word TRACE written vertically beside it) with spans /things, auth.Auth, GET /videos and mysql.Get laid out along a time axis]
    ● A trace shows an execution path through a distributed system
    ● A span in the trace represents a logical unit of work (with a start and an end)
    ● A context includes information that should be propagated across services
    ● Tags and logs (optional) add complementary information to spans

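The four parts of a trace listed here map naturally onto a small data model. A minimal Python sketch, not Zipkin's or any tracer's actual API: `SpanContext`, `Span` and `start_span` are hypothetical names, and the ID sizes (128-bit trace ID, 64-bit span ID as lowercase hex) follow B3 conventions.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SpanContext:
    """The part of a span that is propagated across services."""
    trace_id: str
    span_id: str
    parent_id: Optional[str] = None


@dataclass
class Span:
    """A logical unit of work with a start, an end, tags and logs."""
    name: str
    context: SpanContext
    start: float = field(default_factory=time.time)
    end: Optional[float] = None
    tags: Dict[str, str] = field(default_factory=dict)
    logs: List[Tuple[float, str]] = field(default_factory=list)

    def tag(self, key: str, value: str) -> None:
        self.tags[key] = value

    def log(self, message: str) -> None:
        self.logs.append((time.time(), message))

    def finish(self) -> None:
        self.end = time.time()


def start_span(name: str, parent: Optional[SpanContext] = None) -> Span:
    """Start a root span, or a child span in the same trace as `parent`."""
    if parent is None:
        ctx = SpanContext(secrets.token_hex(16), secrets.token_hex(8))
    else:
        ctx = SpanContext(parent.trace_id, secrets.token_hex(8), parent.span_id)
    return Span(name, ctx)


if __name__ == "__main__":
    root = start_span("GET /videos")
    child = start_span("mysql.Get", parent=root.context)
    child.tag("cache", "miss")  # hypothetical tag key
    child.log("no cache for resource, retrieving from DB")
    child.finish()
    root.finish()
    print(root.context, child.context)
```

Every span in a trace shares the trace ID; the parent ID is what lets a UI rebuild the tree from a flat list of spans.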

  20.
    Distributed Tracing
    Leg 1: inbound propagation
    Leg 2: outbound propagation
    Leg 3: in-process propagation
    Source: https://link.medium.com/BXTM1u5oH1

  21.
    Leg 1: Inbound propagation
    When a service processes a request or consumes a message, it extracts the context from upstream (if possible) to continue the trace; otherwise, it starts a new trace.
    [Figure: API Proxy calls Media API with GET /media/{id}; the request carries TraceID: fAf3oXL6DS, SpanID: dZ0xHIBa1A, ...]

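Inbound propagation boils down to "extract or start new". A minimal Python sketch assuming Zipkin's B3 multi-header format (`X-B3-TraceId`, `X-B3-SpanId`); it ignores `X-B3-ParentSpanId` and `X-B3-Sampled`, and the function names are hypothetical. The IDs on the slide are illustrative; real B3 IDs are lowercase hex.

```python
import secrets
from typing import Dict, Mapping, Optional


def extract(headers: Mapping[str, str]) -> Optional[Dict[str, str]]:
    """Read B3 multi-header context from an inbound request, if present."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    trace_id = lowered.get("x-b3-traceid")
    span_id = lowered.get("x-b3-spanid")
    if not trace_id or not span_id:
        return None
    return {"trace_id": trace_id, "span_id": span_id}


def server_span(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Continue the upstream trace if possible; otherwise start a new trace."""
    upstream = extract(headers)
    if upstream is None:
        return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8), "parent_id": None}
    # This sketch opens a new child span; B3 also lets a server share the caller's span ID.
    return {"trace_id": upstream["trace_id"], "span_id": secrets.token_hex(8), "parent_id": upstream["span_id"]}


if __name__ == "__main__":
    inbound = {"X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124", "X-B3-SpanId": "a2fb4a1d1a96d312"}
    print(server_span(inbound))
    print(server_span({}))  # no upstream context: a new trace starts here
```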

  22.
    Leg 2: Outbound propagation
    When a service makes an outbound call to another service, it injects the context into the request (headers) or message (metadata).
    [Figure: Media API, whose context is TraceID: fAf3oXL6DS, SpanID: y74fr5udj, calls the Video service with GET /videos; the outgoing http/get call carries TraceID: fAf3oXL6DS, ParentID: y74fr5udj, SpanID: dZ0xHIBa1A]

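Outbound propagation is the mirror image: open a client span as a child of the current server span and write its context into the carrier. A minimal Python sketch assuming B3 multi-header names; `inject` and `client_call_headers` are hypothetical helpers, and the same dictionary could serve as message metadata (for example Kafka record headers) instead of HTTP headers.

```python
import secrets
from typing import Dict


def inject(trace_id: str, span_id: str, parent_id: str, carrier: Dict[str, str]) -> Dict[str, str]:
    """Write B3 multi-header context into outbound HTTP headers or message metadata."""
    carrier["X-B3-TraceId"] = trace_id
    carrier["X-B3-SpanId"] = span_id
    carrier["X-B3-ParentSpanId"] = parent_id
    return carrier


def client_call_headers(server_trace_id: str, server_span_id: str) -> Dict[str, str]:
    """Open a client span as a child of the current server span and propagate it."""
    client_span_id = secrets.token_hex(8)  # new span for the outbound call (http/get)
    return inject(server_trace_id, client_span_id, server_span_id, {"Accept": "application/json"})


if __name__ == "__main__":
    # Hypothetical server context of the Media API.
    headers = client_call_headers("463ac35c9f6413ad48485a3953bb6124", "a2fb4a1d1a96d312")
    print(headers)
```

As on the slide, the trace ID is unchanged, the caller's span becomes the parent, and the call itself gets a fresh span ID.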

  23.
    Leg 3: In-process propagation
    When performing an operation inside the service, it uses the server context as the parent to create local spans.
    [Figure: Media API creating local spans such as mysql.Query and redis.Get (against a Cache service), plus a GET /images call to the Images service]

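In-process propagation needs a notion of "the span currently in scope". A minimal Python sketch using the standard `contextvars` module; `local_span`, `FINISHED` and `handle_get_media` are hypothetical names, and here the server span starts a fresh trace for brevity, whereas a real service would build it from the inbound context (Leg 1).

```python
import contextvars
import secrets
from contextlib import contextmanager
from typing import Iterator, List, Optional, Tuple

# (trace_id, span_id) of the span currently in scope; contextvars keeps this
# separate per thread and per asyncio task.
_current_span = contextvars.ContextVar("current_span", default=None)

# Finished spans as (name, trace_id, span_id, parent_id); a real tracer would report them.
FINISHED: List[Tuple[str, str, str, Optional[str]]] = []


@contextmanager
def local_span(name: str) -> Iterator[str]:
    """Create a span whose parent is whatever span is currently in scope."""
    parent = _current_span.get()
    trace_id = parent[0] if parent else secrets.token_hex(16)
    span_id = secrets.token_hex(8)
    token = _current_span.set((trace_id, span_id))
    try:
        yield span_id
    finally:
        _current_span.reset(token)
        FINISHED.append((name, trace_id, span_id, parent[1] if parent else None))


def handle_get_media() -> str:
    """Hypothetical Media API handler: the server span wraps the local spans."""
    with local_span("GET /media/{id}") as server_span_id:
        with local_span("redis.Get"):
            pass  # cache lookup would go here
        with local_span("mysql.Query"):
            pass  # database query would go here
    return server_span_id


if __name__ == "__main__":
    handle_get_media()
    for span in FINISHED:
        print(span)
```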

  24.
    Distributed Tracing
    [Figure: the trace timeline for TraceID d52d38b69b0fb15efa with spans for API Proxy, Media API, Auth, Videos and Images; one span is flagged "error" with "Aborted connection", a log entry reads "[1508410442] no cache for resource, retrieving from DB", and an "I AM HERE!" marker points at the failing span]

  25.
    Are they all benefits?
    Overhead for users:
    • Observability tools are meant to be non-intrusive
    • Sampling reduces overhead
    • (Don’t) trace every single operation
    Overhead for developers:
    • Not all libraries are ready to plug in instrumentation
    • Instrumentation can be delegated to common frameworks
    • Getting sampling right is hard

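The sampling bullets can be made concrete. A minimal sketch of one common approach, assuming trace IDs are random hex strings: the first service derives a deterministic decision from the trace ID, and downstream services honour the B3 `X-B3-Sampled` header (`1` or `0`) so a trace is kept or dropped as a whole. The 1-in-10,000 granularity and the function names are illustrative, not Zipkin's API.

```python
import secrets
from typing import Mapping


def should_sample(trace_id: str, rate: float) -> bool:
    """Deterministic decision: a given trace ID always gets the same answer."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    low64 = int(trace_id[-16:], 16)  # lower 64 bits of a hex trace ID
    return low64 % 10_000 < rate * 10_000


def sampling_decision(headers: Mapping[str, str], trace_id: str, rate: float) -> bool:
    """Honour the upstream B3 decision when present; otherwise decide locally."""
    lowered = {k.lower(): v for k, v in headers.items()}
    upstream = lowered.get("x-b3-sampled")
    if upstream in ("1", "0"):
        return upstream == "1"
    return should_sample(trace_id, rate)


if __name__ == "__main__":
    ids = [secrets.token_hex(16) for _ in range(10_000)]
    kept = sum(should_sample(t, 0.1) for t in ids)
    print(f"kept {kept} of {len(ids)} traces (roughly 10% expected)")
```

Propagating the decision matters more than how it is made: if each service sampled independently, most kept traces would have missing spans.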

  26.
    Introducing Zipkin

  27.
    Zipkin
    Based on B3 and inspired by Google Dapper (2010); open sourced by Twitter (2012).
    ● A mature tracing model that emerged from users’ needs.
    ● Used by large companies such as LINE, Netflix, SoundCloud and Yelp, as well as smaller ones.
    ● Strong community:
    ○ @zipkinproject
    ○ gitter.im/openzipkin

  28.
    Zipkin
    [Figure: Zipkin architecture. An instrumented service collects spans and sends them over a transport (http/kafka/grpc) to the Collector, which receives spans, deserializes them and schedules them for storage. Storage (Cassandra/MySQL/Elasticsearch) stores the spans; the API retrieves data from it and the UI visualizes it.]

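On the "Service (instrumented) → Transport → Collector" path, the HTTP transport is a POST of a JSON list of spans to Zipkin's v2 endpoint (`/api/v2/spans`, port 9411 by default). A minimal Python sketch, assuming a local Zipkin server and hypothetical helper names; it only builds the request, and the Kafka and gRPC transports are not shown.

```python
import json
import time
import urllib.request
from typing import Dict, List, Optional

# Assumption: a Zipkin server reachable on its default port.
ZIPKIN_SPANS_URL = "http://localhost:9411/api/v2/spans"


def zipkin_span(trace_id: str, span_id: str, parent_id: Optional[str], name: str,
                kind: str, service: str, start_us: int, duration_us: int,
                tags: Optional[Dict[str, str]] = None) -> Dict:
    """Build one span in Zipkin's v2 JSON model (timestamps in epoch microseconds)."""
    span = {
        "traceId": trace_id,
        "id": span_id,
        "name": name,
        "kind": kind,  # e.g. "SERVER" or "CLIENT"
        "timestamp": start_us,
        "duration": duration_us,
        "localEndpoint": {"serviceName": service},
        "tags": tags or {},
    }
    if parent_id is not None:
        span["parentId"] = parent_id
    return span


def report_request(spans: List[Dict], url: str = ZIPKIN_SPANS_URL) -> urllib.request.Request:
    """Prepare an HTTP POST carrying a JSON list of spans."""
    body = json.dumps(spans).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    now_us = int(time.time() * 1_000_000)
    spans = [zipkin_span("463ac35c9f6413ad48485a3953bb6124", "a2fb4a1d1a96d312", None,
                         "get /media/{id}", "SERVER", "media-api", now_us, 150_000,
                         {"http.path": "/media/e5k2"})]
    request = report_request(spans)
    # urllib.request.urlopen(request)  # uncomment to send to a running Zipkin
    print(request.get_method(), request.full_url, request.data.decode())
```

Real reporters batch spans and send them asynchronously so that reporting stays off the request's critical path.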

  29. to 36.
    (These slides contain no transcribed text.)

  37.
    Summary
    ● Distributed systems are complex and will remain so.
    ● Distributed tracing helps you understand latencies, critical paths and errors within a request or message flow.
    ● Distributed tracing provides contextual insight within a request; it complements other observability tools.

  38.
    Thank you
    Q&A