What do users get out of observability?

José Carlos Chávez

May 27, 2020

Transcript

  1. What do users get out of observability?
    José Carlos Chávez
    Expedia Group / Zipkin

  2. José Carlos Chávez
    Software Engineer @ Expedia Group
    Core team member @ Zipkin

  3. What is the problem to solve?
    Observability emerges from users' need to untangle the complexity that comes from increasingly distributed and independent software components and their interactions.
    We need to understand:
    - Interactions & Correlations
    - Operational deviations
    - Failure modes
    - Critical components/paths

  4. “scientia potentia est” (knowledge is power)

  5. What do users usually look for?
    Beginners:
    - Instrumentation for different languages and models (client/server, messaging, etc.)
    - Working examples
    - The right sampling rate
    Intermediate:
    - More metadata, and more meaningful metadata (tags, logs, etc.)
    - Correlation across observability tools
    - Dependency graphs
    - The right sampling strategy
    Advanced:
    - Post facto processing and aggregation
    - Proactive feedback
    - More configurable sampling
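
A common way to apply a fixed sampling rate is to derive the decision from the trace ID, so that every service seeing the same trace agrees without coordination. The sketch below is a hypothetical illustration of that idea, not Zipkin's or Brave's sampler API: the class name `TraceIdRatioSampler` and its threshold scheme are assumptions, and it assumes the low 64 bits of trace IDs are random.

```python
class TraceIdRatioSampler:
    """Hypothetical sampler: keeps roughly `rate` of traces, decided by trace ID.

    Because the decision depends only on the trace ID, every service that sees
    the same ID reaches the same decision without coordinating.
    """

    MAX_64 = 2**64

    def __init__(self, rate: float) -> None:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rate must be between 0.0 and 1.0")
        self.rate = rate
        # Trace IDs whose low 64 bits fall below this value are kept.
        self.threshold = int(rate * self.MAX_64)

    def is_sampled(self, trace_id_hex: str) -> bool:
        # Trace IDs are 16 or 32 lower-hex characters; use the low 64 bits,
        # which this sketch assumes are uniformly random.
        low_64 = int(trace_id_hex[-16:], 16)
        return low_64 < self.threshold
```

A rate of `0.0` never samples and `1.0` always does; anything in between keeps about that fraction of traces, and the same trace ID always gets the same answer.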

  6. What do users really want?
    Tricky question.
    - that failing request to be traced
    - that log field with the malformed data to be included
    - that deviating metric to be emitted
    - that alert to be configured on the deviating metric
    - that thing they need, when they need it
    CERTAINTY

  7. Unfortunately, it is 2020, not 2029
    - Recording 100% of the data is not an option for every company, due to scale.
    - High cardinality is expensive and probably useless in many cases.
    - 100% availability is the new 100% coverage.
    - The transition from a reactive to a proactive model is still a work in progress.

  8. Users need solutions that help them understand their problems and boundaries

  9. What is Zipkin
    - Distributed tracing solution based on BBB and inspired by Google Dapper (2010). It was open sourced by Twitter (2012).
    - A mature tracing model that emerged from users' use cases and thousands of hours of support.
    - Used by large companies such as LINE, Netflix, SoundCloud and Yelp, as well as by small ones.
    - A strong and heterogeneous community

  10. What can Zipkin do for you?
    It can help you:
    - Understand the sources of request latency
    - Identify the critical path of a request that traverses many components
    - Get an overview of your services' dependencies
    - Pinpoint the service at fault when an error occurs
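
The dependency overview comes from the parent/child relationships already present in trace data: whenever a span in one service has a parent span in another, that is one call between the two services. The sketch below is a simplified, hypothetical illustration of that aggregation; the trimmed `Span` type and `dependency_links` function are assumptions, and it ignores details Zipkin's own dependency linker handles, such as client and server sharing one span ID.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    # Hypothetical, trimmed-down span: only the fields needed to link services.
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str
    error: bool = False


def dependency_links(spans: List[Span]) -> Tuple[Counter, Counter]:
    """Count caller-service -> callee-service calls (and errors) across traces."""
    by_key = {(s.trace_id, s.span_id): s for s in spans}
    calls: Counter = Counter()
    errors: Counter = Counter()
    for span in spans:
        if span.parent_id is None:
            continue  # root span: nobody called it
        parent = by_key.get((span.trace_id, span.parent_id))
        if parent is None or parent.service == span.service:
            continue  # parent not reported, or an in-process child span
        link = (parent.service, span.service)
        calls[link] += 1
        if span.error:
            errors[link] += 1
    return calls, errors
```

Counting errors per link is what lets such a graph hint at which service is at fault, not just which services talk to each other.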

  11. What can Zipkin do for you?

  12. What can Zipkin do for you?

  13. What can Zipkin offer you?
    - Advanced instrumentation for various frameworks/libraries (26+ official ones in Java alone)
    - Exporters to various storage backends
    - A comprehensive UI
    - Knowledge sharing (RATIONALEs, site docs)
    - A supportive community

  14. Popular Zipkin features
    - Data model
    - Propagation format (B3)
    - Integration with other observability tools, for both the server and instrumentation (e.g. loggers and metrics ingestion)
    - A versatile instrumentation API that embraces interop with other tracing libraries (e.g. OpenTracing, AWS X-Ray, Haystack)
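
The B3 propagation format carries the trace context between services in request headers: either several `X-B3-*` headers or a single `b3` header of the form `{TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}`, where the last two fields are optional. The sketch below builds both forms as we understand the B3 specification; the function names are hypothetical (not a Zipkin library API), and the debug flag is left out for brevity.

```python
from typing import Dict, Optional


def b3_multi_headers(trace_id: str, span_id: str,
                     parent_span_id: Optional[str] = None,
                     sampled: Optional[bool] = None) -> Dict[str, str]:
    """Build the multi-header form of B3 propagation."""
    headers = {"X-B3-TraceId": trace_id, "X-B3-SpanId": span_id}
    if parent_span_id is not None:
        headers["X-B3-ParentSpanId"] = parent_span_id
    if sampled is not None:
        # "1" = sampled (accept), "0" = not sampled (deny); absent = undecided.
        headers["X-B3-Sampled"] = "1" if sampled else "0"
    return headers


def b3_single_header(trace_id: str, span_id: str,
                     parent_span_id: Optional[str] = None,
                     sampled: Optional[bool] = None) -> str:
    """Build the value of the single 'b3' header."""
    parts = [trace_id, span_id]
    if sampled is not None:
        parts.append("1" if sampled else "0")
        # Fields are positional, so in this sketch the parent span ID is
        # only encoded when a sampling state precedes it.
        if parent_span_id is not None:
            parts.append(parent_span_id)
    return "-".join(parts)
```

Propagating the sampling state along with the IDs is what lets downstream services honor the decision made at the root of the trace.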

  15. Experimental Zipkin features
    (mostly driven by users)
    - Firehose mode (no sampling, by Yelp)
    - Secondary Sampling (sampling triggers, by Netflix)
    - Kafka Storage & Aggregations (post facto sampling, ipso facto aggregations)
    - VoltDB storage (post facto sampling)
    - Storage forwarder (multi-storage)
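
Post facto sampling moves the keep-or-drop decision from the start of a request to after its spans have been collected, so that, for example, every failing or slow trace can be kept. The in-memory sketch below only illustrates the idea and is not how the Kafka or VoltDB experiments are implemented; the `TailSampler` class, its threshold and the way a trace is declared complete are all assumptions. Span fields follow the Zipkin v2 JSON model (`traceId`, `tags`, `duration` in microseconds).

```python
from collections import defaultdict
from typing import Dict, List


class TailSampler:
    """Hypothetical post facto sampler: buffers spans per trace and decides
    whether to keep a trace only once it is considered complete."""

    def __init__(self, slow_threshold_us: int = 1_000_000) -> None:
        # Zipkin v2 spans report `duration` in microseconds.
        self.slow_threshold_us = slow_threshold_us
        self.buffer: Dict[str, List[dict]] = defaultdict(list)

    def add(self, span: dict) -> None:
        self.buffer[span["traceId"]].append(span)

    def _interesting(self, span: dict) -> bool:
        # Keep spans tagged as errors or slower than the threshold.
        return ("error" in span.get("tags", {})
                or span.get("duration", 0) > self.slow_threshold_us)

    def complete(self, trace_id: str) -> List[dict]:
        """Called when the trace is assumed finished (e.g. after a quiet period).

        Returns all buffered spans if any of them is interesting, else an
        empty list; either way the buffer for that trace is released."""
        spans = self.buffer.pop(trace_id, [])
        return spans if any(self._interesting(s) for s in spans) else []
```

The trade-off is memory and latency: every span must be held until the trace is judged complete, which is why such approaches lean on a storage or streaming layer rather than on the instrumented services.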

  16. What is next for Zipkin?
    - Tunable propagated fields
    - Flexible server configurations
    - Abstracted messaging instrumentation
    - More instrumentation libraries for popular languages/frameworks

  17. BONUS: Haystack
    Observability platform developed and used at Expedia Group.
    - Haystack Trends: finds trends in span data
    - Adaptive Alerting: anomaly detection
    - Blobs: request/response recorder
    - Pitchfork: ingests Zipkin data and dispatches it into Haystack/Zipkin

  18. Conclusions
    - Observability is a means, not an end.
    - Different users have different scales and different needs; either way, you need to know their needs.
    - Data collection is foundational to observability.
    - Analysis and processing of data are becoming increasingly important.

  19. See also
    - Scaling Distributed Tracing - https://link.medium.com/VZvexUAAv6
    - The Observability Hierarchy - https://www.instana.com/blog/the-observability-hierarchy/
    - Observability of Distributed Systems - https://speakerdeck.com/jcchavezs/observability-of-distributed-systems

  20. Thank You!
    José Carlos Chávez
    Twitter: @jcchavezs
    LinkedIn: jcchavezs