What do users get out of Observability

José Carlos Chávez

May 27, 2020

Transcript

  1. What do users get out of observability?

    José Carlos Chávez, Expedia Group / Zipkin
  2. What is the problem to solve?

    Observability emerges from users' need to untangle the complexity that comes from increasingly distributed and independent software components and their interactions. We need to understand:
    - Interactions & correlations
    - Operational deviations
    - Failure modes
    - Critical components/paths
  3. What do users usually look for?

    Beginners:
    - Instrumentation for different languages and models (client/server, messaging, etc.)
    - Working examples
    - Right sampling rate

    Intermediate:
    - More, and more meaningful, metadata (tags, logs, etc.)
    - Correlation across observability tools
    - Dependency graphs
    - Right sampling strategy

    Advanced:
    - Post facto processing and aggregation
    - Proactive feedback
    - More configurable sampling
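
To make the "right sampling rate" item concrete, here is a minimal sketch of a rate-based sampler. It assumes hex-encoded trace IDs and decides from the trace ID itself, so every component that applies the same rate reaches the same decision for a given trace. The function name `should_sample` is hypothetical and this is not how any particular Zipkin library implements sampling.

```python
def should_sample(trace_id_hex: str, rate: float) -> bool:
    """Decide whether to record a trace (hypothetical helper).

    The lower 64 bits of the trace ID are compared against a threshold
    derived from the rate, so the decision is deterministic per trace.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # Works for both 64-bit (16 hex chars) and 128-bit (32 hex chars) IDs.
    lower_64 = int(trace_id_hex[-16:], 16)
    # Integer comparison avoids float rounding at rate == 1.0.
    threshold = int(rate * 2**64)
    return lower_64 < threshold
```

With a rate of 0.01, roughly one trace in a hundred is kept, provided trace IDs are uniformly random. Choosing the rate is the hard part: too low and the failing request is missing, too high and storage costs grow.
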
  4. What do users really want?

    Tricky question.
    - that failing request to be traced
    - that log field with the malformed data to be included
    - that deviating metric to be emitted
    - that alert to be configured on the deviating metric
    - that thing they need when they need it

    CERTAINTY
  5. Unfortunately it is 2020, not 2029

    - Recording 100% of data is not an option for every company due to scale issues.
    - High cardinality is expensive and probably useless in many cases.
    - 100% availability is the new 100% coverage.
    - The transition from a reactive to a proactive model is still a work in progress.
  6. What is Zipkin?

    - Distributed tracing solution based on BBB and inspired by Google Dapper (2010). It was open sourced by Twitter (2012).
    - Mature tracing model that emerged from users' use cases and thousands of hours of support.
    - Used by large companies like LINE, Netflix, SoundCloud and Yelp, but also by small ones.
    - Strong and heterogeneous community
  7. What can Zipkin do for you?

    It can help you to:
    - Understand request latency sources
    - Identify the critical path in a request that traverses many components
    - Get an overview of your service dependencies
    - Pinpoint the service at fault when an error occurs
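
As an illustration of how a service dependency overview can be derived from trace data, the sketch below counts parent-service to child-service calls within one trace. It assumes spans shaped like Zipkin v2 JSON (`id`, `parentId`, `localEndpoint.serviceName`); the function `dependency_links` and the sample spans are hypothetical, and the approach is deliberately simpler than what the Zipkin server does.

```python
from collections import Counter

def dependency_links(spans):
    """Count calls between services in one trace (simplified sketch)."""
    # Map each span ID to the service that recorded it.
    service_by_span_id = {
        span["id"]: span["localEndpoint"]["serviceName"] for span in spans
    }
    links = Counter()
    for span in spans:
        # Root spans have no parentId, so the lookup yields None.
        parent_service = service_by_span_id.get(span.get("parentId"))
        child_service = span["localEndpoint"]["serviceName"]
        # Only count hops that cross a service boundary.
        if parent_service and parent_service != child_service:
            links[(parent_service, child_service)] += 1
    return links

# Hypothetical trace: frontend -> backend -> inventory.
example_spans = [
    {"id": "a", "localEndpoint": {"serviceName": "frontend"}},
    {"id": "b", "parentId": "a", "localEndpoint": {"serviceName": "backend"}},
    {"id": "c", "parentId": "b", "localEndpoint": {"serviceName": "backend"}},
    {"id": "d", "parentId": "c", "localEndpoint": {"serviceName": "inventory"}},
]
example_links = dependency_links(example_spans)
```

Aggregating these counts across many traces gives the edges of a dependency graph. The same parent/child structure is what makes it possible to walk a trace and find where the time went.
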
  8. What can Zipkin offer you?

    - Advanced instrumentation for various frameworks/libraries (26+ official ones in Java alone)
    - Various exporters to different storage backends
    - Comprehensive UI
    - Knowledge spreading (RATIONALEs, site docs)
    - Supportive community
  9. Popular Zipkin features

    - Data model
    - Propagation format (B3)
    - Integration with other observability tools for both server and instrumentation (e.g. loggers and metrics ingestion)
    - Versatile instrumentation API, embracing interop with other tracing libraries (e.g. OpenTracing, AWS X-Ray, Haystack, etc.)
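
The B3 propagation format carries trace context between services as HTTP headers (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`). The sketch below shows one way to write and read those headers. The header names come from the B3 format; the helper functions `inject_b3`, `extract_b3` and `child_context` are hypothetical and are not the API of any Zipkin library.

```python
import secrets
from typing import Dict, Optional

def inject_b3(trace_id: str, span_id: str, sampled: Optional[bool],
              parent_span_id: Optional[str] = None) -> Dict[str, str]:
    """Build B3 multi-header values for an outgoing request."""
    headers = {"X-B3-TraceId": trace_id, "X-B3-SpanId": span_id}
    if parent_span_id is not None:
        headers["X-B3-ParentSpanId"] = parent_span_id
    # Omitting the header leaves the sampling decision to the receiver.
    if sampled is not None:
        headers["X-B3-Sampled"] = "1" if sampled else "0"
    return headers

def extract_b3(headers: Dict[str, str]) -> Optional[dict]:
    """Read B3 headers from an incoming request, or return None if absent."""
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    trace_id = lowered.get("x-b3-traceid")
    span_id = lowered.get("x-b3-spanid")
    if trace_id is None or span_id is None:
        return None
    sampled_raw = lowered.get("x-b3-sampled")
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "parent_span_id": lowered.get("x-b3-parentspanid"),
        # This sketch treats "1" as sampled and any other value as not sampled.
        "sampled": None if sampled_raw is None else sampled_raw == "1",
    }

def child_context(parent: dict) -> dict:
    """Start a child span: same trace and decision, new 64-bit span ID."""
    return {
        "trace_id": parent["trace_id"],
        "span_id": secrets.token_hex(8),
        "parent_span_id": parent["span_id"],
        "sampled": parent["sampled"],
    }
```

Because the sampling decision travels with the request, downstream services do not need to re-decide, which keeps traces complete rather than partially recorded.
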
  10. Experimental Zipkin features (mostly driven by users)

    - Firehose mode (no sampling, by Yelp)
    - Secondary sampling (sampling triggers, by Netflix)
    - Kafka storage & aggregations (post facto sampling, ipso facto aggregations)
    - VoltDB storage (post facto sampling)
    - Storage forwarder (multi storage)
  11. What is next for Zipkin?

    - Tunable propagated fields
    - Flexible server configurations
    - Abstracted messaging instrumentation
    - More instrumentations for popular languages/frameworks
  12. BONUS: Haystack

    Observability platform developed and used at Expedia Group.
    - Haystack Trends: finds trends in span data
    - Adaptive Alerting: anomaly detector
    - Blobs: request/response recorder
    - Pitchfork: ingests Zipkin data and dispatches it to Haystack/Zipkin
  13. Conclusions

    - Observability is a means, not a goal.
    - Different users have different scales and different needs. Either way, you need to know their needs.
    - Data collection is foundational for observability.
    - Analysis and processing of data is becoming more and more important.
  14. See also

    - Scaling Distributed Tracing - https://link.medium.com/VZvexUAAv6
    - The Observability Hierarchy - https://www.instana.com/blog/the-observability-hierarchy/
    - Observability of Distributed Systems - https://speakerdeck.com/jcchavezs/observability-of-distributed-systems