Slide 1

Slide 1 text

Brought to you by: What do users get out of observability? José Carlos Chávez Expedia Group / Zipkin

Slide 2

Slide 2 text

José Carlos Chávez
Software Engineer @ Expedia Group
Core team member @ Zipkin

Slide 3

Slide 3 text

What is the problem to solve?
Observability emerges from users' need to untangle the complexity created by increasingly distributed, independent software components and their interactions.
We need to understand:
- Interactions & correlations
- Operational deviations
- Failure modes
- Critical components/paths

Slide 4

Slide 4 text

“scientia potentia est” (knowledge is power)

Slide 5

Slide 5 text

What do users usually look for?
Beginners:
- Instrumentation for different languages and models (client/server, messaging, etc.)
- Working examples
- Right sampling rate
Intermediate:
- More, and more meaningful, metadata (tags, logs, etc.)
- Correlation across observability tools
- Dependency graphs
- Right sampling strategy
Advanced:
- Post facto processing and aggregation
- Proactive feedback
- More configurable sampling
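To make "right sampling rate" concrete, here is a minimal sketch of a rate-based sampler. It is an illustration only, not Zipkin's implementation: `make_rate_sampler` is a hypothetical helper, and deriving the decision from the trace ID (so every service reaches the same answer for the same trace) is an assumption of this sketch.

```python
import random


def make_rate_sampler(rate: float):
    """Return a function that decides, per trace ID, whether to record a trace."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # Compare the low 64 bits of the trace ID against a threshold, so the
    # decision is deterministic for a given trace ID and rate.
    threshold = int(rate * (1 << 64))

    def is_sampled(trace_id_hex: str) -> bool:
        low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
        return low_64_bits < threshold

    return is_sampled


# Hypothetical usage: keep roughly 10% of randomly generated trace IDs.
sampler = make_rate_sampler(0.1)
trace_ids = [f"{random.getrandbits(64):016x}" for _ in range(10_000)]
kept = sum(sampler(trace_id) for trace_id in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

A lower rate reduces storage and overhead but makes it less likely that the one failing request you care about was recorded, which is the trade-off beginners tend to hit first.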

Slide 6

Slide 6 text

What do users really want?
Tricky question.
- that failing request to be traced
- that log field with the malformed data to be included
- that deviating metric to be emitted
- that alert to be configured on the deviating metric
- that thing they need when they need it
CERTAINTY

Slide 7

Slide 7 text

Unfortunately, it is 2020, not 2029
- Recording 100% of data is not an option for every company, due to scale issues.
- High cardinality is expensive and probably useless in many cases.
- 100% availability is the new 100% coverage.
- The transition from a reactive to a proactive model is still a work in progress.

Slide 8

Slide 8 text

Users need solutions that help them understand their problems and boundaries

Slide 9

Slide 9 text

What is Zipkin
- Distributed tracing solution based on BBB and inspired by Google Dapper (2010). It was open sourced by Twitter (2012).
- Mature tracing model that emerged from users' use cases and thousands of hours of support.
- Used by large companies like LINE, Netflix, SoundCloud and Yelp, but also by small ones.
- Strong and heterogeneous community

Slide 10

Slide 10 text

What can Zipkin do for you?
It can help you to:
- Understand the sources of request latency
- Identify the critical path in a request that traverses many components
- Get an overview of your service dependencies
- Pinpoint the service at fault when an error occurs
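As a rough illustration of how span data answers these questions, the sketch below works on a hand-written trace. The field names (`traceId`, `id`, `parentId`, `duration` in microseconds, `localEndpoint.serviceName`, `tags`) follow Zipkin's v2 JSON span model; the span values, the helper functions, and the "follow the slowest child" heuristic are assumptions made for this example, not how the Zipkin server computes anything.

```python
from collections import Counter

# Hypothetical spans for one trace, shaped like Zipkin's v2 JSON model.
spans = [
    {"traceId": "a1", "id": "1", "name": "get /checkout", "duration": 120_000,
     "localEndpoint": {"serviceName": "frontend"}, "tags": {}},
    {"traceId": "a1", "id": "2", "parentId": "1", "name": "get /cart", "duration": 30_000,
     "localEndpoint": {"serviceName": "cart"}, "tags": {}},
    {"traceId": "a1", "id": "3", "parentId": "1", "name": "post /pay", "duration": 80_000,
     "localEndpoint": {"serviceName": "payments"}, "tags": {}},
    {"traceId": "a1", "id": "4", "parentId": "3", "name": "query", "duration": 70_000,
     "localEndpoint": {"serviceName": "payments-db"}, "tags": {"error": "timeout"}},
]


def service_of(span):
    return span["localEndpoint"]["serviceName"]


def dependency_links(spans):
    """Count calls between services by pairing each span with its parent."""
    by_id = {s["id"]: s for s in spans}
    links = Counter()
    for s in spans:
        parent = by_id.get(s.get("parentId"))
        if parent and service_of(parent) != service_of(s):
            links[(service_of(parent), service_of(s))] += 1
    return links


def slowest_path(spans):
    """Walk from the root span, always following the longest-running child."""
    children = {}
    for s in spans:
        children.setdefault(s.get("parentId"), []).append(s)
    path, current = [], children[None][0]  # the root span has no parentId
    while current:
        path.append(service_of(current))
        kids = children.get(current["id"], [])
        current = max(kids, key=lambda s: s["duration"]) if kids else None
    return path


def failing_services(spans):
    """Services that reported a span carrying an 'error' tag."""
    return sorted({service_of(s) for s in spans if "error" in s["tags"]})


print(dict(dependency_links(spans)))
print(" -> ".join(slowest_path(spans)))
print(failing_services(spans))
```

For this sample trace, the slowest path runs frontend, payments, payments-db, and the only span tagged with an error belongs to payments-db.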

Slide 11

Slide 11 text

What can Zipkin do for you?

Slide 12

Slide 12 text

What can Zipkin do for you?

Slide 13

Slide 13 text

What can Zipkin offer you?
- Advanced instrumentation for various frameworks/libraries (26+ official ones in Java alone).
- Various exporters to different storage backends
- Comprehensive UI
- Knowledge sharing (RATIONALEs, site docs)
- Supportive community

Slide 14

Slide 14 text

Popular Zipkin features
- Data model
- Propagation format (B3)
- Integration with other observability tools for both server and instrumentation (e.g. loggers and metrics ingestion).
- Versatile instrumentation API, embracing interop with other tracing libraries (e.g. OpenTracing, AWS X-Ray, Haystack, etc.)
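For readers who have not seen B3 on the wire, the sketch below builds the propagation headers by hand. The header names (`X-B3-TraceId`, `X-B3-SpanId`, `X-B3-ParentSpanId`, `X-B3-Sampled`) and the single-header layout `{TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}` come from the B3 specification; the helper functions are hypothetical and are not part of any Zipkin library. Real HTTP header lookups are case-insensitive, which this sketch ignores for brevity.

```python
import secrets


def new_span_id() -> str:
    """Generate a 64-bit span ID as 16 lower-hex characters."""
    return secrets.token_hex(8)


def inject_b3(trace_id, span_id, sampled, parent_span_id=None) -> dict:
    """Build B3 multi-header propagation fields for an outgoing request."""
    headers = {
        "X-B3-TraceId": trace_id,
        "X-B3-SpanId": span_id,
        "X-B3-Sampled": "1" if sampled else "0",
    }
    if parent_span_id:
        headers["X-B3-ParentSpanId"] = parent_span_id
    return headers


def to_single_header(trace_id, span_id, sampled, parent_span_id=None) -> str:
    """Encode the same context as the value of the single 'b3' header."""
    parts = [trace_id, span_id, "1" if sampled else "0"]
    if parent_span_id:
        parts.append(parent_span_id)
    return "-".join(parts)


def child_headers(incoming: dict) -> dict:
    """Continue a trace: same trace ID and sampling decision, new span ID,
    with the caller's span as parent."""
    return inject_b3(
        incoming["X-B3-TraceId"],
        new_span_id(),
        incoming["X-B3-Sampled"] == "1",
        parent_span_id=incoming["X-B3-SpanId"],
    )


incoming = inject_b3("80f198ee56343ba864fe8b2a57d3eff7", "e457b5a2e4d86bd1", True)
outgoing = child_headers(incoming)
print(outgoing)
print(to_single_header("80f198ee56343ba864fe8b2a57d3eff7", "e457b5a2e4d86bd1",
                       True, "05e3ac9a4f6e3b90"))
```

The point to notice is that the trace ID and sampling decision travel unchanged across services, which is what makes correlation across components possible.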

Slide 15

Slide 15 text

Experimental Zipkin features (mostly driven by users)
- Firehose mode (no sampling, by Yelp)
- Secondary Sampling (sampling triggers, by Netflix)
- Kafka Storage & Aggregations (post facto sampling, ipso facto aggregations)
- VoltDB storage (post facto sampling)
- Storage forwarder (multi storage)
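The idea behind post facto sampling is to decide what to keep after the spans have been collected, rather than at the start of the request. The sketch below shows that idea in its simplest form; it is not how the Kafka or VoltDB storage components implement it. The `post_facto_sample` function, the keep rules (any error tag, or any span at or above a latency threshold), and the sample spans are all assumptions made for illustration.

```python
from collections import defaultdict


def post_facto_sample(spans, slow_threshold_us=500_000):
    """Group collected spans by trace and keep only the interesting traces."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["traceId"]].append(span)

    kept = {}
    for trace_id, trace_spans in traces.items():
        has_error = any("error" in s.get("tags", {}) for s in trace_spans)
        is_slow = any(s.get("duration", 0) >= slow_threshold_us for s in trace_spans)
        if has_error or is_slow:
            kept[trace_id] = trace_spans
    return kept


# Hypothetical collected spans: t1 is healthy, t2 is slow, t3 contains an error.
collected = [
    {"traceId": "t1", "id": "1", "duration": 20_000, "tags": {}},
    {"traceId": "t2", "id": "1", "duration": 900_000, "tags": {}},
    {"traceId": "t3", "id": "1", "duration": 15_000, "tags": {}},
    {"traceId": "t3", "id": "2", "parentId": "1", "duration": 5_000,
     "tags": {"error": "500"}},
]
kept_traces = post_facto_sample(collected)
print(sorted(kept_traces))
```

This requires every span to reach the collection tier first (as in firehose mode), which moves the cost from the decision at request time to the ingestion pipeline.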

Slide 16

Slide 16 text

What is next for Zipkin?
- Tunable propagated fields
- Flexible server configurations
- Abstracted messaging instrumentation
- More instrumentations for popular languages/frameworks

Slide 17

Slide 17 text

BONUS: Haystack
Observability platform developed and used at Expedia Group.
- Haystack Trends: Finds trends in span data
- Adaptive Alerting: Anomaly detector
- Blobs: Request/response recorder
- Pitchfork: Ingests Zipkin data and dispatches it to Haystack/Zipkin

Slide 18

Slide 18 text

Conclusions
- Observability is a means, not a goal.
- Different users have different scales and different needs. Either way, you need to know their needs.
- Data collection is foundational for observability.
- Analysis and processing of data are becoming more and more important.

Slide 19

Slide 19 text

See also
- Scaling Distributed Tracing - https://link.medium.com/VZvexUAAv6
- The Observability Hierarchy - https://www.instana.com/blog/the-observability-hierarchy/
- Observability of distributed Systems - https://speakerdeck.com/jcchavezs/observability-of-distributed-systems

Slide 20

Slide 20 text

Thank You! José Carlos Chávez Twitter: @jcchavezs LinkedIn: jcchavezs [email protected]