What do users get out of Observability

Brought to you by:
What do users get out of observability?
José Carlos Chávez
Expedia Group / Zipkin

José Carlos Chávez
- Software Engineer @ Expedia Group
- Core team member @ Zipkin

What is the problem to solve?
Observability emerges from users' need to untangle the complexity that comes with more distributed, independent software components and their interactions. We need to understand:
- Interactions and correlations
- Operational deviations
- Failure modes
- Critical components and paths

“scientia potentia est” (knowledge is power)

What do users usually look for?
Beginners:
- Instrumentation for different languages and models (client/server, messaging, etc.)
- Working examples
- Right sampling rate
Intermediate:
- Meaningful and more metadata (tags, logs, etc.)
- Correlation across observability tools
- Dependency graphs
- Right sampling strategy
Advanced:
- Post facto processing and aggregation
- Proactive feedback
- More configurable sampling

What do users really want?
Tricky question.
- that failing request to be traced
- that log field with the malformed data to be included
- that deviating metric to be emitted
- that alert to be configured on the deviating metric
- that thing they need, when they need it
CERTAINTY

Unfortunately it is 2020, not 2029
- Recording 100% of data is not an option for every company due to scale issues.
- High cardinality is expensive and probably useless in many cases.
- 100% availability is the new 100% coverage.
- The transition from a reactive to a proactive model is still a work in progress.
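When recording everything is not affordable, the usual fallback is to record a fixed fraction of traces. The following is a minimal sketch of that idea, not Zipkin's own sampler: the function names are hypothetical, and it assumes trace IDs are hex strings whose lower 64 bits are uniformly random. Deciding from the trace ID rather than from a fresh random number means every service that sees the same trace reaches the same decision.

```python
def make_rate_sampler(rate: float):
    """Return a function that decides, per trace ID, whether to record a trace.

    Hypothetical helper for illustration; `rate` is the fraction to keep.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    # Boundary inside the 64-bit ID space: IDs below it are recorded.
    threshold = int(rate * (1 << 64))

    def is_sampled(trace_id_hex: str) -> bool:
        # Use only the lower 64 bits so that 64-bit and 128-bit IDs
        # are treated the same way.
        low64 = int(trace_id_hex[-16:], 16)
        return low64 < threshold

    return is_sampled


sampler = make_rate_sampler(0.5)
decision = sampler("7fffffffffffffff")  # falls in the lower half of the ID space
```

A fixed rate like this is cheap, but it cannot guarantee that the one failing request a user cares about is among the recorded ones.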

Users need solutions that help them understand their problems and boundaries

What is Zipkin?
- Distributed tracing solution based on BBB and inspired by Google Dapper (2010). It was open sourced by Twitter (2012).
- Mature tracing model that emerged from users' use cases and thousands of hours of support.
- Used by large companies like LINE, Netflix, SoundCloud and Yelp, but also by small ones.
- Strong and heterogeneous community

What can Zipkin do for you?
It can help you to:
- Understand request latency sources
- Identify the critical path in a request that traverses many components
- Get an overview of your service dependencies
- Pinpoint the service at fault when an error occurs
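As a rough illustration of how trace data answers the latency question, the sketch below attributes time to services by computing each span's "self time" (its duration minus the time covered by its direct children). It is not part of Zipkin. It assumes spans shaped like Zipkin v2 JSON (`id`, `parentId`, `timestamp` and `duration` in microseconds, `localEndpoint.serviceName`), and it assumes every span has a unique `id`, which ignores the case where client and server share a span ID. The function name is hypothetical.

```python
from collections import defaultdict


def self_time_by_service(spans):
    """Sum, per service, the time each span spent outside its direct children."""
    children = defaultdict(list)
    for span in spans:
        if span.get("parentId"):
            children[span["parentId"]].append(span)

    totals = defaultdict(int)
    for span in spans:
        start = span["timestamp"]
        end = start + span["duration"]
        # Clip child intervals to the parent, then merge overlaps so that
        # children running in parallel are not subtracted twice.
        intervals = sorted(
            (max(start, c["timestamp"]), min(end, c["timestamp"] + c["duration"]))
            for c in children[span["id"]]
        )
        covered, cursor = 0, start
        for child_start, child_end in intervals:
            child_start = max(child_start, cursor)
            if child_end > child_start:
                covered += child_end - child_start
                cursor = child_end
        service = span["localEndpoint"]["serviceName"]
        totals[service] += span["duration"] - covered
    return dict(totals)


# Made-up trace: frontend calls backend, backend calls db (times in microseconds).
trace = [
    {"id": "a", "timestamp": 0, "duration": 100,
     "localEndpoint": {"serviceName": "frontend"}},
    {"id": "b", "parentId": "a", "timestamp": 10, "duration": 60,
     "localEndpoint": {"serviceName": "backend"}},
    {"id": "c", "parentId": "b", "timestamp": 20, "duration": 40,
     "localEndpoint": {"serviceName": "db"}},
]
breakdown = self_time_by_service(trace)
# breakdown == {"frontend": 40, "backend": 20, "db": 40}
```

Self time is only one view of latency; a true critical-path analysis would also follow which child each parent was actually waiting on.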


What can Zipkin offer you?
- Advanced instrumentation for various frameworks/libraries (26+ official ones in Java ALONE).
- Various exporters to different storages
- Comprehensive UI
- Knowledge spreading (RATIONALEs, site docs)
- Supporting community

Popular Zipkin features
- Data model
- Propagation format (B3)
- Integration with other observability tools for both server and instrumentation (e.g. loggers and metrics ingestion).
- Versatile instrumentation API, embracing interop with other tracing libraries (e.g. OpenTracing, AWS X-Ray, Haystack, etc.)
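To make the propagation format concrete: B3 carries the trace context either as separate `X-B3-*` headers or as a single `b3` header of the form `{TraceId}-{SpanId}-{SamplingState}-{ParentSpanId}`, where the last two fields are optional. The sketch below is a minimal illustration with hypothetical function names, not the API of any Zipkin library, and it does no validation of ID lengths or characters.

```python
def inject_b3(trace_id, span_id, sampled=None, parent_span_id=None):
    """Build B3 multi-header fields for an outgoing request."""
    headers = {"X-B3-TraceId": trace_id, "X-B3-SpanId": span_id}
    if parent_span_id is not None:
        headers["X-B3-ParentSpanId"] = parent_span_id
    if sampled is not None:
        # The sampling decision travels with the request so that
        # downstream services do not have to decide again.
        headers["X-B3-Sampled"] = "1" if sampled else "0"
    return headers


def parse_b3_single(value):
    """Parse the single-header form into a plain dict."""
    parts = value.split("-")
    if len(parts) == 1:
        # Sampling decision only, e.g. "0" meaning "do not trace".
        return {"sampling_state": parts[0]}
    if len(parts) > 4:
        raise ValueError("too many fields in b3 header")
    context = {"trace_id": parts[0], "span_id": parts[1]}
    if len(parts) >= 3:
        context["sampling_state"] = parts[2]
    if len(parts) == 4:
        context["parent_span_id"] = parts[3]
    return context


outgoing = inject_b3("80f198ee56343ba864fe8b2a57d3eff7", "e457b5a2e4d86bd1", sampled=True)
incoming = parse_b3_single(
    "80f198ee56343ba864fe8b2a57d3eff7-e457b5a2e4d86bd1-1-05e3ac9a4f6e3b90"
)
```

In real services this work is done by the instrumentation library, which also handles case-insensitive header lookup and malformed input.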

Experimental Zipkin features (mostly driven by users)
- Firehose mode (no sampling, by Yelp)
- Secondary Sampling (sampling triggers, by Netflix)
- Kafka Storage & Aggregations (post facto sampling, ipso facto aggregations)
- VoltDB storage (post facto sampling)
- Storage forwarder (multi storage)
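Post facto sampling means deciding what to keep after the spans have been collected, when it is already known whether a trace turned out to be interesting. The sketch below shows only the idea and says nothing about how the Kafka or VoltDB storages implement it. It assumes Zipkin v2-style span dicts (`traceId`, `duration` in microseconds, `tags`), relies on the convention of marking failures with an `error` tag, and assumes the caller passes in spans of traces that are already complete. The function name and the threshold parameter are hypothetical.

```python
from collections import defaultdict


def select_traces(spans, slow_threshold_us):
    """Keep whole traces that contain an error or at least one slow span."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["traceId"]].append(span)

    kept = {}
    for trace_id, trace_spans in traces.items():
        has_error = any("error" in s.get("tags", {}) for s in trace_spans)
        is_slow = any(s.get("duration", 0) >= slow_threshold_us for s in trace_spans)
        if has_error or is_slow:
            # Keep every span of the trace, not just the interesting one,
            # so the surrounding context survives.
            kept[trace_id] = trace_spans
    return kept


collected = [
    {"traceId": "t1", "id": "1", "duration": 1_000},
    {"traceId": "t2", "id": "2", "duration": 900, "tags": {"error": "timeout"}},
    {"traceId": "t2", "id": "3", "duration": 300},
    {"traceId": "t3", "id": "4", "duration": 750_000},
]
interesting = select_traces(collected, slow_threshold_us=500_000)
# interesting has the keys "t2" (error) and "t3" (slow); "t1" is dropped.
```

The trade-off is that all spans must be transported and buffered before the decision is made, which moves the cost from storage to the collection pipeline.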

What is next for Zipkin?
- Tunable propagated fields
- Flexible server configurations
- Abstracted messaging instrumentation
- More instrumentations for popular languages/frameworks

BONUS: Haystack
Observability platform developed and used at Expedia Group.
- Haystack Trends: finds trends in span data
- Adaptive alerting: anomaly detector
- Blobs: request/response recorder
- Pitchfork: ingests Zipkin data and dispatches it into Haystack/Zipkin

Conclusions
- Observability is a means, not a goal.
- Different users have different scales and different needs; either way, you need to know their needs.
- Data collection is foundational for observability.
- Analysis and processing of data is becoming more and more important.

See also
- Scaling Distributed Tracing: https://link.medium.com/VZvexUAAv6
- The Observability Hierarchy: https://www.instana.com/blog/the-observability-hierarchy/
- Observability of Distributed Systems: https://speakerdeck.com/jcchavezs/observability-of-distributed-systems

Thank You!
José Carlos Chávez
- Twitter: @jcchavezs
- LinkedIn: jcchavezs
- jcchavezs@gmail.com
