Øredev 2024 - Becoming a cloud-native doctor: Using metrics and traces to diagnose our cloud-native applications

Diagnosing ailments and diseases in the human body is a complicated task! Doctors require years of dedication to the science of understanding key body metrics and data to correctly diagnose potential issues and life-threatening scenarios. With the rise in complexity of both our applications and infrastructure in cloud-native app development, can we gain insights and inspiration from this world of medicine? How can we more effectively instrument our apps to diagnose potential failures, issues and bottlenecks in our system?
Join this session to learn how making effective use of open-source frameworks like OpenTelemetry and MicroProfile, and the data they provide through metrics and traces, can enable us to become effective cloud-native physicians and successfully diagnose our cloud-native applications.

Grace Jansen

November 06, 2024

Transcript

  1. Becoming a cloud-native doctor: Using metrics and traces to diagnose our cloud-native applications
    Grace Jansen, IBM, @gracejansen27
  2. Medicine: Diagnosing with Biometrics
    Types of metrics:
    • Physical symptoms
    • Written notes
    • Heart monitors
    • Blood oxygen levels
    • CT/MRI scans
    • Blood tests
    • Medical history
    • Genomics
  3. Software: Diagnosing with Metrics
    Types of metrics:
    • App health
    • API requests
    • Memory consumption
    • Network traffic
    • Error rate
    • Duration of requests
    • Number of users connected to the system
    • Etc.
    Source: https://www.softermii.com/blog/top-9-software-development-metrics-for-measuring-productivity-and-products-quality
  4. Context
    Questions: How? Why? What happened? What's changed? Did they fall? Has this happened before? How do they feel? Where is the pain? What does it feel like? Obvious cause?
    Factors to consider: events prior, behaviour change, new routine, new stimuli, change in diet, stress, environmental change, new job, new hobby, new pets, harmful events, physical accident, accident, family history, location change, holiday, exposure to harmful things, crowds
  5–7. Implementing Observability
    1. Instrument systems and applications to collect relevant data (e.g. metrics, traces, and logs).
    2. Send this data to a separate external system that can store and analyse it.
    3. Provide visualizations and insights into systems as a whole (including query capability for end users).
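As a rough illustration of these three steps, the following Java sketch wires an instrumented component to a stand-in back end, entirely in memory. Every type in it (`DataPoint`, `Exporter`, `InMemoryBackend`, `InstrumentedApp`) and the metric name `request.duration` are hypothetical. In a real deployment step 2 would ship data over the network (for example via OTLP to an OpenTelemetry Collector) to a genuinely separate system, and step 3 would be served by that system's query and visualization tooling rather than a single `average` method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

// Toy, in-memory model of the three observability steps. All names are hypothetical.
final class ObservabilityPipeline {

    // Step 1: a single measurement produced by instrumented code.
    record DataPoint(String name, long timestampMillis, double value) {}

    // Step 2: the only thing the application knows about the external system.
    interface Exporter {
        void export(List<DataPoint> batch);
    }

    // Stand-in for the separate back end that stores data (step 2) and answers queries (step 3).
    static final class InMemoryBackend implements Exporter {
        private final List<DataPoint> stored = new ArrayList<>();

        @Override
        public synchronized void export(List<DataPoint> batch) {
            stored.addAll(batch);
        }

        // Step 3: a minimal query, averaging one metric over [fromMillis, toMillis).
        synchronized OptionalDouble average(String name, long fromMillis, long toMillis) {
            return stored.stream()
                    .filter(p -> p.name().equals(name))
                    .filter(p -> p.timestampMillis() >= fromMillis && p.timestampMillis() < toMillis)
                    .mapToDouble(DataPoint::value)
                    .average();
        }
    }

    // Instrumented application code: records data points and flushes them in batches.
    static final class InstrumentedApp {
        private final Exporter exporter;
        private final List<DataPoint> buffer = new ArrayList<>();

        InstrumentedApp(Exporter exporter) {
            this.exporter = exporter;
        }

        // durationMillis would normally be measured around the business logic; here it is passed in.
        void handleRequest(long nowMillis, double durationMillis) {
            buffer.add(new DataPoint("request.duration", nowMillis, durationMillis));
        }

        void flush() {
            exporter.export(List.copyOf(buffer));
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        InstrumentedApp app = new InstrumentedApp(backend);
        app.handleRequest(1_000, 120.0);
        app.handleRequest(2_000, 80.0);
        app.flush();
        System.out.println(backend.average("request.duration", 0, 10_000)); // OptionalDouble[100.0]
    }
}
```

Keeping the `Exporter` interface as the application's only dependency mirrors the point of step 2: the app produces data, but storage and analysis live elsewhere.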
  8. Instrumentation: Logs
    • A timestamped message emitted by services or other application components, providing coarser-grained or higher-level information about system behaviours (like errors and warnings); typically stored in a set of log files.
    • Not necessarily associated with any particular user request or transaction.
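A minimal sketch of log instrumentation using only the JDK's `java.util.logging` package (an assumption; the slide doesn't name a logging framework). Each record carries a timestamp, level, logger name and message, and nothing in it ties the line to a particular request. The hypothetical `InMemoryHandler` keeps formatted lines in a list so the output is easy to inspect; a service would more typically attach a `java.util.logging.FileHandler` so the lines end up in log files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Sketch of log instrumentation with java.util.logging (JDK only). Class names are hypothetical.
final class LogDemo {

    // Renders each record as "<ISO-8601 timestamp> <LEVEL> <logger>: <message>".
    static final class LineFormatter extends Formatter {
        @Override
        public String format(LogRecord record) {
            return record.getInstant() + " " + record.getLevel() + " "
                    + record.getLoggerName() + ": " + formatMessage(record);
        }
    }

    // Collects formatted lines in memory instead of writing log files.
    static final class InMemoryHandler extends Handler {
        final List<String> lines = new ArrayList<>();

        InMemoryHandler() {
            setFormatter(new LineFormatter());
        }

        @Override
        public synchronized void publish(LogRecord record) {
            if (isLoggable(record)) {
                lines.add(getFormatter().format(record));
            }
        }

        @Override
        public void flush() {}

        @Override
        public void close() {}
    }

    static Logger newLogger(String name, Handler handler) {
        Logger logger = Logger.getLogger(name);
        logger.setUseParentHandlers(false); // don't also print to the console
        logger.setLevel(Level.INFO);        // drop FINE/FINER/FINEST detail
        logger.addHandler(handler);
        return logger;
    }

    public static void main(String[] args) {
        InMemoryHandler handler = new InMemoryHandler();
        Logger log = newLogger("inventory-service", handler);
        log.info("Service started");
        log.warning("Stock level low for item 42");
        log.fine("Cache hit ratio 0.93"); // below INFO, so not recorded
        handler.lines.forEach(System.out::println);
    }
}
```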
  9. Instrumentation: Metrics
    • Aggregations of numeric data about infrastructure or an application over a period of time. Examples include system error rates, CPU utilization, and request rates for a given service.
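The difference between raw events and metrics can be sketched in a few lines of Java: every request is recorded individually, but only two aggregated numbers (request rate and error rate) leave the class. The `RequestMetrics` name and its methods are hypothetical; the caller supplies the length of the reporting period, resetting counters between periods is left out, and the two counters are read separately rather than as one atomic snapshot. In practice a metrics API such as the OpenTelemetry Metrics API would handle this aggregation.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of metric aggregation: per-request events reduced to a few numbers for one period.
final class RequestMetrics {
    private final AtomicLong requests = new AtomicLong();
    private final AtomicLong errors = new AtomicLong();

    // Called once per handled request, e.g. from a request filter.
    void record(boolean failed) {
        requests.incrementAndGet();
        if (failed) {
            errors.incrementAndGet();
        }
    }

    // Requests per second over the reporting period.
    double requestRatePerSecond(Duration period) {
        return requests.get() / (period.toMillis() / 1000.0);
    }

    // Fraction of requests that failed (0.0 when there was no traffic).
    double errorRate() {
        long total = requests.get();
        return total == 0 ? 0.0 : (double) errors.get() / total;
    }

    public static void main(String[] args) {
        RequestMetrics metrics = new RequestMetrics();
        for (int i = 0; i < 8; i++) metrics.record(false);
        for (int i = 0; i < 2; i++) metrics.record(true);
        Duration period = Duration.ofSeconds(5);
        System.out.println(metrics.requestRatePerSecond(period)); // 2.0
        System.out.println(metrics.errorRate());                  // 0.2
    }
}
```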
  10. Instrumentation: Traces
    Distributed traces (i.e. traces):
    • Record the paths taken by requests (made by an application or end user) as they propagate through multi-service architectures, such as microservice, macroservice, and serverless applications.
  11. Key Tracing Concepts
    • Traces: Traces represent requests and consist of multiple spans.
    • Spans: Spans represent single operations within a request. A span contains a name, time-related data, log messages, and metadata that describe what occurs during a transaction.
    Image: https://blog.sentry.io/2021/08/12/distributed-tracing-101-for-full-stack-developers/
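The relationship between traces and spans can be modelled as plain Java records. This is a simplified sketch, not the OpenTelemetry data model: the type names, fields and attribute keys below are illustrative, and the IDs are hard-coded hex strings rather than generated ones. A span has a name, start and end times, metadata (attributes) and timestamped messages (events); a trace is simply the set of spans sharing one trace ID, linked by parent span IDs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Simplified trace/span model; loosely follows OpenTelemetry terminology but is not its API.
final class TraceModel {

    // A timestamped message attached to a span (the "log messages" on the slide).
    record SpanEvent(Instant time, String message) {}

    // A single operation within a request. parentSpanId is null for the root span.
    record Span(String traceId, String spanId, String parentSpanId, String name,
                Instant start, Instant end, Map<String, String> attributes, List<SpanEvent> events) {
        Duration duration() {
            return Duration.between(start, end);
        }
    }

    // A trace is the set of spans that share one trace ID.
    record Trace(String traceId, List<Span> spans) {
        Optional<Span> root() {
            return spans.stream().filter(s -> s.parentSpanId() == null).findFirst();
        }

        List<Span> childrenOf(Span parent) {
            List<Span> children = new ArrayList<>();
            for (Span s : spans) {
                if (parent.spanId().equals(s.parentSpanId())) {
                    children.add(s);
                }
            }
            return children;
        }
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-11-06T10:00:00Z");
        String traceId = "4bf92f3577b34da6a3ce929d0e0e4736";
        Span root = new Span(traceId, "00f067aa0ba902b7", null, "GET /orders",
                t0, t0.plusMillis(250), Map.of("http.request.method", "GET"), List.of());
        Span db = new Span(traceId, "b7ad6b7169203331", root.spanId(), "SELECT orders",
                t0.plusMillis(20), t0.plusMillis(200), Map.of("db.system", "postgresql"),
                List.of(new SpanEvent(t0.plusMillis(199), "query returned 12 rows")));
        Trace trace = new Trace(traceId, List.of(root, db));
        for (Span child : trace.childrenOf(trace.root().orElseThrow())) {
            System.out.println(child.name() + " took " + child.duration().toMillis() + " ms"); // SELECT orders took 180 ms
        }
    }
}
```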
  12. Key Tracing Concepts
    • Context: Context is an immutable object contained in the span data that identifies the unique request each span is part of. This data is required for moving trace information across service boundaries, allowing developers to follow a single request through a potentially complex distributed system.
    Image: https://blog.sentry.io/2021/08/12/distributed-tracing-101-for-full-stack-developers/
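One concrete form of this context is the W3C Trace Context `traceparent` HTTP header, which is OpenTelemetry's default propagation format. The JDK-only sketch below shows what a service does at each boundary: parse the incoming header, keep the trace ID while creating a new span ID for its own work, and serialize the result for the next outgoing call. It handles only version `00` of the header, and the `TraceParent`/`SpanContext` names are hypothetical; it is meant to explain the mechanism, and real applications should use OpenTelemetry's built-in propagators.

```java
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch of W3C Trace Context "traceparent" handling (version 00 only).
final class TraceParent {
    // version-traceid-parentid-traceflags, all lowercase hex
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    record SpanContext(String traceId, String spanId, boolean sampled) {}

    // Outgoing call: serialize a span's context into the header value.
    static String inject(SpanContext ctx) {
        return "00-" + ctx.traceId() + "-" + ctx.spanId() + "-" + (ctx.sampled() ? "01" : "00");
    }

    // Incoming call: recover the caller's context, rejecting malformed or all-zero IDs.
    static Optional<SpanContext> extract(String header) {
        if (header == null) return Optional.empty();
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return Optional.empty();
        String traceId = m.group(1);
        String spanId = m.group(2);
        if (traceId.chars().allMatch(c -> c == '0') || spanId.chars().allMatch(c -> c == '0')) {
            return Optional.empty();
        }
        boolean sampled = (Integer.parseInt(m.group(3), 16) & 0x01) != 0; // "sampled" flag bit
        return Optional.of(new SpanContext(traceId, spanId, sampled));
    }

    // Downstream service: keep the trace ID, create a new random span ID for its own operation.
    static SpanContext childOf(SpanContext parent) {
        String spanId;
        do {
            spanId = String.format("%016x", ThreadLocalRandom.current().nextLong());
        } while (spanId.equals("0".repeat(16)));
        return new SpanContext(parent.traceId(), spanId, parent.sampled());
    }

    public static void main(String[] args) {
        String incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        SpanContext parent = extract(incoming).orElseThrow();
        SpanContext child = childOf(parent);
        System.out.println(inject(child)); // same trace ID, new span ID
    }
}
```

Because every hop preserves the trace ID, a back end can later stitch the spans emitted by different services into one trace.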
  13. OpenTelemetry
    • High-quality, ubiquitous, and portable telemetry to enable effective observability
    • OpenTelemetry is a collection of tools, APIs, and SDKs.
    • NB: OpenTelemetry ≠ observability back end
    https://opentelemetry.io
  14. MicroProfile: Open cloud-native Java APIs
    Diagram of MicroProfile specifications, grouped into Core, Integrate and Observe: Health Check, Fault Tolerance, OpenAPI, Config, OpenTracing, JWT, JSON-B, Rest Client, CDI, JAX-RS, JSON-P, OpenTelemetry, GraphQL, Reactive Messaging
    https://microprofile.io/
  15. Compatible Runtimes
    Compatible with MicroProfile APIs: 2.x and 3.x, 4.x, 5.x, 6.x, 7.x
    • Open Liberty: x x x x x (beta)
    • WebSphere Liberty: x x x x
    • Quarkus: x x
    • Payara Micro: x x x
    • WildFly: x x x
    • Payara Server: x x x x
    • TomEE: x x
    • KumuluzEE: x
    • Thorntail: x
    • JBoss EAP XP: x
    • Helidon: x x
    • Apache Launcher: x
    https://microprofile.io/compatible
  16. MicroProfile Telemetry 1.0
    • Introduced in the MicroProfile 6.0 release
    • Adopts OpenTelemetry Tracing
    • Set of APIs, SDKs, tooling and integrations
    • Designed for the creation and management of telemetry data (traces)
    https://github.com/eclipse/microprofile-telemetry
  17. MicroProfile Telemetry 2.0
    A set of APIs designed for the creation and management of telemetry data such as traces, metrics, and logs.
    What's new?
    • Expose OTel APIs for better UX (210)
    • Adopt OpenTelemetry Metrics API (141, 149)
    • Specify metrics provided by platform (151)
    • Adopt OTel logging (146)
    • Document usage of Metrics API (184)
    • Upgrade base OpenTelemetry API (and semantic conventions) (150)
    From MicroProfile 7 presentation: https://docs.google.com/presentation/d/1gg67Gv38B8QJ9o0u_pwoiQf3NvigB_MgL7ZgabL_I2U/edit#slide=id.p20
  18. MP Telemetry Instrumentation
    • Automatic instrumentation: Jakarta RESTful Web Services and MicroProfile Rest Client are automatically enlisted in distributed tracing.
    • Manual instrumentation: can be added via the @WithSpan annotation, via CDI injection (@Inject Tracer or @Inject Span), or via programmatic lookup with Span.current().
    • Agent instrumentation: use the OpenTelemetry Java Instrumentation project to gather telemetry data without any code modification.
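Programmatic lookup with `Span.current()` works because the active span is stored implicitly rather than passed as a parameter (OpenTelemetry Java keeps it in a `Context` that, by default, lives in thread-local storage). The toy model below shows the idea with a `ThreadLocal<String>` holding just a span name. `CurrentSpanModel`, `makeCurrent`, `Scope` and `withSpan` here are hypothetical stand-ins, not the OpenTelemetry types with similar names, and unlike the real `Span.current()` (which returns an invalid, no-op span when nothing is active) this version returns `null`.

```java
import java.util.function.Supplier;

// Toy model of an implicit "current span": a per-thread slot that instrumentation sets and
// application code can read without the span being passed around. Not the OpenTelemetry API.
final class CurrentSpanModel {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Closing a Scope restores whatever was current before; close() throws no checked exception.
    interface Scope extends AutoCloseable {
        @Override
        void close();
    }

    // Analogous in spirit to Span.current(); returns null when no span is active.
    static String current() {
        return CURRENT.get();
    }

    // Makes spanName current on this thread until the returned Scope is closed.
    static Scope makeCurrent(String spanName) {
        String previous = CURRENT.get();
        CURRENT.set(spanName);
        return () -> {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        };
    }

    // Roughly what an interceptor behind an annotation like @WithSpan does conceptually:
    // make a span current for the duration of the method, then restore the previous one.
    static <T> T withSpan(String spanName, Supplier<T> body) {
        try (Scope ignored = makeCurrent(spanName)) {
            return body.get();
        }
    }

    public static void main(String[] args) {
        String seen = withSpan("GET /orders", () ->
                withSpan("loadOrders", CurrentSpanModel::current));
        System.out.println(seen);      // loadOrders
        System.out.println(current()); // null
    }
}
```

The try-with-resources pattern matters: forgetting to close a scope would leave a stale span attached to a pooled thread.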
  19. Open Liberty
    • Focus on code
    • Easy to make fast and iterative changes
    • Easy to write tests
    • True-to-production testing (as much as possible)
    • Ready for containers
    • Not-in-your-way tools and flexibility
    https://developer.ibm.com/articles/why-cloud-native-java-developers-love-liberty/
  20. Summary:
    • We are entering a world of increased complexity; we can learn from the world of medicine and biology.
    • Effective observability is critical to monitor and understand how our applications are behaving and performing in this complex environment.
    • Context is critical!
    • Many open-source tools are available to help us look through the looking glass, including new standards like OpenTelemetry.
  21. Resources:
    • What is observability? - https://www.ibm.com/uk-en/topics/observability
    • OpenTelemetry and MicroProfile: Enabling effective observability for your cloud-native Java applications - https://developer.ibm.com/articles/opentelemetry-effective-observability-for-your-cloud-native-java-apps/
    • Tracing your microservices made easy with MicroProfile Telemetry 1.0 - https://openliberty.io/blog/2023/03/10/tracing-with-microprofile-telemetry.html