Slide 1

Slide 1 text

Monitoring GraalVM Native Image applications with OpenTelemetry
NISHIKAWA, Akihiro (Aki)
Cloud Solution Architect, Microsoft
@logico_jp

Slide 2

Slide 2 text

Who am I?
{
  "name": "Akihiro Nishikawa",
  "country": "Japan",
  "favourites": [ "JVM", "GraalVM", "Azure" ],
  "expertise": [ "Application integration", "Container and Serverless" ]
}

Slide 3

Slide 3 text

Agenda
• OpenTelemetry
• GraalVM
• Zero-code Instrumentation of Native Images
• Case study
• Takeaways

Slide 4

Slide 4 text

OpenTelemetry (OTel)

Slide 5

Slide 5 text

Monitoring a large number of components (microservices) is hard.
Image: “Mastering Distributed Tracing” by Yuri Shkuro

Slide 6

Slide 6 text

Observability is the ability to understand the internal state of a system by examining its outputs. In the context of software, this means being able to understand the internal state of a system by examining its telemetry data, which includes traces, metrics, and logs.
https://opentelemetry.io/docs/what-is-opentelemetry/#what-is-observability

Slide 7

Slide 7 text

Monitoring vs. Observability
Two ways to identify the underlying cause of problems
• Monitoring (“reactive”) tells us when something is wrong.
• Observability (“proactive”) tells us what’s happening, why it’s happening and how to fix it.
Signals: Traces, Metrics, Logs, Profiles, Events

Slide 8

Slide 8 text

Signal instrumentation and export
Diagram: input → application → output. Instrumentation in the application produces signals (telemetry data: logs, memory/CPU usage, etc.), which are exported to a dashboard (Grafana, Prometheus, etc.).

Slide 9

Slide 9 text

Standards?
What should we follow for instrumentation and signal export?
• Specifications
• Implementation (SDKs, APIs, libraries)
• Semantic conventions
• Communication standards
• Compatibility
• etc.

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

The 2nd most active CNCF project
https://all.devstats.cncf.io/d/1/activity-repository-groups?orgId=1&var-period=d7&var-repogroups=All

Slide 12

Slide 12 text

Objectives
[1] Focuses solely on collecting and delivering telemetry data.
[2] Generating actionable insights and storing telemetry data are out of scope.
• These are left to an observability backend such as Jaeger, Prometheus, or a commercial vendor.
“A major goal of OpenTelemetry is that you can easily instrument your applications or systems[1], no matter their language, infrastructure, or runtime environment. The storage and visualization of telemetry is intentionally left to other tools[2].”
https://opentelemetry.io/docs/what-is-opentelemetry/

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

Transport telemetry data (1): Collector (agent)
Diagram: Java apps/libraries (Java instrumentation APIs, OTel SDK) → OTLP → Collector (vendor agnostic: Receiver → Processor → Exporter)
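The receiver, processor, and exporter stages named on this slide can be pictured with a toy model. The following is only a minimal sketch: every class and method name in it is hypothetical, and it is not the OpenTelemetry Collector API. It assumes a processor is a simple transformation and an exporter is a simple sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Toy model of a Collector pipeline: receiver -> processor -> exporter.
// All names here are hypothetical; this is not the OpenTelemetry Collector API.
final class CollectorPipelineSketch {

    // A minimal stand-in for one telemetry record (for example, a span).
    record Signal(String service, String name) {}

    private final UnaryOperator<Signal> processor;
    private final Consumer<Signal> exporter;

    CollectorPipelineSketch(UnaryOperator<Signal> processor, Consumer<Signal> exporter) {
        this.processor = processor;
        this.exporter = exporter;
    }

    // Receiver: the entry point an application would send telemetry to.
    void receive(Signal signal) {
        // Processor transforms the record, then the exporter forwards it.
        exporter.accept(processor.apply(signal));
    }

    public static void main(String[] args) {
        // The list stands in for an observability backend.
        List<Signal> backend = new ArrayList<>();
        CollectorPipelineSketch collector = new CollectorPipelineSketch(
                s -> new Signal(s.service(), s.name().toLowerCase()),
                backend::add);
        collector.receive(new Signal("myservice", "GET /Orders"));
        System.out.println(backend);
    }
}
```

Because the application only talks to the receiver, the backend behind the exporter can be swapped without touching application code, which is the point of a vendor-agnostic collector.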

Slide 15

Slide 15 text

Transport telemetry data (2): No collector
Diagram: Java apps/libraries (Java instrumentation APIs, OTel SDK, Exporter); the exporter runs inside the application.

Slide 16

Slide 16 text

Transport telemetry data (3): Gateway
Diagram: Java apps/libraries (Java instrumentation APIs, OTel SDK) → OTLP → Load balancer → several vendor-agnostic Collector instances (each: Receiver → Processor → Exporter)

Slide 17

Slide 17 text

Would help you understand OpenTelemetry...
Workshop
• Observability with OpenTelemetry: From Idea to Insight (by Øyvind Randa and Hans Kristian Flaatten)

Slide 18

Slide 18 text

GraalVM

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

GraalVM
An advanced JDK with ahead-of-time Native Image compilation
• HotSpot based
  • JIT compiler (Graal compiler)
• Native Image
  • Ahead-of-time compilation
• Other features
  • Polyglot runtimes based on Truffle (Python, Node.js, Ruby, etc.)

Slide 21

Slide 21 text

GraalVM Native Image
• Fast start and scale
• Lower resource usage
• Predictable performance
• Improved security
• Compact packaging
• Several frameworks and platforms supported

Slide 22

Slide 22 text

How to generate Native Image
Diagram: Input (Application, Dependencies (Libraries), JDK, Substrate VM) → iterative analysis until reaching a fixed point (points-to analysis, run initializations, heap snapshotting) → Output: native executable with a text section (AOT compiled code, specific to OS) and a data section (image heap)
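"Iterating until a fixed point" can be illustrated with a toy reachability computation. This is a minimal sketch under strong assumptions: the call graph is a hand-written map, all names are hypothetical, and GraalVM's real points-to analysis tracks types and fields as well and is far more involved. The loop stops when a pass discovers no new method.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy worklist iteration to a fixed point over a hand-written call graph.
// Hypothetical names; GraalVM's real points-to analysis is far more involved.
final class ReachabilitySketch {

    // Returns every method reachable from the entry point.
    static Set<String> reachable(Map<String, List<String>> callGraph, String entryPoint) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(entryPoint);
        // Iterate until no new method is discovered (the fixed point).
        while (!worklist.isEmpty()) {
            String method = worklist.poll();
            if (seen.add(method)) {
                // First visit: queue everything this method calls.
                worklist.addAll(callGraph.getOrDefault(method, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> callGraph = Map.of(
                "App.main", List.of("Service.handle"),
                "Service.handle", List.of("Repo.find", "App.main"),
                "Unused.helper", List.of("Repo.find"));
        System.out.println(reachable(callGraph, "App.main"));
    }
}
```

In this example `Unused.helper` is never reached from `App.main`, so it is left out of the result. The same closed-world idea is why code that is not reachable at build time is absent from the native executable.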

Slide 23

Slide 23 text

Use cases (not limited to)
CLI tools
Self-contained, start up instantly, and easy to distribute.
• picocli (picocli - a mighty tiny command line interface)
• Pkl configuration language (Pkl :: Pkl Docs (pkl-lang.org))
Serverless & Containerized applications
Suitable for containerized and serverless applications with strict start-up time and memory requirements.

Slide 24

Slide 24 text

For more details
Conferences
• Going AOT: Everything you need to know about GraalVM for Java applications (by Alina Yurenko)
Workshop
• Multi-Cloud Apps with GraalVM - Up and Running (by Olga Gupalo)

Slide 25

Slide 25 text

Zero-code instrumentation for native applications

Slide 26

Slide 26 text

Zero-code instrumentation for Java apps
java -javaagent:path/to/opentelemetry-javaagent.jar \
     -Dotel.service.name=your-service-name \
     -jar myapp.jar
Diagram: the agent runs in the same JVM as myapp.jar and can use bytecode instrumentation.

Slide 27

Slide 27 text

How about zero-code instrumentation of native apps?

Slide 28

Slide 28 text

Native Image does not support Java agents.
Support agents at image generation time · Issue #1065 · oracle/graal (github.com)

Slide 29

Slide 29 text

Native Image apps run without dependency on JDK.
Diagram: Input (Application, Dependencies (Libraries), JDK, Substrate VM) → iterative analysis until reaching a fixed point (points-to analysis, run initializations, heap snapshotting) → Output: native executable with a text section (AOT compiled code, specific to OS) and a data section (image heap)

Slide 30

Slide 30 text

Alibaba’s feasibility study OTel Community Day North America 2024

Slide 31

Slide 31 text

Milestone (subject to change, of course ☺)
GraalVM for JDK 25 (September 16, 2025)
Milestone (github.com)

Slide 32

Slide 32 text

“No collector” approach for native apps
Exporter functionality should be packed into native applications.
Diagram: Java apps/libraries (Java instrumentation APIs, OTel SDK, Exporter)

Slide 33

Slide 33 text

Micrometer can also achieve this! Yes, I know. But you’d like to know another option, right?

Slide 34

Slide 34 text

Spring Boot

Slide 35

Slide 35 text

OpenTelemetry Spring Boot Starter
• Can collect signals where the OTel Java agent does not work.
• The OTel Java agent supports more out-of-the-box instrumentation than the Starter.

Slide 36

Slide 36 text

Dependencies
• io.opentelemetry:opentelemetry-bom
• io.opentelemetry.instrumentation:opentelemetry-instrumentation-bom
• io.opentelemetry.instrumentation:opentelemetry-spring-boot-starter

Slide 37

Slide 37 text

Additional settings might be needed in some cases.
In case of Azure:
• com.azure.spring:spring-cloud-azure-starter-monitor
• ...
• org.graalvm.buildtools:native-maven-plugin, with the arguments
  --libc=musl
  -Djava.security.properties=src/main/resources/custom.security
Disabling signature verification is needed when generating native apps with signed JARs.

Slide 38

Slide 38 text

Configuration: application.properties (application.yml)
# Most instrumentation properties are enabled by default.
# If using Azure Application Insights, the following property is required
applicationinsights.connection.string=InstrumentationKey=000000-0000-0000-0000-0000000000
# To enable only specific instrumentations, set the following property to false
# and then enable each required instrumentation explicitly
otel.instrumentation.common.default-enabled=true
Out of the box instrumentation | OpenTelemetry

Slide 39

Slide 39 text

Quarkus

Slide 40

Slide 40 text

Quarkus
• Since 3.13, OTel support has been enhanced.
  • But Micrometer is now recommended.
• Quarkus provides dependencies around OpenTelemetry.
  • No dependency from the OpenTelemetry project itself needs to be added.
  • Several platform-specific dependencies are also provided.

Slide 41

Slide 41 text

Enhancement to support OTel in Quarkus 3.13 Quarkus 3.13 - OpenTelemetry Metrics, OpenTelemetry 1.39, TLS registry improvements and more... - Quarkus

Slide 42

Slide 42 text

Dependencies
• io.quarkus:quarkus-opentelemetry
• io.opentelemetry.instrumentation:opentelemetry-jdbc
• io.quarkiverse.opentelemetry.exporter:quarkus-opentelemetry-exporter-azure

Slide 43

Slide 43 text

Configuration: application.properties (application.yml)
quarkus.application.name=myservice
quarkus.otel.exporter.otlp.endpoint=http://localhost:4317
quarkus.otel.exporter.otlp.headers=authorization=Bearer my_secret
quarkus.log.console.format=%d{HH:mm:ss} %-5p traceId=%X{traceId}, parentId=%X{parentId}, spanId=%X{spanId}, sampled=%X{sampled} [%c{2.}] (%t) %s%e%n
quarkus.datasource.jdbc.telemetry=true
# If using Azure Application Insights, the following property is required
quarkus.otel.azure.applicationinsights.connection.string=InstrumentationKey=00000000-0000-0000-0000-000000000000

Slide 44

Slide 44 text

Demo

Slide 45

Slide 45 text

Environment
No intention of promotion, of course! ☺
• Database: Database for PostgreSQL Flexible Server
• Observability backend: Application Insights
• Container platform: Container Apps

Slide 46

Slide 46 text

Each container Spring Boot Quarkus

Slide 47

Slide 47 text

Collect Telemetry data from polyglot apps

Slide 48

Slide 48 text

Repository
• Azure-Samples/java-native-telemetry (github.com)
• JDK 17 is used but JDK 21 works as well.
• Spring Boot
  • opentelemetry-spring-boot-starter (2.5.0-alpha used in this repo. The latest is 2.7.0)
  • opentelemetry-bom (1.38.0 used in this repo. The latest is 1.41.0)
• Quarkus
  • quarkus-bom (3.11.2 used in this repo. The latest is 3.14.2)
  • quarkus-opentelemetry-exporter-azure (for Azure Application Insights, the latest is 3.8.3.1)

Slide 49

Slide 49 text

Case study

Slide 50

Slide 50 text

Customer case
Background
• Microservices on Kubernetes
• Spring Boot (Kotlin)
• Short-lived and stateless
Challenges
• Round-trip time got longer as their business grew.
• Slow startup and long round-trip time hurt the customer experience.
Concerns
• Is zero-code instrumentation of native applications feasible?

Slide 51

Slide 51 text

Their decisions
GraalVM
• Round-trip time was improved.
• Improved startup time of each component.
• Reduced memory/CPU usage allows them to rearrange resources.
• Additionally, reduced attack surfaces.
OTel Spring Boot Starter
• Met their requirements around observability.
• Confirmed they could add special metrics with code if needed.
• Confirmed the starter did not spoil the advantages of a native application.

Slide 52

Slide 52 text

Conclusion

Slide 53

Slide 53 text

Takeaways
Agentless zero-code instrumentation
• Works for GraalVM Native Image applications.
• Spring Boot: Spring Boot Starter
• Quarkus: enhanced support for OpenTelemetry
Customer case study
• In several scenarios, agentless instrumentation fits better, even when instrumenting typical Java applications.
Let’s get involved!
• The OpenTelemetry project is active.
• Your contribution is highly appreciated.

Slide 54

Slide 54 text

Resources
• OpenTelemetry https://opentelemetry.io/
• GraalVM https://graalvm.org/
• Spring https://spring.io/
• Quarkus https://quarkus.io/

Slide 55

Slide 55 text

Thank you!