Slide 1

Slide 1 text

Observability: Beyond the three pillars with Spring
Jonatan Ivanov, 2022-10-20
Copyright © 2022 VMware, Inc. or its affiliates.

Slide 2

Slide 2 text

About Me
- Spring Team
- Micrometer
- Spring Cloud Sleuth
- “Spring Observability”
- Seattle Java User Group
- develotters.com
- @jonatan_ivanov

Slide 3

Slide 3 text

What is Observability? Why do we need it?

Slide 4

Slide 4 text

What is Observability?
How well we can understand the internals of a system based on its outputs
(Providing meaningful information about what happens inside)

Slide 5

Slide 5 text

Various Opinions
- 3 pillars: Logging, Metrics, Distributed Tracing
- TEMPLE (6 pillars): Tracing, Change Events, Metrics, Profiles, Logs, Exceptions
- Arbitrary Wide Events, Signals
- Are there other things?

Slide 6

Slide 6 text

Why do we need Observability?
Today's systems are insanely complex (cloud)
(Death Star Architecture, Big Ball of Mud)

Slide 7

Slide 7 text

Why do we need Observability?
- Environments can be chaotic (You turn a knob here a little and services are going down there)
- We need to deal with unknown unknowns (We can’t know everything)
- Things can be perceived differently by observers (Everything is broken for the users but seems ok to you)

Slide 8

Slide 8 text

Logging - Metrics - Distributed Tracing
- Logging: What happened (why)? Emitting events
- Metrics: What is the context? Aggregating data
- Distributed Tracing: Why did it happen? Recording causal ordering of events

Slide 9

Slide 9 text

Logging With Spring

Slide 10

Slide 10 text

Logging with Spring: SLF4J + Logback
- SLF4J with Logback comes pre-configured
- SLF4J (Simple Logging Façade for Java): simple API for logging libraries
- Logback: natively implements the SLF4J API
- If you want Log4j2 instead of Logback: exclude spring-boot-starter-logging and add spring-boot-starter-log4j2

Slide 11

Slide 11 text

Logging with Spring: Payload, Access, GC
- Payload logs: Logbook + logbook-spring-boot-starter (auto-configured)
- Access logs:
  server.tomcat.accesslog.enabled=true
  server.jetty.accesslog.enabled=true
  server.undertow.accesslog.enabled=true
- GC logs: JVM args

Slide 12

Slide 12 text

Metrics With Spring

Slide 13

Slide 13 text

Metrics with Spring: Micrometer
- Popular Metrics library on the JVM
- Like SLF4J, but for metrics
- Simple API
- Supports the most popular metric backends
- Comes with spring-boot-actuator
- Spring projects are instrumented using Micrometer (Boot 2.x)
- A lot of third-party libraries use Micrometer
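The "like SLF4J, but for metrics" idea can be illustrated with a toy facade: application code counts and times work against one small interface, and the backend behind it can be swapped. This is a stdlib-only sketch under stated assumptions; `ToyMeterRegistry`, `InMemoryBackend`, and `timed` are hypothetical names for illustration and are not Micrometer classes or methods.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

class ToyMetricsDemo {
    // Times the supplied work and counts the call, using only the facade.
    static <T> T timed(ToyMeterRegistry registry, String name, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            registry.recordNanos(name + ".time", System.nanoTime() - start);
            registry.increment(name + ".calls");
        }
    }

    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        for (int i = 0; i < 3; i++) {
            timed(backend, "talk", () -> "hello");
        }
        System.out.println("talk.calls = " + backend.count("talk.calls"));
        System.out.println("talk.time (ns) = " + backend.totalNanos("talk.time"));
    }
}

// Toy facade (hypothetical, NOT the Micrometer API): the application depends on this only.
interface ToyMeterRegistry {
    void increment(String counterName);
    void recordNanos(String timerName, long nanos);
}

// One possible backend (hypothetical): keeps all values in memory.
class InMemoryBackend implements ToyMeterRegistry {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> nanosByTimer = new ConcurrentHashMap<>();

    @Override
    public void increment(String counterName) {
        counts.computeIfAbsent(counterName, k -> new LongAdder()).increment();
    }

    @Override
    public void recordNanos(String timerName, long nanos) {
        nanosByTimer.computeIfAbsent(timerName, k -> new LongAdder()).add(nanos);
    }

    // Returns 0 for counters that were never incremented.
    long count(String counterName) {
        LongAdder adder = counts.get(counterName);
        return adder == null ? 0 : adder.sum();
    }

    // Returns 0 for timers that never recorded anything.
    long totalNanos(String timerName) {
        LongAdder adder = nanosByTimer.get(timerName);
        return adder == null ? 0 : adder.sum();
    }
}
```

Swapping `InMemoryBackend` for another `ToyMeterRegistry` implementation would not change the calling code, which is the point of a facade.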

Slide 14

Slide 14 text

Like SLF4J, but for metrics …
AppOptics, Atlas, Azure Monitor, CloudWatch (AWS), Datadog, Dynatrace, Elastic, Ganglia, Graphite, Humio, InfluxDB, JMX, KairosDB, New Relic, (/actuator/metrics), OpenTSDB, OTLP, Prometheus, SignalFx, Stackdriver (GCP), StatsD, Wavefront (VMware)

Slide 15

Slide 15 text

Distributed Tracing With Spring

Slide 16

Slide 16 text

Distributed Tracing with Spring
- Boot 2.x: Spring Cloud Sleuth
- Boot 3.x: Micrometer Tracing (Sleuth w/o Spring dependencies)
- Provide an abstraction layer on top of tracing libraries
  - Brave (OpenZipkin), default
  - OpenTelemetry (CNCF), experimental
- Instrumentation for Spring Projects, 3rd party libraries, your app
- Support for various backends

Slide 17

Slide 17 text

Observation API And “Spring Observability”

Slide 18

Slide 18 text

You want to instrument your application…
● Add logs (application logs)
● Add metrics
  ○ Increment Counters
  ○ Start/Stop Timers
● Add Distributed Tracing
  ○ Start/Stop Spans
  ○ Log Correlation
  ○ Context Propagation
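Context propagation and log correlation can be sketched without any tracing library: the caller writes its trace context into outgoing headers, the callee reads it back, and both put the ids into their log lines. The sketch below assumes the W3C `traceparent` header layout as one example format (tracers may use others, such as B3); `ToyPropagationDemo`, `TraceContext`, `inject`, `extract`, and `correlate` are hypothetical names, not Sleuth or Micrometer Tracing APIs.

```java
import java.util.HashMap;
import java.util.Map;

class ToyPropagationDemo {
    // Hypothetical trace context holding the two ids needed for correlation.
    static final class TraceContext {
        final String traceId;
        final String spanId;

        TraceContext(String traceId, String spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    // Caller side: write the context into the outgoing headers
    // (layout: version-traceId-spanId-flags, "01" meaning sampled).
    static void inject(TraceContext ctx, Map<String, String> headers) {
        headers.put("traceparent", "00-" + ctx.traceId + "-" + ctx.spanId + "-01");
    }

    // Callee side: read the context back; returns null if the header is missing or malformed.
    // A real tracer would also start a new child span here; this sketch skips that step.
    static TraceContext extract(Map<String, String> headers) {
        String value = headers.get("traceparent");
        if (value == null) {
            return null;
        }
        String[] parts = value.split("-");
        return parts.length == 4 ? new TraceContext(parts[1], parts[2]) : null;
    }

    // Log correlation: prefix the message with the ids so logs can be joined with traces.
    static String correlate(TraceContext ctx, String message) {
        return "[" + ctx.traceId + "," + ctx.spanId + "] " + message;
    }

    public static void main(String[] args) {
        TraceContext ctx = new TraceContext("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7");
        Map<String, String> headers = new HashMap<>();
        inject(ctx, headers);                     // service A, before the outgoing call
        TraceContext received = extract(headers); // service B, on the incoming request
        System.out.println(correlate(ctx, "calling service B"));
        System.out.println(correlate(received, "handling request"));
    }
}
```

Because both services log the same trace id, their log lines can be found with a single search, which is what log correlation buys you.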

Slide 19

Slide 19 text

Observation API (Micrometer 1.10)

Observation observation = Observation.start("talk", registry);
try { // TODO: scope
    Thread.sleep(1000);
}
catch (Exception exception) {
    observation.error(exception);
    throw exception;
}
finally { // TODO: attach tags (key-value)
    observation.stop();
}

Slide 20

Slide 20 text

Observation API (Micrometer 1.10)

ObservationRegistry registry = ObservationRegistry.create();
registry.observationConfig()
    .observationHandler(new MeterHandler(...))
    .observationHandler(new TracingHandler(...))
    .observationHandler(new LoggingHandler(...))
    .observationHandler(new AuditEventHandler(...));

Observation observation = Observation.start("talk", registry);
// let the fun begin…
observation.stop();
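The idea behind registering several handlers is that the code is instrumented once and every handler reacts to the same start, error, and stop events. A stdlib-only sketch of that fan-out follows; it is a toy model, not the Micrometer implementation, and `ToyHandler`, `ToyRegistry`, `ToyObservation`, and `RecordingHandler` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class ToyObservationDemo {
    // Hypothetical handler contract: each signal reacts to the same lifecycle events.
    interface ToyHandler {
        void onStart(String name);
        void onError(String name, Throwable error);
        void onStop(String name);
    }

    // Holds the handlers that every observation will notify.
    static class ToyRegistry {
        final List<ToyHandler> handlers = new ArrayList<>();

        ToyRegistry handler(ToyHandler handler) {
            handlers.add(handler);
            return this;
        }
    }

    // One observation notifies all registered handlers, in registration order.
    static class ToyObservation {
        private final String name;
        private final ToyRegistry registry;

        private ToyObservation(String name, ToyRegistry registry) {
            this.name = name;
            this.registry = registry;
        }

        static ToyObservation start(String name, ToyRegistry registry) {
            ToyObservation observation = new ToyObservation(name, registry);
            registry.handlers.forEach(h -> h.onStart(name));
            return observation;
        }

        void error(Throwable error) {
            registry.handlers.forEach(h -> h.onError(name, error));
        }

        void stop() {
            registry.handlers.forEach(h -> h.onStop(name));
        }
    }

    // Example handler that records each event as text instead of emitting a real signal.
    static class RecordingHandler implements ToyHandler {
        private final String label;
        private final List<String> events;

        RecordingHandler(String label, List<String> events) {
            this.label = label;
            this.events = events;
        }

        @Override public void onStart(String name) { events.add(label + ": start " + name); }
        @Override public void onError(String name, Throwable error) { events.add(label + ": error " + name); }
        @Override public void onStop(String name) { events.add(label + ": stop " + name); }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        ToyRegistry registry = new ToyRegistry()
            .handler(new RecordingHandler("metrics", events))
            .handler(new RecordingHandler("tracing", events));
        ToyObservation observation = ToyObservation.start("talk", registry);
        observation.stop();
        events.forEach(System.out::println);
    }
}
```

Adding a third signal (for example audit events) would mean registering one more handler, with no change to the instrumented code.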

Slide 21

Slide 21 text

Observation API (Micrometer 1.10)

Observation.createNotStarted("talk", registry)
    .lowCardinalityKeyValue("conference", "J1")
    .highCardinalityKeyValue("uid", userId)
    .observe(this::talk);

@Observed
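Two things happen in the fluent form: `observe` wraps the work in the same start/error/stop lifecycle as the explicit try/catch/finally, and key-values are split by cardinality. As far as the Micrometer documentation describes it, low-cardinality key-values go to both metrics and traces, while high-cardinality ones go to traces only. The toy class below models just that split; `ToyKeyValueDemo` and its methods are hypothetical and do not reproduce the real API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ToyKeyValueDemo {
    private final Map<String, String> low = new LinkedHashMap<>();
    private final Map<String, String> high = new LinkedHashMap<>();
    boolean stopped;
    Throwable error;

    // Few distinct values expected (e.g. a conference name): safe to use as a metric tag.
    ToyKeyValueDemo lowCardinality(String key, String value) {
        low.put(key, value);
        return this;
    }

    // Many distinct values expected (e.g. a user id): kept off metrics in this sketch.
    ToyKeyValueDemo highCardinality(String key, String value) {
        high.put(key, value);
        return this;
    }

    // Runs the work, records a failure if one occurs, and always marks the observation stopped.
    void observe(Runnable work) {
        try {
            work.run();
        } catch (RuntimeException e) {
            error = e;
            throw e;
        } finally {
            stopped = true;
        }
    }

    // Metrics only see low-cardinality key-values.
    Map<String, String> metricTags() {
        return new LinkedHashMap<>(low);
    }

    // Spans see both kinds.
    Map<String, String> spanTags() {
        Map<String, String> all = new LinkedHashMap<>(low);
        all.putAll(high);
        return all;
    }

    public static void main(String[] args) {
        ToyKeyValueDemo observation = new ToyKeyValueDemo()
            .lowCardinality("conference", "J1")
            .highCardinality("uid", "user-42"); // "user-42" is a made-up value
        observation.observe(() -> System.out.println("talking"));
        System.out.println("metric tags: " + observation.metricTags());
        System.out.println("span tags:   " + observation.spanTags());
    }
}
```

Keeping user ids off metric tags matters because every distinct tag value creates a separate time series in most metric backends.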

Slide 22

Slide 22 text

“Non-conventional” Observability
Is there anything else beyond Logging+Metrics+Tracing?
What else could make your app more observable?

Slide 23

Slide 23 text

Spring Boot Actuator
auditevents, beans, caches, conditions, configprops, env, flyway, health (k8s probes), heap/thread dump, httptrace, info, integrationgraph, jolokia, logfile, loggers, liquibase, metrics, traces, mappings, prometheus, quartz, scheduledtasks, sessions, shutdown, startup

Slide 24

Slide 24 text

Spring Boot Actuator
Health Endpoint
- Is my app healthy (k8s probes)?
- Dependencies?
Info Endpoint
- Build Info (name, version, git commit, build time): Boot 2.x
- Java Info (JRE/JVM name, version, vendor): Boot 2.6
- OS Info (name, arch, version): Boot 2.7
- Cloud Info (instanceId, region, account)
- CPU Cores, Total Memory, GC Info, TLS cert chain
- Timezone, Current Time, Language, Start Time, Uptime
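The "Dependencies?" question can be read as: how does the health of the things an app depends on roll up into one answer? A minimal sketch follows, assuming the simple rule that the app is DOWN if any dependency is DOWN. That rule is an assumption made for illustration, and `ToyHealthDemo`, `Status`, and `aggregate` are hypothetical names rather than Actuator types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ToyHealthDemo {
    enum Status { UP, DOWN }

    // Assumed rule: a single DOWN dependency makes the whole app DOWN.
    static Status aggregate(Map<String, Status> dependencies) {
        return dependencies.containsValue(Status.DOWN) ? Status.DOWN : Status.UP;
    }

    public static void main(String[] args) {
        // Dependency names are made up for the example.
        Map<String, Status> dependencies = new LinkedHashMap<>();
        dependencies.put("db", Status.UP);
        dependencies.put("messageBroker", Status.DOWN);
        System.out.println("dependencies: " + dependencies);
        System.out.println("overall: " + aggregate(dependencies));
    }
}
```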

Slide 25

Slide 25 text

Service Discoverability, API Discoverability
- How many service instances do we have? Where? (host/ip, port, instanceId, region, account)
- What versions are deployed? (by environment)
- Eureka, Spring Boot Admin
- How to call/use them?
  - Spring REST Docs
  - Spring Cloud Contract + Pact Broker
  - Swagger / OpenAPI + ReDoc
  - Spring HATEOAS + HAL Explorer

Slide 26

Slide 26 text

Thank you!
- Follow me on Twitter: @jonatan_ivanov
- Read my blog: develotters.com
- Try it out: github.com/jonatan-ivanov/teahouse
- Learn more: SpringOne, December 6-8
© 2022 Spring. A VMware-backed project.

Slide 27

Slide 27 text

Examples
- Logging: Processing took 140ms
- Metrics: P99.999: 140ms, Max: 150ms
- Distributed Tracing: DB was slow (a lot of data was requested)

- Logging: Processing failed (stacktrace?)
- Metrics: The error rate is 0.001/sec (2 errors in the last 30 minutes)
- Distributed Tracing: DB call failed (invalid input)
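The metric figures in these examples are aggregates, and the arithmetic behind them is short: 2 errors over 30 minutes is 2 / 1800 ≈ 0.0011 errors per second, which rounds to the 0.001/sec on the slide. The sketch below computes a percentile, a maximum, and an error rate from raw values. It assumes the nearest-rank percentile definition, which is one of several in use and not necessarily what any given metrics backend applies; the sample latencies are made up and `ToyLatencyStats` is a hypothetical name.

```java
import java.util.Arrays;

class ToyLatencyStats {
    // Nearest-rank percentile: the smallest sample such that at least p percent
    // of all samples are less than or equal to it.
    static long percentile(long[] samplesMs, double p) {
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    // Largest observed sample; throws if the array is empty.
    static long max(long[] samplesMs) {
        return Arrays.stream(samplesMs).max().orElseThrow();
    }

    // Average error rate over a time window.
    static double errorsPerSecond(long errors, long windowSeconds) {
        return (double) errors / windowSeconds;
    }

    public static void main(String[] args) {
        long[] samplesMs = {140, 120, 150, 130, 125, 135}; // made-up latencies
        System.out.println("P50: " + percentile(samplesMs, 50) + "ms");
        System.out.println("Max: " + max(samplesMs) + "ms");
        System.out.println("Error rate: " + errorsPerSecond(2, 30 * 60) + "/sec");
    }
}
```

A single log line or trace shows one request; these aggregates describe all of them, which is why the slide pairs the three signals for the same incident.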