are currently under development. This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined. The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.
do we need it? - “The Three Pillars” (with examples) - Logging - Metrics - Distributed Tracing - How to implement it with Spring? - “Non-conventional” Observability - Q&A
of how well internal states of a system can be inferred from knowledge of its external outputs.” … “A system is said to be observable if [...] the current state can be estimated using only the information from outputs.” (Wikipedia)
knowing ahead what you want to ask Turning data points and context into insights Being able to quickly troubleshoot problems with no prior knowledge (unknown unknowns)
Cloud Environments We need to face unknown unknowns We might not know where our apps are We might not know how many instances we have (or what versions) We can’t modify/debug/etc. it Something is always broken (Fallacies of Distributed Computing) Like sending rovers to Mars: You can’t touch/modify them after launch
You turn a knob here a little and services are going down there Unknown Unknowns We can’t know everything, we need to deal with unknown unknowns “This should be impossible!”, “That will never happen!” Relativity The same thing can be perceived differently by different observers Everything is broken for the users but the server side seems ok
to improve something, you need to be able to measure it first How many resources do you utilize (cpu, ram, io, etc.)? What are your throughput/latency (max.) patterns? How frequently do you deploy? How long does it take for the code to go live? How long does it take to troubleshoot an issue or recover from an outage? How often are you paged?
context? Measure-and-Combine data Aggregatable Can identify trends Not traffic-sensitive (usually) Distributed Tracing Why happened? Recording events With causal ordering Can identify cause across apps Context Propagation (later) Logging What happened? Emitting events Easy to read (grep) INFO/WARN/ERROR/… Stacktraces
140ms.” “The max was 150ms.” So it’s quite bad. But why was this slow? Logging “Processing a request took 140ms.” Is it bad? Is it good? What is the context? Distributed Tracing “Service A called Service B.” “Service B called the DB.” “The services were ok.” “The network was ok.” “The DB was slow.” “Because somebody requested a lot of data.”
2 errors recently.” So it’s not that bad. But why did this happen? Logging “Request processing failed.” “Here’s the stacktrace.” Is it bad? (Well, it failed.) How bad? How many of them failed? What is the context? Distributed Tracing “Service A called Service B.” “Service B called the DB.” “The services were ok.” “The network was ok.” “The DB call failed.” “Because of invalid input.”
and response pairs GC logs: GC events (JEP 271 - Unified GC Logging) Access logs: Logs from the underlying HTTP server (e.g.: Tomcat) - Who and when called our service - What request (HTTP method, headers, path, query) - Response status, processing time, payload sizes etc. (audit logs, metrics in logs, trace logs) Logging 101 - Types of logs
SLF4J - Simple Logging Façade for Java - Simple API for various logging libraries - Allows to plug in the desired logging library Logback - Modern logging library - Natively implements the SLF4J API If you want Log4j2 instead of Logback: - spring-boot-starter-logging + spring-boot-starter-log4j2 Logging with Spring: SLF4J + Logback
logbook-spring-boot-starter (auto-configured) Access logs: server.tomcat.accesslog.enabled=true server.tomcat.basedir=logs server.tomcat.accesslog.pattern=... server.jetty.accesslog.enabled=true server.undertow.accesslog.enabled=true + logback-access (if you want to use Logback, needs to be configured) GC logs: JVM args
Trends, context, anomaly detection, visualization, alerting Various Backends Publishing: Client Pushes vs. Server Polls Dimensionality: Dimensional vs. Hierarchical
Like SLF4J, but for metrics Simple API Supports the most popular metric backends Comes with spring-boot-actuator Spring projects are instrumented using Micrometer A lot of third-party libraries use Micrometer
for Spring Provides an abstraction layer on top of tracing libraries (3.x) - Brave (OpenZipkin), default - OpenTelemetry (CNCF), experimental Log Correlation + Context Propagation Instrumentation for Spring Projects (and your application) Instrumentation for third-party libraries (through Brave and OTel) Supports various backends (through Brave and OTel)
the repo of the project? Cloud instanceId and type image version region, account, cloud provider TLS Certificate Chain subject, issuer validity (expiration date) -> health check? signature algorithm You can create your own endpoint Dependencies used runtime; Dependency lock files /whoami: username + roles
environment)? What versions are deployed (by environment)? Where are they? host/ip, port instanceId, region, account, cloud provider, etc. Service starts/stops (deployments, restarts)?
Docs Generates docs from tests and hand-written docs Spring Cloud Contract + Pact Broker Consumer Driven Contracts (test client-server contract) You know when you break your clients Swagger / OpenAPI + ReDoc API spec, docs API browser + client Spring HATEOAS + HAL Explorer Add links to your resources (other resources or operations) API browser + client