Slide 1

Slide 1 text

Marcin Grzejszczak, Jonatán Ivanov
Spring One 2021
Tracing Issues in Your Application
Copyright © 2021 VMware, Inc. or its affiliates.

Slide 2

Slide 2 text

About Us
Marcin Grzejszczak
  Spring Cloud @ VMware
  Twitter: @mgrzejszczak
  Blog: toomuchcoding.com
  Observability: Spring Cloud Sleuth (Distributed Tracing), Spring Observability 😉
  Contract Testing: Spring Cloud Contract
Jonatán Ivanov
  Spring Cloud @ VMware
  Twitter: @jonatan_ivanov
  Blog: develotters.com
  Observability: Spring Cloud Sleuth (Distributed Tracing), Spring Observability 😉, Micrometer (Metrics)

Slide 3

Slide 3 text

Disclaimer This presentation may contain product features or functionality that are currently under development. This overview of new technology represents no commitment from VMware to deliver these features in any generally available product. Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. Technical feasibility and market demand will affect final delivery. Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined. The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.

Slide 4

Slide 4 text

Meet Joe
Joe is a demanding internet user
Joe wanted to search for memes

Slide 5

Slide 5 text

Meet Jane
Jane is a programmer
She’s been developing the meme generation system

Slide 6

Slide 6 text

Joe wanted to search for interesting memes. At first, things went smoothly...

Slide 7

Slide 7 text

Great result
Joe was happy to be able to find an interesting meme
https://www.rd.com/list/dog-memes/

Slide 8

Slide 8 text

Did you just say that you ssh’ed to a production machine?

Slide 9

Slide 9 text

His happiness didn’t last long. Eventually Joe saw this...

Slide 10

Slide 10 text

500 response code
Memes failed to generate due to a server error

Slide 11

Slide 11 text

Joe was unhappy
Joe decided to file a ticket with the meme generation system

Slide 12

Slide 12 text

Jane’s investigating
She’s looking at the stack trace.
Jane wants to know the truth about what happened

Slide 13

Slide 13 text

2021-02-10 13:57:55.239 ERROR 274441 --- [nio-8082-exec-1] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.IllegalStateException: Meme overflow occurred] with root cause
java.lang.IllegalStateException: Meme overflow occurred
    at io.memegenerator.MemeController.generateMeme(Application.java:93)
    at io.memegenerator.MemeController$$FastClassBySpringCGLIB$$d54a9db6.invoke()
    at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
    at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
    at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:691)
    at io.memegenerator.MemeController$$EnhancerBySpringCGLIB$$e33f7fa1.memeOverflow()
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:197)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:141)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1061)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:961)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:626)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:733)

Slide 14

Slide 14 text

That wasn’t too helpful. Jane wanted to know what Joe did that led to the error...

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Death Star Architecture / Big Ball of Mud

Slide 20

Slide 20 text

Production observability
Jane sees that there is no proper feedback from production
Logs are missing critical information
We’re lacking production observability!

Slide 21

Slide 21 text

What if the apps collect and publish metrics? Is there a way to correlate logs?

Slide 22

Slide 22 text

Jane wants to address the metrics problem

Slide 23

Slide 23 text

Metrics Store

Slide 24

Slide 24 text

What are metrics good for?

Slide 25

Slide 25 text

What about the log correlation?

Slide 26

Slide 26 text

TraceId: 123
[Diagram: the trace ID 123 is attached to every call between the services]
2021-02-10 13:57:55.239 … HELLO FROM SERVICE1
2021-02-10 13:58:10.239 … HELLO FROM SERVICE2
2021-02-10 13:58:20.239 … HELLO FROM SERVICE4
2021-02-10 13:58:30.239 … HELLO FROM SERVICE3
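The correlation idea on this slide can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not how Spring Cloud Sleuth implements it: `TraceContext`, `startOrContinue` and `logLine` are hypothetical names, and a real application would usually rely on the logging framework's MDC instead of a hand-rolled ThreadLocal.

```java
import java.util.UUID;

final class LogCorrelationDemo {
    public static void main(String[] args) {
        // Service 1 receives a request without a trace id and starts a new trace.
        String traceId = TraceContext.startOrContinue(null);
        System.out.println(TraceContext.logLine("service1", "HELLO FROM SERVICE1"));
        TraceContext.clear();

        // Service 2 receives the trace id (for example in a request header)
        // and continues the same trace, so both log lines share one id.
        TraceContext.startOrContinue(traceId);
        System.out.println(TraceContext.logLine("service2", "HELLO FROM SERVICE2"));
        TraceContext.clear();
    }
}

// Hypothetical holder for the trace id of the request handled by the current thread.
final class TraceContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private TraceContext() {
    }

    // Reuse the incoming trace id if the caller sent one, otherwise start a new trace.
    static String startOrContinue(String incomingTraceId) {
        String traceId = (incomingTraceId == null || incomingTraceId.isEmpty())
                ? UUID.randomUUID().toString().replace("-", "")
                : incomingTraceId;
        CURRENT.set(traceId);
        return traceId;
    }

    static String currentTraceId() {
        return CURRENT.get();
    }

    // Must be called when the request is done so the id does not leak to the next request.
    static void clear() {
        CURRENT.remove();
    }

    // Prefix the message with the trace id so lines from different services can be joined.
    static String logLine(String service, String message) {
        return "[" + currentTraceId() + "] " + service + " : " + message;
    }
}
```

Searching the log store for one trace id then returns the lines from every service that took part in the request.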

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

Logs Store

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Jane’s done!
The problem was solved thanks to log correlation!
Time to start the weekend...

Slide 31

Slide 31 text

Let’s get back to Joe. Joe wanted to generate a new meme...

Slide 32

Slide 32 text

Some meme about cats

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

Joe was sad again
The system is slow
Another ticket is filed

Slide 35

Slide 35 text

Another issue...
Jane sees that Joe filed another issue
The description could be better…
Jane knows that the logs are correlated - for sure they will help!

Slide 36

Slide 36 text

[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:05.239 INFO 274441 --- [nio-8082-exec-1] This line shouldn’t be called
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:15.239 INFO 274441 --- [nio-8082-exec-1] Logging sth
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:25.239 INFO 274441 --- [nio-8082-exec-1] Logging moar
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:35.239 INFO 274441 --- [nio-8082-exec-1] This should work
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:45.239 INFO 274441 --- [nio-8082-exec-1] Sending a request
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:55.239 INFO 274441 --- [nio-8082-exec-1] Returning response
Sending what request?
Getting what response?
Did it take 10 seconds?

Slide 37

Slide 37 text

[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:05.239 INFO 274441 --- [nio-8082-exec-1] This line shouldn’t be called
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:15.239 INFO 274441 --- [nio-8082-exec-1] Logging sth
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:25.239 INFO 274441 --- [nio-8082-exec-1] Logging moar
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:35.239 INFO 274441 --- [nio-8082-exec-1] This should work
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:45.239 INFO 274441 --- [nio-8082-exec-1] Sending a request - start
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:47.239 INFO 274441 --- [nio-8082-exec-1] Sending a request - stop [took 2 seconds]
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:55.239 INFO 274441 --- [nio-8082-exec-1] Returning response
Change the logs?
Add start?
Add stop?

Slide 38

Slide 38 text

That’s a lot of work...
Jane sees that modifying all logs will take a lot of time
There must be a better way to solve this problem

Slide 39

Slide 39 text

Service 1 Service 3 Service 4 Application Controller Application Service Service 2

Slide 40

Slide 40 text

Service 1 Service 3 Service 4 Application Controller Application Service Service 2 Operation Operation? Operation Operation

Slide 41

Slide 41 text

Service 1 Service 3 Service 4 Application Controller Application Service Service 2 Span Span Span

Slide 42

Slide 42 text

[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:05.239 INFO 274441 --- [nio-8082-exec-1] This line shouldn’t be called
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:15.239 INFO 274441 --- [nio-8082-exec-1] Logging sth
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:25.239 INFO 274441 --- [nio-8082-exec-1] Logging moar
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:35.239 INFO 274441 --- [nio-8082-exec-1] This should work
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:45.239 INFO 274441 --- [nio-8082-exec-1] Sending a request
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:45.239 INFO 274441 --- [nio-8082-exec-1] [FRAMEWORK] SpanId [123] Start [xx] Stop [yy]
[6dad9c7bd47f0e21c946a7a57d75e7c8] 2021-02-10 13:58:55.239 INFO 274441 --- [nio-8082-exec-1] Returning response
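The framework-generated line with a span id, a start and a stop can be pictured as a small timed record that belongs to a trace. The sketch below is a plain-Java illustration under that assumption; the `Span` class and its methods are hypothetical and do not mirror the Brave or OpenTelemetry APIs.

```java
import java.util.ArrayList;
import java.util.List;

final class SpanDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Span> spanStore = new ArrayList<>(); // stands in for a real span store

        Span span = Span.start("6dad9c7bd47f0e21c946a7a57d75e7c8", "123", "sending-a-request");
        Thread.sleep(20); // the operation being timed
        span.stop();

        spanStore.add(span);
        System.out.println(span.describe());
    }
}

// Hypothetical span: one timed operation that is part of a trace.
final class Span {
    private final String traceId;
    private final String spanId;
    private final String name;
    private final long startNanos;
    private long stopNanos;
    private boolean finished;

    private Span(String traceId, String spanId, String name, long startNanos) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.name = name;
        this.startNanos = startNanos;
    }

    // Record the start time when the operation begins.
    static Span start(String traceId, String spanId, String name) {
        return new Span(traceId, spanId, name, System.nanoTime());
    }

    // Record the stop time once; later calls are ignored.
    void stop() {
        if (!finished) {
            stopNanos = System.nanoTime();
            finished = true;
        }
    }

    boolean isFinished() {
        return finished;
    }

    long durationNanos() {
        if (!finished) {
            throw new IllegalStateException("span has not been stopped");
        }
        return stopNanos - startNanos;
    }

    // Same trace id prefix as the log lines, plus the span id and the measured duration.
    String describe() {
        return "[" + traceId + "] SpanId [" + spanId + "] " + name
                + " took " + (durationNanos() / 1_000_000) + " ms";
    }
}
```

Because the instrumentation records the timing, the application's own log statements do not need separate start and stop messages.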

Slide 43

Slide 43 text

Span Store

Slide 44

Slide 44 text

Span
Whole trace
Production dependency graph

Slide 45

Slide 45 text

What are the options?
Jane wants to know what distributed tracing options are out there
She has a polyglot environment - does it work for any language / framework?
What are the latency visualization tools?

Slide 46

Slide 46 text

Standards
OpenZipkin
OpenTracing
OpenCensus
OpenTelemetry

Slide 47

Slide 47 text

Standards
OpenZipkin
OpenTelemetry

Slide 48

Slide 48 text

OpenZipkin
Mature, production tested, multi language support
Zipkin: 14k stars on GitHub, first release 05.2016
Brave: 2k stars on GitHub, first release 04.2013
Supported languages: C#, Go, Java, JavaScript, Ruby, Scala, PHP (many others are community driven)

Slide 49

Slide 49 text

OpenTelemetry (OTel)
New, multi language, CNCF oriented
OpenTelemetry Java: ~900 stars on GitHub, first release 11.2019
OpenTelemetry Spec: 1.9k stars on GitHub, first release 06.2019
First GA releases at the end of 2020 and beginning of 2021
OTel wants to address logs, tracing & metrics problems
Supported languages: Java, C#, Go, JavaScript, Python, Rust, C++, Erlang/Elixir

Slide 50

Slide 50 text

Latency Visualization Tools
VMware Tanzu Observability by Wavefront
*Free Tanzu Observability for your Spring Boot applications - no sign-up needed

Slide 51

Slide 51 text

Latency Visualization Tools
OpenZipkin
Jaeger

Slide 52

Slide 52 text

Latency Visualization Tools
APM vendors

Slide 53

Slide 53 text

What about Java & Spring?
Jane wants to know if she can use distributed tracing with Java & Spring

Slide 54

Slide 54 text

Spring Cloud Sleuth

Slide 55

Slide 55 text

Spring Cloud Sleuth
Distributed tracing support for Spring Boot applications
First release 10.2015
Version 2.x works directly with OpenZipkin Brave’s API
Version 3.x bridges from Sleuth API to tracer APIs
Plug in the library, configure it, you’re done!

Slide 56

Slide 56 text

Spring Cloud Sleuth - how does it work?
JAVA PROCESS
Sleuth Instrumentation: HTTP, Messaging, Batch, ...
Sleuth API
API BRIDGES: OpenZipkin Brave (out of the box support), OpenTelemetry (incubating)
REPORTER BRIDGES: OpenZipkin, Tanzu Observability (out of the box support), Custom (via OZ Brave / OTel bridges / configuration)
Span Store

Slide 57

Slide 57 text

What about Java & Spring?
Jane wants to know if she can collect metrics with Java & Spring

Slide 58

Slide 58 text

Micrometer

Slide 59

Slide 59 text

Micrometer
Vendor-neutral application metrics facade
First release 02.2018
Popular metrics library on the JVM
Like SLF4J, but for metrics - simple API (facade/abstraction)
Supports the most popular metric backends (no vendor lock-in)
Comes with Spring Boot Actuator
Spring projects are instrumented using Micrometer
Lots of third-party libraries use Micrometer to instrument their code
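The "like SLF4J, but for metrics" idea is a facade: application code talks to one small API, and the backend behind it can be swapped. The sketch below shows that pattern in plain Java only. It is not Micrometer's API; `Metrics`, `MetricsBackend` and `InMemoryBackend` are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class MetricsFacadeDemo {
    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        // Swapping the backend does not touch the call sites below.
        Metrics metrics = new Metrics(backend);

        metrics.increment("memes.generated");
        metrics.increment("memes.generated");

        System.out.println(backend.count("memes.generated")); // prints 2
    }
}

// Hypothetical backend contract; each monitoring system would get its own implementation.
interface MetricsBackend {
    void increment(String counterName);
}

// Application code depends only on this facade, never on a concrete backend.
final class Metrics {
    private final MetricsBackend backend;

    Metrics(MetricsBackend backend) {
        this.backend = backend;
    }

    void increment(String counterName) {
        backend.increment(counterName);
    }
}

// Simple backend that keeps counts in memory, which is handy for tests.
final class InMemoryBackend implements MetricsBackend {
    private final Map<String, Long> counts = new HashMap<>();

    @Override
    public void increment(String counterName) {
        counts.merge(counterName, 1L, Long::sum);
    }

    long count(String counterName) {
        return counts.getOrDefault(counterName, 0L);
    }
}
```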

Slide 60

Slide 60 text

Supports the most popular metric backends
AppOptics, Atlas, Azure Monitor, CloudWatch (AWS), Datadog, Dynatrace, Elastic, Ganglia, Graphite, Humio, InfluxDB, JMX, KairosDB, New Relic, OpenTSDB, Prometheus, SignalFx, Stackdriver (GCP), StatsD, Wavefront* (VMware)
(/actuator/metrics)
*VMware Tanzu Observability by Wavefront

Slide 61

Slide 61 text

Jane wonders if there’s a Spring-native way of doing observability

Slide 62

Slide 62 text

Spring Observability

Slide 63

Slide 63 text

Spring Observability
Spring-native approach to Observability
Part of Spring Framework Core
Recorder API
Instrument once - get metrics and tracing
Highly extensible - register additional listeners
Tracing abstraction
Like Spring Cloud Sleuth but without Spring Cloud
Instrumentation happens in dedicated projects

Slide 64

Slide 64 text

Spring Observability - how does it work?
JAVA PROCESS
Spring Web Instrumentation (HTTP)
Spring Observability Recorder API
RECORDING LISTENERS: Tracing Listener, Metrics Listener
Tracing Listener: Tracing API, Tracer Implementation, Span Store
Metrics Listener: Micrometer API, MeterRegistry, Metrics Store
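The "instrument once, get metrics and tracing" flow can be sketched as a recorder that notifies registered listeners. This is a plain-Java illustration of the listener idea only, not the Spring Observability Recorder API; `SimpleRecorder`, `RecordingListener`, `MetricsListener` and `TracingListener` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RecorderDemo {
    public static void main(String[] args) {
        MetricsListener metrics = new MetricsListener();
        TracingListener tracing = new TracingListener();
        SimpleRecorder recorder = new SimpleRecorder(List.of(metrics, tracing));

        // Instrument once: a single start/stop pair feeds both listeners.
        SimpleRecorder.Recording recording = recorder.start("important-calculation");
        recording.stop();

        System.out.println(metrics.callCount("important-calculation")); // prints 1
        System.out.println(tracing.finishedSpans()); // prints [important-calculation]
    }
}

// Hypothetical listener contract, notified when a recording starts and stops.
interface RecordingListener {
    void onStart(String eventName);

    void onStop(String eventName, long durationNanos);
}

// Hypothetical recorder that fans every recording out to all registered listeners.
final class SimpleRecorder {
    private final List<RecordingListener> listeners;

    SimpleRecorder(List<RecordingListener> listeners) {
        this.listeners = List.copyOf(listeners);
    }

    Recording start(String eventName) {
        listeners.forEach(listener -> listener.onStart(eventName));
        return new Recording(eventName, System.nanoTime());
    }

    final class Recording {
        private final String eventName;
        private final long startNanos;

        private Recording(String eventName, long startNanos) {
            this.eventName = eventName;
            this.startNanos = startNanos;
        }

        void stop() {
            long durationNanos = System.nanoTime() - startNanos;
            listeners.forEach(listener -> listener.onStop(eventName, durationNanos));
        }
    }
}

// Turns finished recordings into a call count and a total time per event name.
final class MetricsListener implements RecordingListener {
    private final Map<String, Long> counts = new HashMap<>();
    private final Map<String, Long> totalNanos = new HashMap<>();

    @Override
    public void onStart(String eventName) {
        // Nothing to do until the recording stops.
    }

    @Override
    public void onStop(String eventName, long durationNanos) {
        counts.merge(eventName, 1L, Long::sum);
        totalNanos.merge(eventName, durationNanos, Long::sum);
    }

    long callCount(String eventName) {
        return counts.getOrDefault(eventName, 0L);
    }

    long totalNanos(String eventName) {
        return totalNanos.getOrDefault(eventName, 0L);
    }
}

// Turns finished recordings into span names; a real listener would report full spans.
final class TracingListener implements RecordingListener {
    private final List<String> finished = new ArrayList<>();

    @Override
    public void onStart(String eventName) {
        // A real tracer would open a span here.
    }

    @Override
    public void onStop(String eventName, long durationNanos) {
        finished.add(eventName);
    }

    List<String> finishedSpans() {
        return List.copyOf(finished);
    }
}
```

Registering another listener adds another output without changing the instrumented code.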

Slide 65

Slide 65 text

IntervalRecording recording = recorder
        .recordingFor((IntervalEvent) () -> "important-calculation")
        .tag(Tag.of("calculation-type", "tax", Cardinality.LOW))
        .tag(Tag.of("user-id", userId, Cardinality.HIGH))
        .start();
try {
    calculationService.calculate();
}
catch (Exception exception) {
    recording.error(exception);
    throw exception;
}
finally {
    recording.stop();
}

Slide 69

Slide 69 text

Jane wanted to see Spring Observability with Tanzu Observability & log correlation in action!

Slide 70

Slide 70 text

presentation-service

Slide 71

Slide 71 text

DEMO (Thanks to the Tanzu Observability Team for the help!)

Slide 72

Slide 72 text

Questions/Feedback?
Contact us on Twitter
@mgrzejszczak - toomuchcoding.com
@jonatan_ivanov - develotters.com
© 2020 Spring. A VMware-backed project.