Slide 1

Slide 1 text

Profiling your Java Applica2on [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro Code: https://github.com/victorrentea/performance-profiling.git Branch: DevDays_2023_05

Slide 2

Slide 2 text

👉 victorrentea.ro/training-offer 👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS) Java Champion, 17 years of code, code, code, code, code.... Consultant & Trainer: 5000 developers of 100+ EU companies: ❤ Clean Code, Architecture, Unit Tes3ng 🛠 Java Frameworks: Spring, Hibernate, WebFlux ⚡ Java Performance, Secure Coding 🔐 Conference Speaker – many educaGve talks on YouTube Founder of European SoMware CraMers (5800+ members) 🔥 Free monthly webinars, 1 hour a@er work 👉 victorrentea.ro/community Past events on my channel: youtube.com/vrentea Father of 👧👦, host of a 🐈, weekend gardener 🌼 @victorrentea h,ps://VictorRentea.ro

Slide 3

Slide 3 text

3 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community The response time of your endpoint is too slow 😱 🎉 Congratulations! 🎉 You made it to production, and you're taking serious load! Where to start?

Slide 4

Slide 4 text

4 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Log timestamps log.info("Before call"); !// 2023-05-23 00:01:50.000 !!... !// suspect code log.info("After call"); !// 2023-05-23 00:01:53.100 !!... Measuring Method Runtime - Traditional Style t1-t0 long t0 = currentTimeMillis(); List comments = client.fetchComments(loanId); // suspect code long t1 = currentTimeMillis(); System.out.println("Took millis: " + (t1 - t0));

Slide 5

Slide 5 text

5 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Micrometer @Timed !// an aspect wraps this method monitoring spent time List fetchComments(Long id) { !!==> see code !// suspect code } then, GET /actuator/prometheus returns: method_timed_seconds_sum{method="fetchComments",} 100.0 method_timed_seconds_count{method="fetchComments",} 400.0 è avg = 250 ms/call sampled again after 5 seconds: method_timed_seconds_sum{method="fetchComments",} 150.0 method_timed_seconds_count{method="fetchComments",} 500.0 è avg = 50s / 100 = 500 ms/call (spike) time average[ms] 250 500

Slide 6

Slide 6 text

6 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Other ways to report metrics https://www.baeldung.com/micrometer // programatic alternative to aspects meterRegistry.timer("name").record(() -> { /*suspect code*/ }); meterRegistry.timer("name").record(t1 - t0, MILLISECONDS); // accumulator meterRegistry.counter("earnings").increment(purchaseDollars); // instant value meterRegistry.gauge("player-count", currentPlayers.size()); // eg. "95% took ≤ 100 ms" @Timed(percentiles = {0.95}, histogram = true)

Slide 7

Slide 7 text

7 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community §Micrometer automatically introspects framework internals - Exposing metrics for typical issues - You can enable more: Tomcat metrics, decorate executors, ... §This data is regularly polled/pushed to other tools - eg every 5s, Prometheus retrieves and stores this data §Using that data, (eg) Grafana can: - Display impressive charts - Apply custom aggregation formulas - Raise alerts to Ops - ... Some examples è Observing a System with Micrometer

Slide 8

Slide 8 text

8 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community § http_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04 http_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0 § tomcat_connections_current_connections{...} 298 //<-98 requests waiting tomcat_threads_busy_threads{...} 200 // = default max size => worker thread pool starvation? § jvm_gc_memory_allocated_bytes_total 1.8874368E9 jvm_memory_usage_after_gc_percent{...} 45.15 // >30% => increase max heap? § hikaricp_connections_acquire_seconds_sum{pool="HikariPool-1",} 80.37 // conn pool starvation? hikaricp_connections_acquire_seconds_count{pool="HikariPool-1",} 551.0 § cache_gets_total{cache="cache1",result="miss",} 0.99 // 99% missè useless cache or config issue cache_gets_total{cache="cache1",result="hit",} 0.01 § logback_events_total{level="error",} 1255.0 // + 1000 than 5 seconds ago => 😱 § my_own_metric xyz Example Micrometer Metrics available at /actuator/prometheus

Slide 9

Slide 9 text

9 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community ? Monitoring a single system is easy!

Slide 10

Slide 10 text

10 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Monitoring Microservices is Harder! Performance problems are usually Distributed J 1st challenge: trace requests A B C D E

Slide 11

Slide 11 text

11 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community §TraceID is read from the incoming HTTP request/MQ message, or generated §TraceID is propagated to: - Method calls (via ThreadLocal) - Upstream operator in a reacDve chain (via ReactorContext in WebFlux) - Async methods submiLed to a Spring-managed *TaskExecutor §TraceID is sent as a header in any outgoing HTTP request / MQ message - ... sent via a Spring-managed bean: @Autowired RestTemplate, RabbitTemplate, ... §Every log line prints the TraceID (from MDC on thread) 👏 Distributed Request Tracing (Sleuth) https://www.baeldung.com/spring-cloud-sleuth-single-application A B TraceId

Slide 12

Slide 12 text

12 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community §The TraceId is available for any processing we do §Every system reports the time spent working on a TraceId (= span) - (1% of spans are reported, for performance) §... to a central system (eg Zipkin) that builds a time chart of all systems example è Distributed Request Sampling

Slide 13

Slide 13 text

13 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Distributed Request Tracing with Zipkin

Slide 14

Slide 14 text

14 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community Distributed Request Tracing with Zipkin

Slide 15

Slide 15 text

15 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community §The TraceId is available for any processing we do §Every system reports the Gme spent working on a TraceId (= span) - (1% of spans are reported, for performance) §... to a central system (eg Zipkin) that builds a Gme chart of all systems ⚠ Look out for anA-paCerns like: §Repeated API calls => avoid repeated calls / fetch in batches §Long chains of REST calls => avoid system-in-the-middle §Independent calls => parallelize §Long spans not waiGng for other systems => zoom in 🔎 Distributed Request Sampling

Slide 16

Slide 16 text

16 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community Profiling Blackbox: inter-system communication (Zipkin, ...) Graybox: report key framework metrics (Micrometer) Whitebox: profile methods inside JVM (Java Flight Recorderè)

Slide 17

Slide 17 text

17 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community How to find methods that take a long Dme? 💡 Idea: Instrument your code with a -javaagent and measure the execuDon of every executed Java method Naive idea: the added overhead makes this approach impracDcal

Slide 18

Slide 18 text

18 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community Java Flight Recorder The Tools of the Java Gods

Slide 19

Slide 19 text

19 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Java Flight Recorder (JFR) Main features: §Regularly capture stack traces of all running threads every second §Record Internal JVM events about locks, files, sockets, GC, ... or your own §Heavily op9mized to extra-low overhead (<2%), usable in prod, for free since Java 11 Usage: §Local JVM => Start/stop from Java Mission Control (JMC) or IntelliJ ⬆ §Remote JVM => generate a .jfr file => download => analyze with JMC or IntelliJ First, enable JFR via -XX:+FlightRecorder, then start the recording: 1) From startup: -XX:StartFlightRecording=.... 2) Via command line: jcmd JFR.start 3) Via JMX 4) Via tools, eg Glowroot

Slide 20

Slide 20 text

26 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community The JFR Profiler Output viewed as a Flame Graph

Slide 21

Slide 21 text

27 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) JFR records every second the stack traces of all running threads

Slide 22

Slide 22 text

28 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) Length of a bar = number of samples, propor4onal to the 4me spent on that path JFR records every second the stack traces of all running threads Flame Graphs - Explained

Slide 23

Slide 23 text

29 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Length of a bar = number of samples, propor4onal to the 4me spent on that path JFR records every second the stack traces of all running threads Data is aggregated by line number (merged) Flame Graphs - Explained 2 samples (33%) 4 samples (66%)

Slide 24

Slide 24 text

30 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Length of a bar = number of samples, propor4onal to the 4me spent on that path JFR records every second the stack traces of all running threads Data is aggregated by line number (merged) Flame Graphs - Explained 2 samples (33%) 4 samples (66%) IN PRACTICE Unsafe.park(Native Method) ......50 more lines↕...... SocketInputStream.socketRead0(Native Method) ......40 more lines↕...... 423 210 After recording 1 minute of heavy parallel load: .... 100 lines of Aspects, Spring, Tomcat ...

Slide 25

Slide 25 text

31 VictorRentea.ro Join the European So;ware Cra;ers community at victorrentea.ro/community ProfiledApp Spring Boot 2.7 http://localhost:8080 Load Tests Java app (Gatling.io) 23 parallel requests (closed world) PostgreSQL in a Local Docker localhost:5432 WireMock in a Local Docker http://localhost:9999 remote API responses with 40ms delay Glowroot.org -javaagent:glowroot.jar 👉 http://localhost:4000 Experiment Environment ToxiProxy installed locally localhost:55432 +3ms network latency JFR

Slide 26

Slide 26 text

32 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community EXPERIMENT

Slide 27

Slide 27 text

33 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Glowroot.org §Extremely simple applica Flame Graphs - SQLs - API calls - Typical boelenecks - Data per endpoint §Free - Not sure if produckon-ready, more for learning/debugging

Slide 28

Slide 28 text

37 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Split Points ß 85% of the query time is spent acquiring a JDBC Connection à (aka connection starvation issue) Actual waiDng for SQL server Connection Pool Starvation

Slide 29

Slide 29 text

38 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community False Split due to Just-In-Time Compiler (JIT) Op>miza>on Same method is called via 2 paths !! Because JIT optimized (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor for calling a method using reflection Fix: Warmup JVM (eg. re-run load test)

Slide 30

Slide 30 text

39 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Time-consuming Aspects By default, TransacDonInterceptor (@TransacDonal) acquires a connecDon before entering the annotated method :382 :388

Slide 31

Slide 31 text

40 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Lock ContenSon AutomaDc analysis by Java Mission Control of the recorded .jfr file: hAps://wiki.openjdk.org/display/jmc/Releases ß WaiDng to enter a synchronized method à (naEve C code is not recorded by JFR)

Slide 32

Slide 32 text

41 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community HashSet hashSet = ...; // .size() = N hashSet.remove(13); // = O(1) time = constant time List list = ...; // .size() = M hashSet.removeAll(list); // expected = O(M) x O(hashSet.remove()) = O(M) x O(1) I expected < 100 ms è if M > N => N * list.contains() = N * M: Unknown Library Details Actual = 30k x 29k iterations!! in Production 1800 ms!! 😱 😱 😱 😱

Slide 33

Slide 33 text

43 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community JFR recording of ❤Virtual Threads❤ in a Spring Boot 3 app running on Java 19/20

Slide 34

Slide 34 text

44 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community JFR Profiling – Lessons Learned ✅ Can "see" inside unknown code / library ❌ Can't see into na>ve C code ⚠ 1 sample/second è requires load from tests or from prod ✅ Can be used in produc/on (low overhead) ⚠ May report wrong line numbers in JIT-op>mized code (CPU+++) ⚠ Tuning requires pa>ence, judgment, and library awareness ❌ Cannot follow async calls/reac>ve chains (swapping threads) è Structured Concurrency[JEP-428 in Java19+] makes forks explicit: hierarchical threads ✅ Can record Virtual Threads[JEP-444 in Java 19+]

Slide 35

Slide 35 text

46 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Conclusions §Zoom in on the issue via distributed tracing (Zipkin) §Study default metrics collected by Micrometer & expose more §If boWleneck is s>ll unclear, or you cannot use above tools ⬆, use the JFR profiler in prod or under load tests - Also: analyze the .jfr recording using JMC §Op>mize code & configura>on, measuring the benefit vs. risk/pain

Slide 36

Slide 36 text

Profiling your Java Application [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro Find the code at: https://github.com/victorrentea/performance-profiling.git Join us! victorrentea.ro/community