Slide 1

Slide 1 text

Profiling your Java Applica2on Slides posted à @victorrentea ♦ VictorRentea.ro Code: github.com/victorrentea/performance-profiling.git /

Slide 2

Slide 2 text

👉 victorrentea.ro/training-offer 👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS) Java Champion, 17 years of code, code, code, code, code.... Consultant & Trainer for 100+ EU companies: ❤ Clean Code, Architecture, Unit Tes3ng 🛠 Spring, Hibernate, Reac3ve ⚡ Java Performance, Secure Coding 🔐 Conference Speaker - many recorded talks on YouTube Founder of European SoJware CraJers (5900+ members) 🔥 Free monthly webinars, 1 hour a>er work. Join: victorrentea.ro/community Past events on youtube.com/vrentea Father of 👧👦, servant of a 🐈, weekend gardener 🌼 @victorrentea VictorRentea.ro

Slide 3

Slide 3 text

3 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community The response time of your endpoint is too slow 😱 🎉 Congratula6ons! 🎉 You made it to produc.on, and you're taking serious load! Where to start?

Slide 4

Slide 4 text

4 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Log timestamps log.info("Before call"); // 2023-05-23 00:01:50.000 ... // suspect code log.info("After call"); // 2023-05-23 00:01:53.100 ... Measuring Method Runtime – Old-School Style t1-t0 long t0 = currentTimeMillis(); List comments = client.fetchComments(loanId); !// suspect code long t1 = currentTimeMillis(); System.out.println("Took millis: " + (t1 - t0));

Slide 5

Slide 5 text

5 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community Micrometer @Timed // wraps the method with an aspect that monitors spent time List fetchComments(Long id) { // suspect code } then, in a Spring application GET /actuator/prometheus returns: method_timed_seconds_sum{method="fetchComments",} 100.0 method_timed_seconds_count{method="fetchComments",} 400 è avg = 0.25 s/call sampled again after 5 seconds: method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls è avg = 50 s / 100 = 0.5 s/call (spike) !me average[ms] 250 500

Slide 6

Slide 6 text

6 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Micrometer API h%ps://www.baeldung.com/micrometer // programatic alternative to @Timed meterRegistry.timer("name").record(() -> { /*suspect code*/ }); meterRegistry.timer("name").record(t1 - t0, MILLISECONDS); // accumulator meterRegistry.counter("earnings").increment(purchase); // current value meterRegistry.gauge("player-count", currentPlayers.size()); // eg. "95% requests took ≤ 100 ms" @Timed(percentiles = {0.95}, histogram = true)

Slide 7

Slide 7 text

7 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community §Micrometer automa,cally introspects framework internals - Exposing metrics to help detect typical issues - You can enable more: Tomcat metrics, decorate executors, ... §This data is regularly polled/pushed to other tools - eg. Prometheus retrieves and stores this data every 5 seconds §Tools (eg.) Grafana can later use this data to: - Display impressive charts - Apply custom aggregaJon formulas - Raise alerts - ... Some examples è Observing a System with Micrometer

Slide 8

Slide 8 text

8 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community § hMp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04 hMp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0 § tomcat_connecJons_current_connecJons{...} 298 ⚠ 98 requests waiJng in queue tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starvaJon? § jvm_gc_memory_allocated_bytes_total 1.8874368E9 jvm_memory_usage_a>er_gc_percent{...} 45.15 ⚠ >30% => increase max heap? § hikaricp_connecJons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starvaJon? hikaricp_connecJons_acquire_seconds_count{pool="HikariPool-1",} 551.0 § cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% miss=cache not used. config issue? cache_gets_total{cache="cache1",result="hit",} 0.01 § logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱 § my_own_custom_metric xyz Some Metrics available at /actuator/prometheus

Slide 9

Slide 9 text

9 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community ? Monitoring a single system is easy!

Slide 10

Slide 10 text

10 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Compared to a Distributed Ecosystem 1st challenge: trace requests A B C D E Monitoring a single system is easy!

Slide 11

Slide 11 text

11 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community §TraceID header is read from the incoming HTTP request/MQ message - If missing, it is generated by first system in the call chain §TraceID is propagated inside the app: - Methods called (via ThreadLocal) - Any logs printed (via Logback MDC) – criJcal for distributed log analysis - Upstream operators in a Reactor chain (via ReactorContext in WebFlux) - Async methods submiMed to a DI-managed *TaskExecutor (via a TaskDecorator) §TraceID header is sent in any outgoing request/message - ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ... Distributed Request Tracing h%ps://www.baeldung.com/spring-cloud-sleuth-single-applica TraceId

Slide 12

Slide 12 text

12 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community §The TraceId is available for any processing we do §Every system reports the span of ,me spent working on a TraceId - For performance, a fracJon of spans are reported (eg 1%) §... to a central system that builds a ,me chart of all systems example è Distributed Request Sampling

Slide 13

Slide 13 text

13 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin

Slide 14

Slide 14 text

14 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin

Slide 15

Slide 15 text

15 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community §Repeated API calls, by accident §API calls in a loop, that could be fetched in batches - GET /item/1, GET /item/3, .... §Long chains of REST calls - A à B à C à D à E §System-in-the-middle §Independent API calls than can be parallelized §Long spans not waiBng for other systems => zoom in 🔎 Distributed Processing An6-PaKerns ⚠

Slide 16

Slide 16 text

16 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Profiling Blackbox: inter-system communication (Zipkin, ...) Graybox: report key framework metrics (Micrometer) Whitebox: profile methods inside JVM (Java Flight Recorderè) app lib-a lib-b methodA() methodB()

Slide 17

Slide 17 text

17 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community How to find methods that take a long time? 💡 Naive idea: Instrument your code with a -javaagent and measure the execution of every single executed Java method impractical because of the added CPU+MEMORY overhead

Slide 18

Slide 18 text

18 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Java Flight Recorder The Tools of the Java Gods

Slide 19

Slide 19 text

19 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Java Flight Recorder (JFR) Main features: §Regularly capture stack traces of all running threads every second §Record Internal JVM events about locks, files, sockets, GC, ... or your own §Heavily op;mized to extra-low overhead (<2%), usable in prod, for free since Java 11 Usage: §Local JVM => Start/stop from Java Mission Control or IntelliJ ⬆ §Remote JVM => generate a .jfr file => download => analyze First, enable JFR via -XX:+FlightRecorder, then start the recording: - Con!nuous from startup: -XX:StartFlightRecording=.... - Via command line: jcmd JFR.start - Via JMX - Via tools, eg Glowroot

Slide 20

Slide 20 text

26 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Flame Graph best way to visualize the output of an execution profiler

Slide 21

Slide 21 text

27 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) JFR samples the stack traces of running threads every second

Slide 22

Slide 22 text

28 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) Length of each bar is propor?onal to the number of samples (?me) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)

Slide 23

Slide 23 text

29 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Data is merged by line number 2 samples (33%) 4 samples (66%) Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)

Slide 24

Slide 24 text

30 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) IN REAL LIFE Unsafe.park(Native Method) ......50 more lines↕...... SocketInputStream.socketRead0(Native Method) ......40 more lines↕...... 423 210 After recording 1 minute of heavy parallel load: .... 100 lines of Aspects, Spring, Tomcat ... Data is merged by line number Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second BLOCK NETWORK

Slide 25

Slide 25 text

31 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community ProfiledApp Spring Boot 2.7 http://localhost:8080 Load Tests Java app (Gatling.io) 23 parallel requests (closed world model) PostgreSQL in a Local Docker localhost:5432 WireMock in a Local Docker http://localhost:9999 remote API responses with 40ms delay Glowroot.org -javaagent:glowroot.jar 👉 http://localhost:4000 Experiment Environment ToxiProxy github.com/Shopify/toxiproxy installed locally localhost:55432 +3ms network latency JFR

Slide 26

Slide 26 text

32 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community EXPERIMENT

Slide 27

Slide 27 text

33 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Glowroot.org §Extremely simple application profiler - Add VM arg: -javaagent:/path/to/glowroot.jar - Open http://localhost:4000 §Monitors - Methods Profiling, displayed as Flame Graphs - SQLs - API calls - Typical bottlenecks - Data per endpoint §Free - Probably not production-ready, more for learning/debugging

Slide 28

Slide 28 text

35 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Split Points ß 85% of the query time is spent acquiring a JDBC Connection à (aka connection starvation issue) Actual waiting for SQL server Connection Pool Starvation

Slide 29

Slide 29 text

36 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community False Split due to Just-In-Time Compiler (JIT) Optimization The Same Method is called via 2 paths !! Because JIT optimized a reflection call (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor Fix: Warmup JVM: record in prod, or re-run load test

Slide 30

Slide 30 text

37 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Time-consuming Aspects By default, TransactionInterceptor (@Transactional) acquires a connection before entering the annotated method :382 :388

Slide 31

Slide 31 text

38 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Lock Contention Automatic analysis by Java Mission Control of the recorded .jfr file: https://wiki.openjdk.org/display/jmc/Releases ß Waiting to enter a synchronized method à (native C code is not recorded by JFR)

Slide 32

Slide 32 text

39 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community HashSet hashSet = ...; // .size() = N (large) hashSet.remove(13); // takes constant time O(1) List list = ...; // .size() = M hashSet.removeAll(list); // takes M x O(hashSet.remove()) = O(M) x O(1) = O(M) expected time < 100 ms è when M > N => N * list.contains(e) = N * M: Unknown Library Details Actual = 30k x 29k iterations!! in Production 1800 ms!! 😱 😱 😱 😱

Slide 33

Slide 33 text

41 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community JFR recording of ❤Virtual Threads❤ in a Spring Boot 3 app running on Java 19/20

Slide 34

Slide 34 text

42 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community JFR Profiling – Lessons Learned ✅ "Sees" inside unknown code / library ❌ Can't see into naBve C code ⚠ 1 sample/second è requires load tests or ran in prod ✅ Can be used in produc/on (x-low overhead) ⚠ May report wrong line numbers in JIT-opBmized code (CPU+++) ⚠ Tuning requires paBence, judgment, and library awareness ❌ Can't follow async calls/reacBve chains (change threads) è Structured Concurrency[JEP-428 in Java19+] makes forks explicit è hierarchical threads ✅ Can record Virtual Threads[JEP-444 in Java 19+]

Slide 35

Slide 35 text

44 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community Conclusions §Zoom in on the performance issue via distributed tracing (Zipkin) §Study default metrics collected by Micrometer & expose more §If bo^leneck is sBll unclear, or you can't use above tools ⬆, use the JFR profiler in prod or under load tests - Also: Load the .jfr recording in JMC for automa8c analysis §OpBmize code & configuraBon, measuring the benefit vs. risk/pain

Slide 36

Slide 36 text

Profiling your Java Application victor.rentea@gmail.com ♦ ♦ @victorrentea ♦ VictorRentea.ro Find the code at: https://github.com/victorrentea/performance-profiling.git Join us! victorrentea.ro/community