Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Profiling a Java Application @DevBcn 2023

Profiling a Java Application @DevBcn 2023

Professional performance optimization always starts with tracing the root cause of the bottleneck, to make sure we fix the correct problem. Trying to optimize the program by ‘guessing’ what can cause the performance issue is known as Shotgun Tuning, and typically results in messy code, wasted effort, and lots of frustration. In this talk, we’ll explore the best 3 free tools for profiling a Java application: Micrometer, Glowroot, and Mission Control and we’ll discover the power of a profiler built-in already into your JVM – the Java Flight Recorder. Using these tools, we'll trace typical issues like connection/thread starvation, long-running methods, and more.

Victor Rentea

July 03, 2023
Tweet

More Decks by Victor Rentea

Other Decks in Technology

Transcript

  1. Profiling your Java Applica2on Slides posted à @victorrentea ♦ VictorRentea.ro

    Code: github.com/victorrentea/performance-profiling.git /
  2. 👉 victorrentea.ro/training-offer 👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS) Java

    Champion, 17 years of code, code, code, code, code.... Consultant & Trainer for 100+ EU companies: ❤ Clean Code, Architecture, Unit Tes3ng 🛠 Spring, Hibernate, Reac3ve ⚡ Java Performance, Secure Coding 🔐 Conference Speaker - many recorded talks on YouTube Founder of European SoJware CraJers (5900+ members) 🔥 Free monthly webinars, 1 hour a>er work. Join: victorrentea.ro/community Past events on youtube.com/vrentea Father of 👧👦, servant of a 🐈, weekend gardener 🌼 @victorrentea VictorRentea.ro
  3. 3 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community

    The response time of your endpoint is too slow 😱 🎉 Congratula6ons! 🎉 You made it to produc.on, and you're taking serious load! Where to start?
  4. 4 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Log timestamps log.info("Before call"); // 2023-05-23 00:01:50.000 ... // suspect code log.info("After call"); // 2023-05-23 00:01:53.100 ... Measuring Method Runtime – Old-School Style t1-t0 long t0 = currentTimeMillis(); List<CommentDto> comments = client.fetchComments(loanId); !// suspect code long t1 = currentTimeMillis(); System.out.println("Took millis: " + (t1 - t0));
  5. 5 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community

    Micrometer @Timed // wraps the method with an aspect that monitors spent time List<CommentDto> fetchComments(Long id) { // suspect code } then, in a Spring application GET /actuator/prometheus returns: method_timed_seconds_sum{method="fetchComments",} 100.0 method_timed_seconds_count{method="fetchComments",} 400 è avg = 0.25 s/call sampled again after 5 seconds: method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls è avg = 50 s / 100 = 0.5 s/call (spike) !me average[ms] 250 500
  6. 6 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Micrometer API h%ps://www.baeldung.com/micrometer // programatic alternative to @Timed meterRegistry.timer("name").record(() -> { /*suspect code*/ }); meterRegistry.timer("name").record(t1 - t0, MILLISECONDS); // accumulator meterRegistry.counter("earnings").increment(purchase); // current value meterRegistry.gauge("player-count", currentPlayers.size()); // eg. "95% requests took ≤ 100 ms" @Timed(percentiles = {0.95}, histogram = true)
  7. 7 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community

    §Micrometer automa,cally introspects framework internals - Exposing metrics to help detect typical issues - You can enable more: Tomcat metrics, decorate executors, ... §This data is regularly polled/pushed to other tools - eg. Prometheus retrieves and stores this data every 5 seconds §Tools (eg.) Grafana can later use this data to: - Display impressive charts - Apply custom aggregaJon formulas - Raise alerts - ... Some examples è Observing a System with Micrometer
  8. 8 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    § hMp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04 hMp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0 § tomcat_connecJons_current_connecJons{...} 298 ⚠ 98 requests waiJng in queue tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starvaJon? § jvm_gc_memory_allocated_bytes_total 1.8874368E9 jvm_memory_usage_a>er_gc_percent{...} 45.15 ⚠ >30% => increase max heap? § hikaricp_connecJons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starvaJon? hikaricp_connecJons_acquire_seconds_count{pool="HikariPool-1",} 551.0 § cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% miss=cache not used. config issue? cache_gets_total{cache="cache1",result="hit",} 0.01 § logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱 § my_own_custom_metric xyz Some Metrics available at /actuator/prometheus
  9. 10 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Compared to a Distributed Ecosystem 1st challenge: trace requests A B C D E Monitoring a single system is easy!
  10. 11 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community

    §TraceID header is read from the incoming HTTP request/MQ message - If missing, it is generated by first system in the call chain §TraceID is propagated inside the app: - Methods called (via ThreadLocal) - Any logs printed (via Logback MDC) – criJcal for distributed log analysis - Upstream operators in a Reactor chain (via ReactorContext in WebFlux) - Async methods submiMed to a DI-managed *TaskExecutor (via a TaskDecorator) §TraceID header is sent in any outgoing request/message - ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ... Distributed Request Tracing h%ps://www.baeldung.com/spring-cloud-sleuth-single-applica<on A B <header> TraceId
  11. 12 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community

    §The TraceId is available for any processing we do §Every system reports the span of ,me spent working on a TraceId - For performance, a fracJon of spans are reported (eg 1%) §... to a central system that builds a ,me chart of all systems example è Distributed Request Sampling
  12. 15 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community

    §Repeated API calls, by accident §API calls in a loop, that could be fetched in batches - GET /item/1, GET /item/3, .... §Long chains of REST calls - A à B à C à D à E §System-in-the-middle §Independent API calls than can be parallelized §Long spans not waiBng for other systems => zoom in 🔎 Distributed Processing An6-PaKerns ⚠
  13. 16 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Profiling Blackbox: inter-system communication (Zipkin, ...) Graybox: report key framework metrics (Micrometer) Whitebox: profile methods inside JVM (Java Flight Recorderè) app lib-a lib-b methodA() methodB()
  14. 17 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    How to find methods that take a long time? 💡 Naive idea: Instrument your code with a -javaagent and measure the execution of every single executed Java method impractical because of the added CPU+MEMORY overhead
  15. 19 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Java Flight Recorder (JFR) Main features: §Regularly capture stack traces of all running threads every second §Record Internal JVM events about locks, files, sockets, GC, ... or your own §Heavily op;mized to extra-low overhead (<2%), usable in prod, for free since Java 11 Usage: §Local JVM => Start/stop from Java Mission Control or IntelliJ ⬆ §Remote JVM => generate a .jfr file => download => analyze First, enable JFR via -XX:+FlightRecorder, then start the recording: - Con!nuous from startup: -XX:StartFlightRecording=.... - Via command line: jcmd <PID> JFR.start - Via JMX - Via tools, eg Glowroot
  16. 26 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Flame Graph best way to visualize the output of an execution profiler
  17. 27 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) JFR samples the stack traces of running threads every second
  18. 28 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) Length of each bar is propor?onal to the number of samples (?me) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
  19. 29 VictorRentea.ro Join the European So3ware Cra3ers community at victorrentea.ro/community

    Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Data is merged by line number 2 samples (33%) 4 samples (66%) Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
  20. 30 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) IN REAL LIFE Unsafe.park(Native Method) ......50 more lines↕...... SocketInputStream.socketRead0(Native Method) ......40 more lines↕...... 423 210 After recording 1 minute of heavy parallel load: .... 100 lines of Aspects, Spring, Tomcat ... Data is merged by line number Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second BLOCK NETWORK
  21. 31 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    ProfiledApp Spring Boot 2.7 http://localhost:8080 Load Tests Java app (Gatling.io) 23 parallel requests (closed world model) PostgreSQL in a Local Docker localhost:5432 WireMock in a Local Docker http://localhost:9999 remote API responses with 40ms delay Glowroot.org -javaagent:glowroot.jar 👉 http://localhost:4000 Experiment Environment ToxiProxy github.com/Shopify/toxiproxy installed locally localhost:55432 +3ms network latency JFR
  22. 33 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Glowroot.org §Extremely simple application profiler - Add VM arg: -javaagent:/path/to/glowroot.jar - Open http://localhost:4000 §Monitors - Methods Profiling, displayed as Flame Graphs - SQLs - API calls - Typical bottlenecks - Data per endpoint §Free - Probably not production-ready, more for learning/debugging
  23. 35 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Split Points ß 85% of the query time is spent acquiring a JDBC Connection à (aka connection starvation issue) Actual waiting for SQL server Connection Pool Starvation
  24. 36 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    False Split due to Just-In-Time Compiler (JIT) Optimization The Same Method is called via 2 paths !! Because JIT optimized a reflection call (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor Fix: Warmup JVM: record in prod, or re-run load test
  25. 37 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Time-consuming Aspects By default, TransactionInterceptor (@Transactional) acquires a connection before entering the annotated method :382 :388
  26. 38 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Lock Contention Automatic analysis by Java Mission Control of the recorded .jfr file: https://wiki.openjdk.org/display/jmc/Releases ß Waiting to enter a synchronized method à (native C code is not recorded by JFR)
  27. 39 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    HashSet<Long> hashSet = ...; // .size() = N (large) hashSet.remove(13); // takes constant time O(1) List<Long> list = ...; // .size() = M hashSet.removeAll(list); // takes M x O(hashSet.remove()) = O(M) x O(1) = O(M) expected time < 100 ms è when M > N => N * list.contains(e) = N * M: Unknown Library Details Actual = 30k x 29k iterations!! in Production 1800 ms!! 😱 😱 😱 😱
  28. 41 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    JFR recording of ❤Virtual Threads❤ in a Spring Boot 3 app running on Java 19/20
  29. 42 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    JFR Profiling – Lessons Learned ✅ "Sees" inside unknown code / library ❌ Can't see into naBve C code ⚠ 1 sample/second è requires load tests or ran in prod ✅ Can be used in produc/on (x-low overhead) ⚠ May report wrong line numbers in JIT-opBmized code (CPU+++) ⚠ Tuning requires paBence, judgment, and library awareness ❌ Can't follow async calls/reacBve chains (change threads) è Structured Concurrency[JEP-428 in Java19+] makes forks explicit è hierarchical threads ✅ Can record Virtual Threads[JEP-444 in Java 19+]
  30. 44 VictorRentea.ro Join the European Software Crafters community at victorrentea.ro/community

    Conclusions §Zoom in on the performance issue via distributed tracing (Zipkin) §Study default metrics collected by Micrometer & expose more §If bo^leneck is sBll unclear, or you can't use above tools ⬆, use the JFR profiler in prod or under load tests - Also: Load the .jfr recording in JMC for automa8c analysis §OpBmize code & configuraBon, measuring the benefit vs. risk/pain
  31. Profiling your Java Application [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro

    Find the code at: https://github.com/victorrentea/performance-profiling.git Join us! victorrentea.ro/community