Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Profiling a Java Application @DevDays 2023

Profiling a Java Application @DevDays 2023

The challenge in performance tuning is tracing the root cause of the bottleneck. Optimizing by ‘guessing’ the cause (aka Shotgun Tuning) leads to messy code, wasted effort, and lots of frustration. Black- and Gray-box monitoring tools like Zipkin and Micrometer allow to easily zoom in to the problem, but point out where your time is lost in the actual code. This talk explores the capabilities of the Java Flight Recorder (JFR), a profiler built-in into your JVM that can help you locate issues to a finer level of detail than those detected by any other tools. Based on JFR recordings, we'll trace issues like connection/thread pool starvation, hot methods, repeated queries, TLAB issues, lock contention, and more, in a live-coding session.

Victor Rentea

May 24, 2023
Tweet

More Decks by Victor Rentea

Other Decks in Technology

Transcript

  1. Profiling your Java Applica2on
    [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro
    Code: https://github.com/victorrentea/performance-profiling.git Branch: DevDays_2023_05

    View Slide

  2. 👉 victorrentea.ro/training-offer
    👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS)
    Java Champion, 17 years of code, code, code, code, code....
    Consultant & Trainer: 5000 developers of 100+ EU companies:
    ❤ Clean Code, Architecture, Unit Tes3ng
    🛠 Java Frameworks: Spring, Hibernate, WebFlux
    ⚡ Java Performance, Secure Coding 🔐
    Conference Speaker – many educaGve talks on YouTube
    Founder of European SoMware CraMers (5800+ members)
    🔥 Free monthly webinars, 1 hour [email protected] work 👉 victorrentea.ro/community
    Past events on my channel: youtube.com/vrentea
    Father of 👧👦, host of a 🐈, weekend gardener 🌼 @victorrentea
    h,ps://VictorRentea.ro

    View Slide

  3. 3 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    The response time of your endpoint is too slow 😱
    🎉 Congratulations! 🎉
    You made it to production, and you're taking serious load!
    Where to start?

    View Slide

  4. 4 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Log timestamps
    log.info("Before call"); !// 2023-05-23 00:01:50.000 !!...
    !// suspect code
    log.info("After call"); !// 2023-05-23 00:01:53.100 !!...
    Measuring Method Runtime - Traditional Style
    t1-t0
    long t0 = currentTimeMillis();
    List comments = client.fetchComments(loanId); // suspect code
    long t1 = currentTimeMillis();
    System.out.println("Took millis: " + (t1 - t0));

    View Slide

  5. 5 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Micrometer
    @Timed !// an aspect wraps this method monitoring spent time
    List fetchComments(Long id) { !!==> see code
    !// suspect code
    }
    then,
    GET /actuator/prometheus returns:
    method_timed_seconds_sum{method="fetchComments",} 100.0
    method_timed_seconds_count{method="fetchComments",} 400.0
    è avg = 250 ms/call
    sampled again after 5 seconds:
    method_timed_seconds_sum{method="fetchComments",} 150.0
    method_timed_seconds_count{method="fetchComments",} 500.0
    è avg = 50s / 100 = 500 ms/call (spike)
    time
    average[ms]
    250
    500

    View Slide

  6. 6 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Other ways to report metrics
    https://www.baeldung.com/micrometer
    // programatic alternative to aspects
    meterRegistry.timer("name").record(() -> { /*suspect code*/ });
    meterRegistry.timer("name").record(t1 - t0, MILLISECONDS);
    // accumulator
    meterRegistry.counter("earnings").increment(purchaseDollars);
    // instant value
    meterRegistry.gauge("player-count", currentPlayers.size());
    // eg. "95% took ≤ 100 ms"
    @Timed(percentiles = {0.95}, histogram = true)

    View Slide

  7. 7 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    §Micrometer automatically introspects framework internals
    - Exposing metrics for typical issues
    - You can enable more: Tomcat metrics, decorate executors, ...
    §This data is regularly polled/pushed to other tools
    - eg every 5s, Prometheus retrieves and stores this data
    §Using that data, (eg) Grafana can:
    - Display impressive charts
    - Apply custom aggregation formulas
    - Raise alerts to Ops
    - ...
    Some examples è
    Observing a System with Micrometer

    View Slide

  8. 8 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    § http_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04
    http_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0
    § tomcat_connections_current_connections{...} 298 //<-98 requests waiting
    tomcat_threads_busy_threads{...} 200 // = default max size => worker thread pool starvation?
    § jvm_gc_memory_allocated_bytes_total 1.8874368E9
    jvm_memory_usage_after_gc_percent{...} 45.15 // >30% => increase max heap?
    § hikaricp_connections_acquire_seconds_sum{pool="HikariPool-1",} 80.37 // conn pool starvation?
    hikaricp_connections_acquire_seconds_count{pool="HikariPool-1",} 551.0
    § cache_gets_total{cache="cache1",result="miss",} 0.99 // 99% missè useless cache or config issue
    cache_gets_total{cache="cache1",result="hit",} 0.01
    § logback_events_total{level="error",} 1255.0 // + 1000 than 5 seconds ago => 😱
    § my_own_metric xyz
    Example Micrometer Metrics
    available at /actuator/prometheus

    View Slide

  9. 9 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    ?
    Monitoring a single system is easy!

    View Slide

  10. 10 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Monitoring Microservices is Harder!
    Performance problems are usually Distributed J
    1st challenge: trace requests
    A B C D E

    View Slide

  11. 11 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    §TraceID is read from the incoming HTTP request/MQ message, or generated
    §TraceID is propagated to:
    - Method calls (via ThreadLocal)
    - Upstream operator in a reacDve chain (via ReactorContext in WebFlux)
    - Async methods submiLed to a Spring-managed *TaskExecutor
    §TraceID is sent as a header in any outgoing HTTP request / MQ message
    - ... sent via a Spring-managed bean: @Autowired RestTemplate, RabbitTemplate, ...
    §Every log line prints the TraceID (from MDC on thread) 👏
    Distributed Request Tracing (Sleuth)
    https://www.baeldung.com/spring-cloud-sleuth-single-application
    A B
    TraceId

    View Slide

  12. 12 VictorRentea.ro
    Join the European So;ware Cra;ers community at victorrentea.ro/community
    §The TraceId is available for any processing we do
    §Every system reports the time spent working on a TraceId (= span)
    - (1% of spans are reported, for performance)
    §... to a central system (eg Zipkin) that builds a time chart of all systems
    example è
    Distributed Request Sampling

    View Slide

  13. 13 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Distributed Request Tracing with Zipkin

    View Slide

  14. 14 VictorRentea.ro
    Join the European So;ware Cra;ers community at victorrentea.ro/community
    Distributed Request Tracing with Zipkin

    View Slide

  15. 15 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    §The TraceId is available for any processing we do
    §Every system reports the Gme spent working on a TraceId (= span)
    - (1% of spans are reported, for performance)
    §... to a central system (eg Zipkin) that builds a Gme chart of all systems
    ⚠ Look out for anA-paCerns like:
    §Repeated API calls => avoid repeated calls / fetch in batches
    §Long chains of REST calls => avoid system-in-the-middle
    §Independent calls => parallelize
    §Long spans not waiGng for other systems => zoom in 🔎
    Distributed Request Sampling

    View Slide

  16. 16 VictorRentea.ro
    Join the European So;ware Cra;ers community at victorrentea.ro/community
    Profiling
    Blackbox: inter-system communication
    (Zipkin, ...)
    Graybox: report key framework metrics
    (Micrometer)
    Whitebox: profile methods inside JVM
    (Java Flight Recorderè)

    View Slide

  17. 17 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    How to find methods that take a long Dme?
    💡
    Idea: Instrument your code with a -javaagent
    and measure the execuDon of every executed Java method
    Naive idea: the added overhead makes this approach impracDcal

    View Slide

  18. 18 VictorRentea.ro
    Join the European So;ware Cra;ers community at victorrentea.ro/community
    Java Flight Recorder
    The Tools of the Java Gods

    View Slide

  19. 19 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Java Flight Recorder (JFR)
    Main features:
    §Regularly capture stack traces of all running threads every second
    §Record Internal JVM events about locks, files, sockets, GC, ... or your own
    §Heavily op9mized to extra-low overhead (<2%), usable in prod, for free since Java 11
    Usage:
    §Local JVM => Start/stop from Java Mission Control (JMC) or IntelliJ ⬆
    §Remote JVM => generate a .jfr file => download => analyze with JMC or IntelliJ
    First, enable JFR via -XX:+FlightRecorder, then start the recording:
    1) From startup: -XX:StartFlightRecording=....
    2) Via command line: jcmd JFR.start
    3) Via JMX
    4) Via tools, eg Glowroot

    View Slide

  20. 26 VictorRentea.ro
    Join the European So;ware Cra;ers community at victorrentea.ro/community
    The JFR Profiler Output
    viewed as a Flame Graph

    View Slide

  21. 27 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Thread.sleep(Native Method)
    Flame.f(Flame.java:21)
    Flame.entry(Flame.java:17)
    Flame.dummy(Flame.java:14)
    Thread.sleep(Native Method)
    Flame.g(Flame.java:24)
    Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    2 samples (33%)
    4 samples (66%)
    JFR records every second the stack traces of all running threads

    View Slide

  22. 28 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Thread.sleep(Native Method)
    Flame.g(Flame.java:24)
    Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    Thread.sleep(Native Method)
    Flame.f(Flame.java:21)
    Flame.entry(Flame.java:17)
    Flame.dummy(Flame.java:14)
    2 samples (33%) 4 samples (66%)
    Length of a bar = number of samples,
    propor4onal to the 4me spent on that path
    JFR records every second the stack traces of all running threads
    Flame Graphs - Explained

    View Slide

  23. 29 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Thread.sleep(Native Method) Thread.sleep(Native Method)
    Flame.f(Flame.java:21) Flame.g(Flame.java:24)
    Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    Length of a bar = number of samples,
    propor4onal to the 4me spent on that path
    JFR records every second the stack traces of all running threads
    Data is aggregated by line number (merged)
    Flame Graphs - Explained
    2 samples (33%) 4 samples (66%)

    View Slide

  24. 30 VictorRentea.ro
    Join the European So;ware Cra;ers community at victorrentea.ro/community
    Thread.sleep(Native Method) Thread.sleep(Native Method)
    Flame.f(Flame.java:21) Flame.g(Flame.java:24)
    Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    Length of a bar = number of samples,
    propor4onal to the 4me spent on that path
    JFR records every second the stack traces of all running threads
    Data is aggregated by line number (merged)
    Flame Graphs - Explained
    2 samples (33%) 4 samples (66%)
    IN PRACTICE
    Unsafe.park(Native Method)
    ......50 more lines↕......
    SocketInputStream.socketRead0(Native Method)
    ......40 more lines↕......
    423
    210
    After recording 1 minute of heavy parallel load:
    .... 100 lines of Aspects, Spring, Tomcat ...

    View Slide

  25. 31 VictorRentea.ro
    Join the European So;ware Cra;ers community at victorrentea.ro/community
    ProfiledApp
    Spring Boot 2.7
    http://localhost:8080
    Load Tests
    Java app
    (Gatling.io)
    23 parallel requests
    (closed world)
    PostgreSQL
    in a Local Docker
    localhost:5432
    WireMock
    in a Local Docker
    http://localhost:9999
    remote API responses
    with 40ms delay
    Glowroot.org
    -javaagent:glowroot.jar
    👉
    http://localhost:4000
    Experiment Environment
    ToxiProxy
    installed locally
    localhost:55432
    +3ms network latency
    JFR

    View Slide

  26. 32 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    EXPERIMENT

    View Slide

  27. 33 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Glowroot.org
    §Extremely simple applica- Add VM arg: -javaagent:/path/to/glowroot.jar
    - Open hep://localhost:4000
    §Monitors
    - Methods => Flame Graphs
    - SQLs
    - API calls
    - Typical boelenecks
    - Data per endpoint
    §Free
    - Not sure if produckon-ready, more for learning/debugging

    View Slide

  28. 37 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Split Points
    ß 85% of the query time is spent acquiring a JDBC Connection à
    (aka connection starvation issue)
    Actual waiDng
    for SQL server
    Connection Pool
    Starvation

    View Slide

  29. 38 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    False Split
    due to Just-In-Time Compiler (JIT) Op>miza>on
    Same method is called via 2 paths !!
    Because JIT optimized
    (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor
    for calling a method using reflection
    Fix: Warmup JVM (eg. re-run load test)

    View Slide

  30. 39 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Time-consuming Aspects
    By default, TransacDonInterceptor (@TransacDonal)
    acquires a connecDon before entering the annotated method
    :382
    :388

    View Slide

  31. 40 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Lock ContenSon
    AutomaDc analysis by Java Mission Control
    of the recorded .jfr file:
    hAps://wiki.openjdk.org/display/jmc/Releases
    ß WaiDng to enter a synchronized method à
    (naEve C code is not recorded by JFR)

    View Slide

  32. 41 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    HashSet hashSet = ...; // .size() = N
    hashSet.remove(13); // = O(1) time = constant time
    List list = ...; // .size() = M
    hashSet.removeAll(list); // expected = O(M) x O(hashSet.remove()) = O(M) x O(1)
    I expected < 100 ms
    è if M > N => N * list.contains() = N * M:
    Unknown Library Details
    Actual = 30k x 29k iterations!!
    in Production
    1800 ms!!
    😱 😱 😱 😱

    View Slide

  33. 43 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    JFR recording of
    ❤Virtual Threads❤
    in a Spring Boot 3 app
    running on Java 19/20

    View Slide

  34. 44 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    JFR Profiling – Lessons Learned
    ✅ Can "see" inside unknown code / library
    ❌ Can't see into na>ve C code
    ⚠ 1 sample/second è requires load from tests or from prod
    ✅ Can be used in produc/on (low overhead)
    ⚠ May report wrong line numbers in JIT-op>mized code (CPU+++)
    ⚠ Tuning requires pa>ence, judgment, and library awareness
    ❌ Cannot follow async calls/reac>ve chains (swapping threads)
    è Structured Concurrency[JEP-428 in Java19+] makes forks explicit: hierarchical threads
    ✅ Can record Virtual Threads[JEP-444 in Java 19+]

    View Slide

  35. 46 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Conclusions
    §Zoom in on the issue via distributed tracing (Zipkin)
    §Study default metrics collected by Micrometer & expose more
    §If boWleneck is s>ll unclear, or you cannot use above tools ⬆,
    use the JFR profiler in prod or under load tests
    - Also: analyze the .jfr recording using JMC
    §Op>mize code & configura>on, measuring the benefit vs. risk/pain

    View Slide

  36. Profiling your Java Application
    [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro
    Find the code at: https://github.com/victorrentea/performance-profiling.git
    Join us!
    victorrentea.ro/community

    View Slide