$30 off During Our Annual Pro Sale. View Details »

Profiling a Java Application @DevBcn 2023

Profiling a Java Application @DevBcn 2023

Professional performance optimization always starts with tracing the root cause of the bottleneck, to make sure we fix the correct problem. Trying to optimize the program by ‘guessing’ what can cause the performance issue is known as Shotgun Tuning, and typically results in messy code, wasted effort, and lots of frustration. In this talk, we’ll explore the best 3 free tools for profiling a Java application: Micrometer, Glowroot, and Mission Control and we’ll discover the power of a profiler built-in already into your JVM – the Java Flight Recorder. Using these tools, we'll trace typical issues like connection/thread starvation, long-running methods, and more.

Victor Rentea

July 03, 2023
Tweet

More Decks by Victor Rentea

Other Decks in Technology

Transcript

  1. Profiling your Java Applica2on
    Slides posted à @victorrentea ♦ VictorRentea.ro
    Code: github.com/victorrentea/performance-profiling.git /

    View Slide

  2. 👉 victorrentea.ro/training-offer
    👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS)
    Java Champion, 17 years of code, code, code, code, code....
    Consultant & Trainer for 100+ EU companies:
    ❤ Clean Code, Architecture, Unit Tes3ng
    🛠 Spring, Hibernate, Reac3ve
    ⚡ Java Performance, Secure Coding 🔐
    Conference Speaker - many recorded talks on YouTube
    Founder of European SoJware CraJers (5900+ members)
    🔥 Free monthly webinars, 1 hour a>er work. Join: victorrentea.ro/community
    Past events on youtube.com/vrentea
    Father of 👧👦, servant of a 🐈, weekend gardener 🌼 @victorrentea
    VictorRentea.ro

    View Slide

  3. 3 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    The response time of your endpoint is too slow 😱
    🎉 Congratula6ons! 🎉
    You made it to produc.on, and you're taking serious load!
    Where to start?

    View Slide

  4. 4 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Log timestamps
    log.info("Before call"); // 2023-05-23 00:01:50.000 ...
    // suspect code
    log.info("After call"); // 2023-05-23 00:01:53.100 ...
    Measuring Method Runtime – Old-School Style
    t1-t0
    long t0 = currentTimeMillis();
    List comments = client.fetchComments(loanId); !// suspect code
    long t1 = currentTimeMillis();
    System.out.println("Took millis: " + (t1 - t0));

    View Slide

  5. 5 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    Micrometer
    @Timed // wraps the method with an aspect that monitors spent time
    List fetchComments(Long id) {
    // suspect code
    }
    then, in a Spring application
    GET /actuator/prometheus returns:
    method_timed_seconds_sum{method="fetchComments",} 100.0
    method_timed_seconds_count{method="fetchComments",} 400
    è avg = 0.25 s/call
    sampled again after 5 seconds:
    method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s
    method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls
    è avg = 50 s / 100 = 0.5 s/call (spike)
    !me
    average[ms]
    250
    500

    View Slide

  6. 6 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Micrometer API
    h%ps://www.baeldung.com/micrometer
    // programatic alternative to @Timed
    meterRegistry.timer("name").record(() -> { /*suspect code*/ });
    meterRegistry.timer("name").record(t1 - t0, MILLISECONDS);
    // accumulator
    meterRegistry.counter("earnings").increment(purchase);
    // current value
    meterRegistry.gauge("player-count", currentPlayers.size());
    // eg. "95% requests took ≤ 100 ms"
    @Timed(percentiles = {0.95}, histogram = true)

    View Slide

  7. 7 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    §Micrometer automa,cally introspects framework internals
    - Exposing metrics to help detect typical issues
    - You can enable more: Tomcat metrics, decorate executors, ...
    §This data is regularly polled/pushed to other tools
    - eg. Prometheus retrieves and stores this data every 5 seconds
    §Tools (eg.) Grafana can later use this data to:
    - Display impressive charts
    - Apply custom aggregaJon formulas
    - Raise alerts
    - ...
    Some examples è
    Observing a System with Micrometer

    View Slide

  8. 8 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    § hMp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04
    hMp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0
    § tomcat_connecJons_current_connecJons{...} 298 ⚠ 98 requests waiJng in queue
    tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starvaJon?
    § jvm_gc_memory_allocated_bytes_total 1.8874368E9
    jvm_memory_usage_a>er_gc_percent{...} 45.15 ⚠ >30% => increase max heap?
    § hikaricp_connecJons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starvaJon?
    hikaricp_connecJons_acquire_seconds_count{pool="HikariPool-1",} 551.0
    § cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% miss=cache not used. config issue?
    cache_gets_total{cache="cache1",result="hit",} 0.01
    § logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱
    § my_own_custom_metric xyz
    Some Metrics
    available at /actuator/prometheus

    View Slide

  9. 9 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    ?
    Monitoring a single system is easy!

    View Slide

  10. 10 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Compared to a Distributed Ecosystem
    1st challenge: trace requests
    A B C D E
    Monitoring a single system is easy!

    View Slide

  11. 11 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    §TraceID header is read from the incoming HTTP request/MQ message
    - If missing, it is generated by first system in the call chain
    §TraceID is propagated inside the app:
    - Methods called (via ThreadLocal)
    - Any logs printed (via Logback MDC) – criJcal for distributed log analysis
    - Upstream operators in a Reactor chain (via ReactorContext in WebFlux)
    - Async methods submiMed to a DI-managed *TaskExecutor (via a TaskDecorator)
    §TraceID header is sent in any outgoing request/message
    - ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ...
    Distributed Request Tracing
    h%ps://www.baeldung.com/spring-cloud-sleuth-single-applicaA B
    TraceId

    View Slide

  12. 12 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    §The TraceId is available for any processing we do
    §Every system reports the span of ,me spent working on a TraceId
    - For performance, a fracJon of spans are reported (eg 1%)
    §... to a central system that builds a ,me chart of all systems
    example è
    Distributed Request Sampling

    View Slide

  13. 13 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    Distributed Request Sampling with Zipkin

    View Slide

  14. 14 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    Distributed Request Sampling with Zipkin

    View Slide

  15. 15 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    §Repeated API calls, by accident
    §API calls in a loop, that could be fetched in batches
    - GET /item/1, GET /item/3, ....
    §Long chains of REST calls
    - A à B à C à D à E
    §System-in-the-middle
    §Independent API calls than can be parallelized
    §Long spans not waiBng for other systems => zoom in 🔎
    Distributed Processing An6-PaKerns ⚠

    View Slide

  16. 16 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Profiling
    Blackbox: inter-system communication
    (Zipkin, ...)
    Graybox: report key framework metrics
    (Micrometer)
    Whitebox: profile methods inside JVM
    (Java Flight Recorderè)
    app
    lib-a lib-b
    methodA() methodB()

    View Slide

  17. 17 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    How to find methods that take a long time?
    💡
    Naive idea: Instrument your code with a -javaagent
    and measure the execution of every single executed Java method
    impractical because of the added CPU+MEMORY overhead

    View Slide

  18. 18 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Java Flight Recorder
    The Tools of the Java Gods

    View Slide

  19. 19 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Java Flight Recorder (JFR)
    Main features:
    §Regularly capture stack traces of all running threads every second
    §Record Internal JVM events about locks, files, sockets, GC, ... or your own
    §Heavily op;mized to extra-low overhead (<2%), usable in prod, for free since Java 11
    Usage:
    §Local JVM => Start/stop from Java Mission Control or IntelliJ ⬆
    §Remote JVM => generate a .jfr file => download => analyze
    First, enable JFR via -XX:+FlightRecorder, then start the recording:
    - Con!nuous from startup: -XX:StartFlightRecording=....
    - Via command line: jcmd JFR.start
    - Via JMX
    - Via tools, eg Glowroot

    View Slide

  20. 26 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Flame Graph
    best way to visualize the output of an execution profiler

    View Slide

  21. 27 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Thread.sleep(Native Method)
    Flame.f(Flame.java:21)
    Flame.entry(Flame.java:17)
    Flame.dummy(Flame.java:14)
    Thread.sleep(Native Method)
    Flame.g(Flame.java:24)
    Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    2 samples (33%)
    4 samples (66%)
    JFR samples the stack traces of running threads every second

    View Slide

  22. 28 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Thread.sleep(Native Method)
    Flame.g(Flame.java:24)
    Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    Thread.sleep(Native Method)
    Flame.f(Flame.java:21)
    Flame.entry(Flame.java:17)
    Flame.dummy(Flame.java:14)
    2 samples (33%) 4 samples (66%)
    Length of each bar is propor?onal to the number of samples (?me)
    Building a Flame Graph
    JFR samples the stack traces of running threads every second
    Flame.dummy(Flame.java:14)

    View Slide

  23. 29 VictorRentea.ro
    Join the European So3ware Cra3ers community at victorrentea.ro/community
    Thread.sleep(Native Method) Thread.sleep(Native Method)
    Flame.f(Flame.java:21) Flame.g(Flame.java:24)
    Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    Data is merged by line number
    2 samples (33%) 4 samples (66%)
    Length of each bar is proportional to the number of samples (time)
    Building a Flame Graph
    JFR samples the stack traces of running threads every second
    Flame.dummy(Flame.java:14)

    View Slide

  24. 30 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Thread.sleep(Native Method) Thread.sleep(Native Method)
    Flame.f(Flame.java:21) Flame.g(Flame.java:24)
    Flame.entry(Flame.java:17) Flame.entry(Flame.java:18)
    Flame.dummy(Flame.java:14)
    2 samples (33%) 4 samples (66%)
    IN REAL LIFE
    Unsafe.park(Native Method)
    ......50 more lines↕......
    SocketInputStream.socketRead0(Native Method)
    ......40 more lines↕......
    423
    210
    After recording 1 minute of heavy parallel load:
    .... 100 lines of Aspects, Spring, Tomcat ...
    Data is merged by line number
    Length of each bar is proportional to the number of samples (time)
    Building a Flame Graph
    JFR samples the stack traces of running threads every second
    BLOCK
    NETWORK

    View Slide

  25. 31 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    ProfiledApp
    Spring Boot 2.7
    http://localhost:8080
    Load Tests
    Java app
    (Gatling.io)
    23 parallel requests
    (closed world model)
    PostgreSQL
    in a Local Docker
    localhost:5432
    WireMock
    in a Local Docker
    http://localhost:9999
    remote API responses
    with 40ms delay
    Glowroot.org
    -javaagent:glowroot.jar
    👉
    http://localhost:4000
    Experiment Environment
    ToxiProxy
    github.com/Shopify/toxiproxy
    installed locally
    localhost:55432
    +3ms network latency
    JFR

    View Slide

  26. 32 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    EXPERIMENT

    View Slide

  27. 33 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Glowroot.org
    §Extremely simple application profiler
    - Add VM arg: -javaagent:/path/to/glowroot.jar
    - Open http://localhost:4000
    §Monitors
    - Methods Profiling, displayed as Flame Graphs
    - SQLs
    - API calls
    - Typical bottlenecks
    - Data per endpoint
    §Free
    - Probably not production-ready, more for learning/debugging

    View Slide

  28. 35 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Split Points
    ß 85% of the query time is spent acquiring a JDBC Connection à
    (aka connection starvation issue)
    Actual waiting
    for SQL server
    Connection Pool
    Starvation

    View Slide

  29. 36 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    False Split
    due to Just-In-Time Compiler (JIT) Optimization
    The Same Method is called via 2 paths !!
    Because JIT optimized a reflection call
    (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor
    Fix: Warmup JVM: record in prod, or re-run load test

    View Slide

  30. 37 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Time-consuming Aspects
    By default, TransactionInterceptor (@Transactional)
    acquires a connection before entering the annotated method
    :382
    :388

    View Slide

  31. 38 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Lock Contention
    Automatic analysis by Java Mission Control
    of the recorded .jfr file:
    https://wiki.openjdk.org/display/jmc/Releases
    ß Waiting to enter a synchronized method à
    (native C code is not recorded by JFR)

    View Slide

  32. 39 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    HashSet hashSet = ...; // .size() = N (large)
    hashSet.remove(13); // takes constant time O(1)
    List list = ...; // .size() = M
    hashSet.removeAll(list); // takes M x O(hashSet.remove()) = O(M) x O(1) = O(M)
    expected time < 100 ms
    è when M > N => N * list.contains(e) = N * M:
    Unknown Library Details
    Actual = 30k x 29k iterations!!
    in Production
    1800 ms!!
    😱 😱 😱 😱

    View Slide

  33. 41 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    JFR recording of
    ❤Virtual Threads❤
    in a Spring Boot 3 app
    running on Java 19/20

    View Slide

  34. 42 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    JFR Profiling – Lessons Learned
    ✅ "Sees" inside unknown code / library
    ❌ Can't see into naBve C code
    ⚠ 1 sample/second è requires load tests or ran in prod
    ✅ Can be used in produc/on (x-low overhead)
    ⚠ May report wrong line numbers in JIT-opBmized code (CPU+++)
    ⚠ Tuning requires paBence, judgment, and library awareness
    ❌ Can't follow async calls/reacBve chains (change threads)
    è Structured Concurrency[JEP-428 in Java19+] makes forks explicit è hierarchical threads
    ✅ Can record Virtual Threads[JEP-444 in Java 19+]

    View Slide

  35. 44 VictorRentea.ro
    Join the European Software Crafters community at victorrentea.ro/community
    Conclusions
    §Zoom in on the performance issue via distributed tracing (Zipkin)
    §Study default metrics collected by Micrometer & expose more
    §If bo^leneck is sBll unclear, or you can't use above tools ⬆,
    use the JFR profiler in prod or under load tests
    - Also: Load the .jfr recording in JMC for automa8c analysis
    §OpBmize code & configuraBon, measuring the benefit vs. risk/pain

    View Slide

  36. Profiling your Java Application
    [email protected] ♦ ♦ @victorrentea ♦ VictorRentea.ro
    Find the code at: https://github.com/victorrentea/performance-profiling.git
    Join us!
    victorrentea.ro/community

    View Slide