Upgrade to Pro — share decks privately, control downloads, hide ads and more …

When benchmarks go bad - what I learned from me...

When benchmarks go bad - what I learned from measuring performance wrong

The world of performance analysis is littered with flawed claims, cognitive biases, dangerous intuitions, and beguiling
fallacies. Sadly, Holly has been guilty of all of the above! Repeatedly. But this is a no-judgement zone. Some
measurement anti-patterns are subtle, and some are downright counter-intuitive. In this talk, Holly will explain why
measuring performance is important, and talk through some of the ways it can go wrong. That would be depressing if that
was all there was, so she’ll also introduce a toolbox of questions and principles that you can use to improve the
performance of your own applications.

These include:

- How to set up a test system
- Recommended load generators
- The USE method

Avatar for Holly Cummins

Holly Cummins

May 05, 2026

More Decks by Holly Cummins

Other Decks in Programming

Transcript

  1. holly cummins Devoxx UK when benchmarks go bad what I

    learned from measuring performance wrong
  2. quarkus can handle 1.8x more load we thought we were

    isolating the app to 4 cores, but the app, load generator, and database were sharing all 16
  3. quarkus can handle 1.8x more load we thought we were

    isolating the app to 4 cores, but the app, load generator, and database were sharing all 16 - Hibernate second-level cache was also enabled between these runs - Note the November 2025 dates on these charts - These runs had some other issues we discovered later; look at the latest data for our most-correct comparison quarkus can handle 3x more load
  4. # disable turbo boost echo 1 >/sys/devices/system/cpu/intel_pstate/no_turbo # drop pagecache,

    dentries and inodes sync sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches’ taskset --cpu-list 0-3 java -Xms512m -Xmx512m -XX: +UseNUMA -XX:+UnlockDiagnosticVMOptions -XX: +DebugNonSafepoints -jar <whatever>
  5. # disable turbo boost echo 1 >/sys/devices/system/cpu/intel_pstate/no_turbo # drop pagecache,

    dentries and inodes sync sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches’ taskset --cpu-list 0-3 java -Xms512m -Xmx512m -XX: +UseNUMA -XX:+UnlockDiagnosticVMOptions -XX: +DebugNonSafepoints -jar <whatever>
  6. ca

  7. if your system under starts slowing down, the injector slows

    down too, which makes your measures look better than they should
  8. Coordinate omission makes what you think you are measuring response

    time when it’s really measures the service time component of latency. Service time is how long it takes to do the work. Response time is the amount of time spent waiting before the work starts.
  9. ts=$(_date) while ! (curl -sf http://localhost:8080/fruits > /dev/ null) do

    # Spin here for max precision : done TTFR=$((($(_date) - ts)/1000000))
  10. ts=$(_date) while ! (curl -sf http://localhost:8080/fruits > /dev/ null) do

    # Spin here for max precision : done TTFR=$((($(_date) - ts)/1000000))
  11. # Start the client loop before the application so it's

    already polling ( while true; do # Try to open a TCP connection to the target host and port and use file descriptor 3 for it if exec 3<>/dev/tcp/"$HOST"/"$PORT"; then # Send HTTP GET request to the server if ! echo -e "GET $URL_PATH HTTP/1.0\r\nHost: $HOST\r\nConnection: close\r\n\r\n" >&3; then exec 3>&- continue fi # Read the HTTP response status line and extract the status code if ! read -r _ status_code _ <&3; then exec 3>&- continue fi # Close the file descriptor exec 3>&- # If we got a 200 OK response, exit the loop if [[ "$status_code" == "200" ]]; then break fi fi # Spin here and do nothing rather than waiting some arbitrary unlucky timing done # Record the timestamp when we successfully got a 200 response _date > "$END_TS_FILE" ) 2>/dev/null & CURL_PID=$! # Record start time and launch the application. # Redirect and exec inside the subshell so the application process directly replaces the subshell, making $APP_PID its actual PID. ts=$(_date) ( exec $RUN_CMD &>"$LOG_FILE" ) & APP_PID=$! # Ensure cleanup on exit (e.g. on timeout) trap "kill -15 $APP_PID $CURL_PID 2>/dev/null; wait $APP_PID 2>/dev/null || true; rm -f $END_TS_FILE" EXIT # Wait for the client loop to get a successful response wait $CURL_PID 2>/dev/null || true TTFR=$(($(cat "$END_TS_FILE") - ts))
  12. there is no “true” RSS: you can trade off RSS

    against CPU by shrinking max heap
  13. ts=$(_date) while ! (curl -sf http://localhost:8080/fruits > /dev/null) do #

    Spin here for max precision : done TTFR=$((($(_date) - ts)/1000000)) RSS=`ps -o rss= -p $CURRENT_PID | sed 's/^ *//g'`
  14. ts=$(_date) while ! (curl -sf http://localhost:8080/fruits > /dev/null) do #

    Spin here for max precision : done TTFR=$((($(_date) - ts)/1000000)) RSS=`ps -o rss= -p $CURRENT_PID | sed 's/^ *//g'`
  15. ts=$(_date) while ! (curl -sf http://localhost:8080/fruits > /dev/null) do #

    Spin here for max precision : done TTFR=$((($(_date) - ts)/1000000)) RSS=`ps -o rss= -p $CURRENT_PID | sed 's/^ *//g'` it’s ok to measure time-to- fi rst-request and RSS in the same experiment
  16. ts=$(_date) while ! (curl -sf http://localhost:8080/fruits > /dev/null) do #

    Spin here for max precision : done TTFR=$((($(_date) - ts)/1000000)) RSS=`ps -o rss= -p $CURRENT_PID | sed 's/^ *//g'`
  17. you are not measuring what you think you are especially

    if you’re measuring more than one thing
  18. validate with data about real world consequences feedback is a

    friend guides decisions validates measurements
  19. spring quarkus app with code that deadlocked – 1.75 req/s

    initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s https://quarkus.io/blog/reactive-crud-performance-case-study/
  20. spring quarkus app with code that deadlocked – 1.75 req/s

    initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, https://quarkus.io/blog/reactive-crud-performance-case-study/ 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, zipcode0_.timezone as timezone5_0_0_, zipcode0_.type as type6_0_0_ from ZipCode zipcode0_ where zipcode0_.zip=?]: could not load an entity: [com.baeldung.quarkus_project.ZipCode#08231]: java.util.concurrent.CompletionException: io.vertx.core.impl.NoStackTraceThrowable Timeout at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ... at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: io.vertx.core.impl.NoStackTraceThrowable: Timeout 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, zipcode0_.timezone as timezone5_0_0_, zipcode0_.type as type6_0_0_ from ZipCode zipcode0_ where zipcode0_.zip=?]: could not load an entity: [com.baeldung.quarkus_project.ZipCode#08231]: java.util.concurrent.CompletionException: io.vertx.core.impl.NoStackTraceThrowable Timeout at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ... at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: io.vertx.core.impl.NoStackTraceThrowable: Timeout
  21. spring quarkus app with code that deadlocked – 1.75 req/s

    initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, https://quarkus.io/blog/reactive-crud-performance-case-study/ 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, zipcode0_.timezone as timezone5_0_0_, zipcode0_.type as type6_0_0_ from ZipCode zipcode0_ where zipcode0_.zip=?]: could not load an entity: [com.baeldung.quarkus_project.ZipCode#08231]: java.util.concurrent.CompletionException: io.vertx.core.impl.NoStackTraceThrowable Timeout at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ... at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: io.vertx.core.impl.NoStackTraceThrowable: Timeout 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, zipcode0_.timezone as timezone5_0_0_, zipcode0_.type as type6_0_0_ from ZipCode zipcode0_ where zipcode0_.zip=?]: could not load an entity: [com.baeldung.quarkus_project.ZipCode#08231]: java.util.concurrent.CompletionException: io.vertx.core.impl.NoStackTraceThrowable Timeout at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ... at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: io.vertx.core.impl.NoStackTraceThrowable: Timeout caused by misuse of @Transactional
  22. spring quarkus app with code that deadlocked – 1.75 req/s

    initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, https://quarkus.io/blog/reactive-crud-performance-case-study/ 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, zipcode0_.timezone as timezone5_0_0_, zipcode0_.type as type6_0_0_ from ZipCode zipcode0_ where zipcode0_.zip=?]: could not load an entity: [com.baeldung.quarkus_project.ZipCode#08231]: java.util.concurrent.CompletionException: io.vertx.core.impl.NoStackTraceThrowable Timeout at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ... at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: io.vertx.core.impl.NoStackTraceThrowable: Timeout 2022-06-17 15:20:44,507 ERROR [org.hib.rea.errors] (vert.x-eventloop-thread-45) HR000057: Failed to execute statement [select zipcode0_.zip as zip1_0_0_, zipcode0_.city as city2_0_0_, zipcode0_.county as county3_0_0_, zipcode0_.state as state4_0_0_, zipcode0_.timezone as timezone5_0_0_, zipcode0_.type as type6_0_0_ from ZipCode zipcode0_ where zipcode0_.zip=?]: could not load an entity: [com.baeldung.quarkus_project.ZipCode#08231]: java.util.concurrent.CompletionException: io.vertx.core.impl.NoStackTraceThrowable Timeout at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332) at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347) at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636) at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162) at io.vertx.core.Future.lambda$toCompletionStage$2(Future.java:362) ... at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:503) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Thread.java:833) Caused by: io.vertx.core.impl.NoStackTraceThrowable: Timeout caused by misuse of @Transactional and, yes, it’s a bad error message and we fi xed that :)
  23. https://quarkus.io/blog/reactive-crud-performance-case-study/ spring quarkus app with code that deadlocked – 1.75

    req/s initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s
  24. https://quarkus.io/blog/reactive-crud-performance-case-study/ spring quarkus app with code that deadlocked – 1.75

    req/s initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s
  25. https://quarkus.io/blog/reactive-crud-performance-case-study/ spring quarkus app with code that deadlocked – 1.75

    req/s initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s
  26. https://quarkus.io/blog/reactive-crud-performance-case-study/ spring quarkus app with code that deadlocked – 1.75

    req/s initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s
  27. https://quarkus.io/blog/reactive-crud-performance-case-study/ spring quarkus app with code that deadlocked – 1.75

    req/s initial test with correct code 338 req/s 390 req/s with faster disk [not measured] 25,599 req/s
  28. - can we get the same answer repeatedly? - are

    results deterministic? - is this close to real-world?
  29. - can we get the same answer repeatedly? - are

    results deterministic? - is this close to real-world? - is this representative of the way applications will be run?
  30. - can we get the same answer repeatedly? - are

    results deterministic? - are we reporting useful metrics? - is this close to real-world? - is this representative of the way applications will be run?
  31. - can we get the same answer repeatedly? - are

    results deterministic? - are we reporting useful metrics? - does this help us make a decision? - is this close to real-world? - is this representative of the way applications will be run?
  32. - can we get the same answer repeatedly? - are

    results deterministic? - are we reporting useful metrics? - does this help us make a decision? - is it answering a question we actually care about? - is this close to real-world? - is this representative of the way applications will be run?
  33. - evaluating before-and- after of performance tweaks - comparing frameworks

    against each other - capacity planning - cost estimation - carbon footprint estimation
  34. - evaluating before-and- after of performance tweaks - comparing frameworks

    against each other - answering questions - making decisions - capacity planning - cost estimation - carbon footprint estimation
  35. - measuring what you want to measure is hard -

    aim for reproducibility the distillate
  36. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware the distillate
  37. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost the distillate
  38. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost - aim for realism the distillate
  39. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost - aim for realism - prod-like hardware, app, and data the distillate
  40. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost - aim for realism - prod-like hardware, app, and data - aim for relevance the distillate
  41. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost - aim for realism - prod-like hardware, app, and data - aim for relevance - similar measurements for different things usually mean you’re just measuring a sneaky bottleneck the distillate
  42. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost - aim for realism - prod-like hardware, app, and data - aim for relevance - similar measurements for different things usually mean you’re just measuring a sneaky bottleneck - vary one thing at a time the distillate
  43. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost - aim for realism - prod-like hardware, app, and data - aim for relevance - similar measurements for different things usually mean you’re just measuring a sneaky bottleneck - vary one thing at a time - validate measurements by active benchmarking the distillate
  44. - measuring what you want to measure is hard -

    aim for reproducibility - isolate application on its own hardware - disable turbo-boost - aim for realism - prod-like hardware, app, and data - aim for relevance - similar measurements for different things usually mean you’re just measuring a sneaky bottleneck - vary one thing at a time - validate measurements by active benchmarking - measure a metric with business relevance the distillate