Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Using JMH in a real world project

Using JMH in a real world project

Many of you have already heard about JMH and probably even played with it to benchmark some code. But how would you use it in a real project? Which problems can it help to solve?

In this talk I would like to share the experience of using JMH library to drive performance optimizations and track performance regressions. You’ll learn how we use JMH, which issues and pitfalls were encountered and overcome along the way.

And since a measurement is not an end in itself, I will show you what can be a result of using the benchmark-driven approach to performance optimizations. The use-cases will demonstrate issues and solutions that helped bring the performance of our project to its next level.

Dmitry Vyazelenko

October 16, 2015
Tweet

More Decks by Dmitry Vyazelenko

Other Decks in Programming

Transcript

  1. 2 About me • Senior Software Engineer @ Sowatec AG,

    Switzerland • Disorganizer at JCrete Unconference, www.jcrete.org • Contacts: vyazelenko.com, @DVyazelenko
  2. 3 Agenda • Benchmarking - what’s in it for me?

    • Intro to JMH • JMH in Arregulo project • Optimization war stories
  3. Why benchmark? 5 • Assess performance of a software system

    or a single component • Compare performance of different algorithms/implementations • Investigate impact of configuration and/or environment changes on a performance of your system
  4. –Dr. Cliff Click “Without exception every microbenchmark I've seen has

    had serious flaws” http://www.azulsystems.com/presentations/art-of-java-benchmarking 6
  5. public class ListGetBenchmarkBroken { private static final int ITERATIONS =

    1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } 7
  6. public class ListGetBenchmarkBroken { private static final int ITERATIONS =

    1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } Executed in 2ms, 0.002808 ns/op 7
  7. public class ListGetBenchmarkBroken { private static final int ITERATIONS =

    1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } Executed in 2ms, 0.002808 ns/op 7
  8. 8

  9. 10 JMH • OpenJDK project: http://openjdk.java.net/projects/ code-tools/jmh/ • Java harness

    for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM • De-facto standard for writing benchmarks on JVM
  10. 11 JMH features • Benchmark code generation and runtime: Annotation-based

    benchmarks Parameters and state Support for multi-threaded benchmarks Blackhole, compiler control etc… • Command line and API for running benchmarks • Built-in profilers
  11. @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List<Integer> list;

    private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark public Integer get() { return list.get(index); } } 12
  12. Benchmark Score Units ListGetBenchmarkJMH.get 4.244 ns/op @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public

    class ListGetBenchmarkJMH { private List<Integer> list; private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark public Integer get() { return list.get(index); } } 12
  13. 13 Benchmark modes • Mode.Throughput measures the raw throughput (operations

    per unit of time) • Mode.AverageTime measures the average execution time (time per operation) • Mode.SampleTime samples the execution time. Instead of measuring the total time, JMH measure the time spent in *some* of the benchmark method calls (can infer the distributions, percentiles, etc.) • Mode.SingleShotTime measures the single method invocation time
  14. 14 Profilers • cl - Classloader profiling via standard MBeans

    • comp - JIT compiler profiling via standard MBeans • gc - GC profiling via standard MBeans • stack - Simple and naive Java stack profiler • perf - Linux perf Statistics • perfnorm - Linux perf statistics, normalized by operation count • perfasm - Linux perf + PrintAssembly Profiler • xperfasm - Windows xperf + PrintAssembly Profiler
  15. 16 How we approach performance optimization • Identify a use

    case for which performance is unsatisfactory • Set a performance goal • Write and validate benchmark(s) for this use case • Figure out what to fix • Apply performance fix and see if results improve • If goal is achieved stop, otherwise keep optimizing if cost is not prohibitive
  16. 17 Our setup • Grade build with JMH plugin to

    build benchmarks • Custom runner to execute benchmarks • Dedicated machine for performance runs
  17. JMH Gradle plugin 18 plugins { id 'java' id 'com.github.johnrengelman.shadow'

    version '1.2.2' id 'me.champeau.gradle.jmh' version '0.2.0' } dependencies { jmh 'ai.h2o:h2o-core:3.2.0.5' } jmhJar { append('META-INF/spring.handlers') append('META-INF/spring.schemas') } jmh { jmhVersion = '1.11.1' }
  18. 19 Custom runner public class BenchmarksRunner { public static void

    main(String[] args) throws Exception { String appPath = ... Options options = new OptionsBuilder().parent(new CommandLineOptions(args)) .shouldFailOnError(true).timeout(TimeValue.days(1)) .jvmArgs("-D" + APP_PATH_PROPERTY + "=" + appPath, "-Xmx8g") .resultFormat(ResultFormatType.TEXT) .result(appPath + "/benchmark_result_" + getVersion() + ".txt") .build(); new Runner(options).run(); } }
  19. 20 Results processing • Currently we store them as plain

    text in Confluence • Need to do better. Perhaps use Jenkins JMH plugin (https://github.com/brianfromoregon/jmh-plugin)
  20. 21 Issue 1: default timeout • Default iteration timeout is

    set 10 minutes • Can be set via -to command line switch or timeout(TimeValue) API call • Includes time spent in @Setup/@TearDown
  21. 22 Issue 2: @Setup/@TearDown and threads • JMH does not

    provide any guarantees which threads will invoke @Setup/@TearDown • In single-threaded benchmarks it is the same thread but in multi-threaded benchmarks it is undefined
  22. 23 Issue 3: fail on error is not on by

    default • Fail on error can only be set command line (-foe true) or via API (setFailOnError(true)) • It is not on by default • There is no annotation to set it
  23. 24 JMH wish-list • Unified annotations and command line options

    (i.e. -gc, -foe and others) • Unified @Threads and @Params • More control/determinism over @Setup/ @TearDown • JFR profiler
  24. Story 1: Small things matter 26 public Object[] getValues() {

    Object[] values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } private void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }
  25. Story 1: Small things matter 26 public Object[] getValues() {

    Object[] values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } private void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }
  26. Story 1: Small things matter public Object[] getValues() { TypeDefinition

    type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } private void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27
  27. Story 1: Small things matter public Object[] getValues() { TypeDefinition

    type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } private void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27
  28. Story 1: Small things matter public Object[] getValues() { TypeDefinition

    type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } private void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27 Build time down by 40 minutes (25% of total time)
  29. 28 Story 2: Get type info faster public class LongKeyMap

    { private ConcurrentHashMap<Long, Double> chm; private NonBlockingHashMapLong<Double> nbhm; private long key; @Setup public void setUp() { // generate 50K data array and populate maps ... } @Benchmark public Double get_ConcurrentHashMap() { return chm.get(key); } @Benchmark public Double get_NonBlockingHashMapLong() { return nbhm.get(key); } }
  30. Story 2: Get type info faster 29 JDK 1.7.0_80 ns/op

    LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657
  31. Story 2: Get type info faster JDK 1.8.0_60 ns/op LongKeyMap.get_ConcurrentHashMap

    8.343 LongKeyMap.get_NonBlockingHashMapLong 6.508 29 JDK 1.7.0_80 ns/op LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657
  32. Story 2: Get type info faster JDK 1.8.0_60 ns/op LongKeyMap.get_ConcurrentHashMap

    8.343 LongKeyMap.get_NonBlockingHashMapLong 6.508 29 JDK 1.7.0_80 ns/op LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657 WARNING: Use water.nbhm.NonBlockingHashMapLong (from https://github.com/h2oai/ h2o-3) instead of org.cliffc.high_scale_lib.NonBlockingHashMapLong (from http://sourceforge.net/projects/high-scale-lib/). The latter is not maintained and has bugs!
  33. 30 Story 3: Get property by name ignore case •

    TreeMap(String.CASE_INSENSITIVE_ORDER) • HashMap + String.toLowerCase() on a key • Wrapper around HashMap
  34. 31 Story 3: Get property by name ignore case public

    class MapGet { @Benchmark public Integer HashMap_toLowerCase() { return hashMap.get(key.toLowerCase()); } @Benchmark public Integer TreeMap() { return treeMap.get(key); } @Benchmark public Integer IgnoreCaseMap() { Integer val = ignoreCaseMap.get(key); if (val == null) { val = ignoreCaseMap.get(key.toLowerCase()); if (val != null) { ignoreCaseMap.put(key, val); } } return val; } }
  35. Story 3: Get property by name ignore case 32 JDK

    1.7.0_80 ns/op MapGet.TreeMap GoodKey 95.142 MapGet.TreeMap BAD_KEY 105.230 MapGet.HashMap_toLowerCase GoodKey 67.986 MapGet.HashMap_toLowerCase BAD_KEY 69.942 MapGet.IgnoreCaseMap GoodKey 13.083 MapGet.IgnoreCaseMap BAD_KEY 7.583 JDK 1.8.0_60 ns/op MapGet.TreeMap GoodKey 84.018 MapGet.TreeMap BAD_KEY 94.577 MapGet.HashMap_toLowerCase GoodKey 66.122 MapGet.HashMap_toLowerCase BAD_KEY 69.527 MapGet.IgnoreCaseMap GoodKey 13.809 MapGet.IgnoreCaseMap BAD_KEY 7.802
  36. 33 Story 4: Accessing reflective information public class GetFields {

    private Class<?> klass; private ConcurrentMap<Class<?>, Field[]> fieldsCache; @Benchmark public Field[] getDeclaredFields() { return klass.getDeclaredFields(); } @Benchmark public Field[] getDeclaredFields_cached() { Field[] result = fieldsCache.get(klass); if (result == null) { result = klass.getDeclaredFields(); Field[] tmp = fieldsCache.putIfAbsent(klass, result); if (tmp != null) { return tmp; } } return result; } }
  37. 34 Story 4: Accessing reflective information JDK 1.7.0_80 ns/op GetFields.getDeclaredFields

    284.233 GetFields.getDeclaredFields_cached 11.249 JDK 1.8.0_60 ns/op GetFields.getDeclaredFields 67.078 GetFields.getDeclaredFields_cached 7.143
  38. 35 Story 5: Creating instances public class CreateNewInstance { @Benchmark

    public Object load_class_find_constructor_allocate() throws Exception { Class<?> clazz = Class.forName(className); Constructor<?> ctr = clazz.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object find_constructor_allocate() throws Exception { Constructor<?> ctr = klass.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object conctructor_allocate() throws Exception { return contructor.newInstance(arg1, arg2); } @Benchmark public Object new_allocate() throws Exception { return new C(arg1, arg2); } }
  39. 36 Story 5: Creating instances JDK 1.7.0_80 ns/op CreateNewInstance.new_allocate 14.453

    CreateNewInstance.conctructor_allocate 238.031 CreateNewInstance.find_constructor_allocate 583.546 CreateNewInstance.load_class_find_constructor_allocate 1498.030
  40. 36 Story 5: Creating instances JDK 1.7.0_80 ns/op CreateNewInstance.new_allocate 14.453

    CreateNewInstance.conctructor_allocate 238.031 CreateNewInstance.find_constructor_allocate 583.546 CreateNewInstance.load_class_find_constructor_allocate 1498.030 JDK 1.8.0_60 ns/op CreateNewInstance.new_allocate 15.462 CreateNewInstance.conctructor_allocate 21.077 CreateNewInstance.find_constructor_allocate 95.389 CreateNewInstance.load_class_find_constructor_allocate 821.017
  41. 37 Story 6: Pitfall of “I know better” public class

    CollectionToList { @Benchmark public ArrayList<Object> new_ArrayList_from_Collection() { return new ArrayList<>(data); } @Benchmark public ArrayList<Object> new_ArrayList_addAll() { ArrayList<Object> result = new ArrayList<>(data.size()); result.addAll(data); return result; } @Benchmark public ArrayList<Object> new_ArrayList_add() { ArrayList<Object> result = new ArrayList<>(data.size()); for (Object o : data) { result.add(o); } return result; } }
  42. 38 Story 6: Pitfall of “I know better” JDK 1.8.0_60

    CollectionToList.new_ArrayList_from_Collection ArrayList 4.242 CollectionToList.new_ArrayList_addAll ArrayList 8.437 CollectionToList.new_ArrayList_add ArrayList 59.440 CollectionToList.new_ArrayList_from_Collection HashSet 120.230 CollectionToList.new_ArrayList_addAll HashSet 106.024 CollectionToList.new_ArrayList_add HashSet 128.250
  43. Story 7: Batching is king 39 void writeObject(DomainObject object) Users

    are forced to implement bulk operations themselves. Don’t have access to low-level features.
  44. Story 7: Batching is king 39 void writeObject(DomainObject object) Users

    are forced to implement bulk operations themselves. Don’t have access to low-level features. void writeObjects(List<DomainObject> objects) Framework implementation of bulk operation. Brings 5x performance improvement using JDBC batching.
  45. • Pivot grid compute in memory • by Product and

    Account: 24.5 seconds • by Product: 31.5 seconds Story 8: Know your use case 40
  46. • Pivot grid compute in DB • by Product and

    Account: 0.5 seconds • by Product and Account: 4 seconds • Pivot grid compute in memory • by Product and Account: 24.5 seconds • by Product: 31.5 seconds Story 8: Know your use case 40
  47. • Pivot grid compute in DB • by Product and

    Account: 0.5 seconds • by Product and Account: 4 seconds • Pivot grid compute in memory • by Product and Account: 24.5 seconds • by Product: 31.5 seconds Story 8: Know your use case 40
  48. Story 9: N+1 SQL we meet again • Originally master/detail

    objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41
  49. Story 9: N+1 SQL we meet again • Originally master/detail

    objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41 • Now master/detail objects loaded together Single SELECT to load all masters + single SELECT to load all details for all masters
  50. 42 Where we stand • Build time down from 3

    hours to 45 minutes • At least one big performance regression caught • On average 10-15% improvements across all use- cases. Specific use cases get 2-3x performance gains out of the box • Using new API brings even more gains (up to 5-6x times improvements)
  51. 43 References • Slides: https://speakerdeck.com/vyazelenko/using-jmh-in-a-real- world-project • Benchmarks: https://github.com/vyazelenko/jmh_talk_joker2015 •

    JMH: http://openjdk.java.net/projects/code-tools/jmh/ • JMH samples: http://hg.openjdk.java.net/code-tools/jmh/file/tip/ jmh-samples/src/main/java/org/openjdk/jmh/samples/ • JMH Gradle plugin: https://github.com/melix/jmh-gradle-plugin • JMH Jenkins plugin: https://github.com/brianfromoregon/jmh- plugin