Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Using JMH in a real world project (JavaLand 2016)

Using JMH in a real world project (JavaLand 2016)

Dmitry Vyazelenko

March 08, 2016
Tweet

More Decks by Dmitry Vyazelenko

Other Decks in Programming

Transcript

  1. 2 About me • Senior Software Engineer @ Sowatec AG,

    Switzerland • Disorganizer at JCrete Unconference, www.jcrete.org • Contacts: vyazelenko.com, @DVyazelenko
  2. 3 Agenda • Benchmarking - what’s in it for me?

    • Intro to JMH • JMH in Arregulo project • Optimization war stories
  3. Why benchmark? 5 • Assess performance of a software system

    or a single component • Compare performance of different algorithms/implementations • Investigate impact of configuration and/or environment changes on a performance of your system
  4. –Dr. Cliff Click “Without exception every microbenchmark I've seen has

    had serious flaws” http://www.azulsystems.com/presentations/art-of-java-benchmarking 6
  5. public class ListGetBenchmarkBroken { private static final int ITERATIONS =

    1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } 7
  6. public class ListGetBenchmarkBroken { private static final int ITERATIONS =

    1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } Executed in 6ms, 0.006736898 ns/op 7
  7. public class ListGetBenchmarkBroken { private static final int ITERATIONS =

    1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } Executed in 6ms, 0.006736898 ns/op 7
  8. 8

  9. 10 JMH • OpenJDK project: http://openjdk.java.net/projects/code-tools/jmh/ • Java harness for

    building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM • De-facto standard for writing benchmarks on JVM • JEP 230: Microbenchmark Suite (Status: Proposed to Drop)
  10. 11 JMH features • Benchmark code generation and runtime: Annotation-based

    benchmarks Parameters and state Support for multi-threaded benchmarks Blackhole, compiler control etc… • Command line and API for running benchmarks • Built-in profilers
  11. @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List<Integer> list;

    private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark
 public void baseline() {
 } @Benchmark public Integer get() { return list.get(index); } } 12
  12. Benchmark Mode Cnt Score Error Units ListGetBenchmarkJMH.baseline avgt 200 0.282

    ± 0.003 ns/op ListGetBenchmarkJMH.get avgt 200 3.985 ± 0.024 ns/op @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List<Integer> list; private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark
 public void baseline() {
 } @Benchmark public Integer get() { return list.get(index); } } 12
  13. 14 Benchmark modes • Mode.Throughput - measures the raw throughput

    (operations per unit of time) • Mode.AverageTime - measures the average execution time (time per operation) • Mode.SampleTime - samples the execution time. Instead of measuring the total time, JMH measure the time spent in *some* of the benchmark method calls (can infer the distributions, percentiles, etc.) • Mode.SingleShotTime - measures the single method invocation time
  14. 15 Profilers • cl - Classloader profiling via standard MBeans

    • comp - JIT compiler profiling via standard MBeans • gc - GC profiling via standard MBeans • stack - Simple and naive Java stack profiler • perf - Linux perf Statistics • perfnorm - Linux perf statistics, normalized by operation count • perfasm - Linux perf + PrintAssembly Profiler • xperfasm - Windows xperf + PrintAssembly Profiler
  15. 17 How we (should) approach performance optimizations • Identify a

    use case for which performance is unsatisfactory • Set a performance goal • Write and validate benchmark(s) for this use case • Figure out what to fix • Apply performance fix and see if results improve • If goal is achieved stop, otherwise keep optimizing if cost is not prohibitive
  16. 18 Our current setup • Grade build with JMH plugin

    to build benchmarks • Dedicated machine for performance runs
  17. Gradle build 19 apply plugin: "com.github.johnrengelman.shadow" apply plugin: "me.champeau.gradle.jmh" jmhJar

    { append("META-INF/spring.handlers") append("META-INF/spring.schemas") } jmh { jmhVersion = "1.11.3" resultFormat = "TEXT" resultsFile = file("${buildDir}/reports/jmh/benchmark_result_${project.version}.txt") humanOutputFile = file("${buildDir}/reports/jmh/output_${project.version}.txt") failOnError = true timeout = "1 day" jvmArgs = "-Xmx8g -XX:+PrintCommandLineFlags -Darregulo.path=${projectDir}" }
  18. 21 Issue 1: default timeout • Default iteration timeout is

    set 10 minutes • Can be set via -to command line switch or timeout(TimveValue) API call • Includes time spent in @Setup/@TearDown
  19. 22 Issue 2: @Setup/@TearDown and threads • JMH does not

    provide any guarantees which thread will invoke @Setup/@TearDown methods • In single-threaded benchmarks it is always the same thread but in multi-threaded benchmarks it is undefined
  20. 23 Issue 3: fail on error is not on by

    default • Fail on error can only be set command line (-foe true) or via API (setFailOnError(true)) • It is not on by default • There is no annotation to set it
  21. 24 JMH wish-list • Unified annotations and command line options

    (i.e. -gc, -foe and others) • Unified @Threads and @Params • More control/determinism over @Setup/@TearDown • JFR profiler
  22. Story 1: O(N) algorithm 26 public Object[] getValues() { Object[]

    values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } protected final void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }
  23. Story 1: O(N) algorithm 26 public Object[] getValues() { Object[]

    values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } protected final void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }
  24. Story 1: O(N) algorithm public Object[] getValues() { TypeDefinition type

    = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } protected final void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27
  25. Story 1: O(N) algorithm public Object[] getValues() { TypeDefinition type

    = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } protected final void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27 Build time down by 40 minutes (25% of total time)
  26. 28 Story 2: Get type info faster public class LongKeyMap

    { private ConcurrentHashMap<Long, Double> chm; private NonBlockingHashMapLong<Double> nbhm; private long key; @Setup public void setUp() { // generate 50K data array and populate maps ... } @Benchmark public Double get_ConcurrentHashMap() { return chm.get(key); } @Benchmark public Double get_NonBlockingHashMapLong() { return nbhm.get(key); } }
  27. Story 2: Get type info faster 29 Benchmark 1.7.0_80 1.8.0_74

    9-ea+108 LongKeyMap.get_ConcurrentHashMap 13.737 ± 0.067 7.464 ± 0.037 8.004 ± 0.050 LongKeyMap.get_NonBlockingHashMapLong 5.939 ± 0.009 5.896 ± 0.007 6.370 ± 0.008
  28. Story 2: Get type info faster 29 Benchmark 1.7.0_80 1.8.0_74

    9-ea+108 LongKeyMap.get_ConcurrentHashMap 13.737 ± 0.067 7.464 ± 0.037 8.004 ± 0.050 LongKeyMap.get_NonBlockingHashMapLong 5.939 ± 0.009 5.896 ± 0.007 6.370 ± 0.008 WARNING: Use water.nbhm.NonBlockingHashMapLong (from https://github.com/h2oai/h2o-3) instead of org.cliffc.high_scale_lib.NonBlockingHashMapLong (from http://sourceforge.net/projects/high-scale-lib/). The latter is not maintained and has bugs!
  29. Story 2: Get type info faster 29 Benchmark 1.7.0_80 1.8.0_74

    9-ea+108 LongKeyMap.get_ConcurrentHashMap 13.737 ± 0.067 7.464 ± 0.037 8.004 ± 0.050 LongKeyMap.get_NonBlockingHashMapLong 5.939 ± 0.009 5.896 ± 0.007 6.370 ± 0.008 WARNING: Use water.nbhm.NonBlockingHashMapLong (from https://github.com/h2oai/h2o-3) instead of org.cliffc.high_scale_lib.NonBlockingHashMapLong (from http://sourceforge.net/projects/high-scale-lib/). The latter is not maintained and has bugs! UPDATE: NBHM and friends will soon move to JCTools project (watch JCTools issue 87 and JCTools issue 88)!
  30. 30 Story 3: Get property by name ignore case •

    TreeMap(String.CASE_INSENSITIVE_ORDER) • HashMap + String.toLowerCase() on a key • Wrapper around HashMap
  31. 31 Story 3: Get property by name ignore case public

    class MapGet { @Benchmark public Integer HashMap_toLowerCase() { return hashMap.get(key.toLowerCase()); } @Benchmark public Integer TreeMap() { return treeMap.get(key); } @Benchmark public Integer IgnoreCaseMap() { Integer val = ignoreCaseMap.get(key); if (val == null) { val = ignoreCaseMap.get(key.toLowerCase()); if (val != null) { ignoreCaseMap.put(key, val); } } return val; } }
  32. Story 3: Get property by name ignore case 32 Benchmark

    1.7.0_80 1.8.0_74 9-ea+108 MapGet.HashMap_toLowerCase 61.571 ± 0.180 58.809 ± 0.118 48.846 ± 0.523 MapGet.TreeMap 90.545 ± 0.465 79.836 ± 0.372 73.915 ± 0.350 MapGet.IgnoreCaseMap 12.587 ± 0.043 12.556 ± 0.027 12.299 ± 0.045
  33. 33 Story 4: Accessing reflective information public class GetFields {

    private Class<?> klass; private ConcurrentMap<Class<?>, Field[]> fieldsCache; @Benchmark public Field[] getDeclaredFields() { return klass.getDeclaredFields(); } @Benchmark public Field[] getDeclaredFields_cached() { Field[] result = fieldsCache.get(klass); if (result == null) { result = klass.getDeclaredFields(); Field[] tmp = fieldsCache.putIfAbsent(klass, result); if (tmp != null) { return tmp; } } return result; } }
  34. 34 Story 4: Accessing reflective information Benchmark 1.7.0_80 1.8.0_74 9-ea+108

    GetFields.getDeclaredFields 272.256 ± 3.424 60.627 ± 0.239 62.320 ± 0.213 GetFields.getDeclaredFields_cached 11.312 ± 0.049 6.990 ± 0.008 7.206 ± 0.039
  35. 35 Story 5: Creating instances public class CreateNewInstance { @Benchmark

    public Object load_class_find_constructor_allocate() throws Exception { Class<?> clazz = Class.forName(className); Constructor<?> ctr = clazz.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object find_constructor_allocate() throws Exception { Constructor<?> ctr = klass.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object conctructor_allocate() throws Exception { return contructor.newInstance(arg1, arg2); } @Benchmark public Object new_allocate() throws Exception { return new C(arg1, arg2); } }
  36. 36 Story 5: Creating instances Benchmark 1.7.0_80 1.8.0_74 9-ea+108 CreateNewInstance.load_class_find_constructor_allocat

    e 1577.633 ± 15.640 783.205 ± 28.265 808.638 ± 10.804 CreateNewInstance.find_constructor_allocate 599.053 ± 6.375 86.418 ± 0.360 104.467 ± 1.213 CreateNewInstance.conctructor_allocate 238.814 ± 5.166 19.328 ± 0.087 20.996 ± 0.118 CreateNewInstance.new_allocate 13.637 ± 0.028 13.131 ± 0.040 12.719 ± 0.036
  37. Story 6: Collection to List 37 public class CollectionToList {

    @Setup public void setUp() { … setup 10K data collection … } @Benchmark public ArrayList<Object> new_ArrayList_from_Collection() { return new ArrayList<>(data); } @Benchmark public ArrayList<Object> new_ArrayList_addAll() { ArrayList<Object> result = new ArrayList<>(data.size()); result.addAll(data); return result; } @Benchmark public ArrayList<Object> new_ArrayList_add() { ArrayList<Object> result = new ArrayList<>(data.size()); for (Object o : data) { result.add(o); } return result; } }
  38. ArrayList Benchmark 1.7.0_80 1.8.0_74 9-ea+108 CollectionToList.new_ArrayList_add 45.803 ± 0.149 55.164

    ± 0.678 78.195 ± 0.548 CollectionToList.new_ArrayList_addAll 8.366 ± 0.021 8.386 ± 0.030 8.287 ± 0.014 CollectionToList.new_ArrayList_from_Collection 4.310 ± 0.006 4.159 ± 0.019 4.143 ± 0.006 Story 6: Collection to List 38
  39. Story 6: Collection to List 39 HashSet Benchmark 1.7.0_80 1.8.0_74

    9-ea+108 CollectionToList.new_ArrayList_add 128.984 ± 0.770 117.425 ± 0.417 197.154 ± 0.989 CollectionToList.new_ArrayList_addAll 108.706 ± 1.976 101.570 ± 1.651 174.148 ± 0.989 CollectionToList.new_ArrayList_from_Collection 116.028 ± 0.461 100.884 ± 0.667 168.948 ± 0.699
  40. Story 6: Collection to List 39 HINT: Check out blog

    post Arrays of Wisdom of the Ancients by Aleksey Shipilëv. HashSet Benchmark 1.7.0_80 1.8.0_74 9-ea+108 CollectionToList.new_ArrayList_add 128.984 ± 0.770 117.425 ± 0.417 197.154 ± 0.989 CollectionToList.new_ArrayList_addAll 108.706 ± 1.976 101.570 ± 1.651 174.148 ± 0.989 CollectionToList.new_ArrayList_from_Collection 116.028 ± 0.461 100.884 ± 0.667 168.948 ± 0.699
  41. Story 7: API design - Batching is king 40 startTransaction();

    try { for(DomainObject object : list) { manager.writeObject(object); } commitTransaction(); } catch(RuntimeException e) { rollbackTransaction(); } void writeObject(DomainObject object)
  42. Story 7: API design - Batching is king 40 startTransaction();

    try { for(DomainObject object : list) { manager.writeObject(object); } commitTransaction(); } catch(RuntimeException e) { rollbackTransaction(); } void writeObject(DomainObject object) void writeObjects(Stream<? extends DomainObject> objects) manager.writeObjects(list.stream());
  43. Story 7: API design - Batching is king 40 startTransaction();

    try { for(DomainObject object : list) { manager.writeObject(object); } commitTransaction(); } catch(RuntimeException e) { rollbackTransaction(); } void writeObject(DomainObject object) void writeObjects(Stream<? extends DomainObject> objects) manager.writeObjects(list.stream()); Brings 5x performance improvement using JDBC batching.
  44. Story 8: N+1 SQL we meet again • Originally master/detail

    objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41
  45. Story 8: N+1 SQL we meet again • Originally master/detail

    objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41 • Now master/detail objects loaded together Single SELECT to load all masters + single SELECT to load all details for all masters
  46. 42 Where we stand • Build time down from 3

    hours to 45 minutes • At least one big performance regression caught • On average 10-15% improvements across all use-cases. Specific use cases get 2-3x performance gains out of the box • Using new API brings even more gains (up to 5-6x times improvements)
  47. 43 References • Slides: https://speakerdeck.com/vyazelenko/using-jmh-in-a-real-world-project-javaland-2016 • Benchmarks: https://github.com/vyazelenko/jmh_talk_JavaLand_2016 • JMH:

    http://openjdk.java.net/projects/code-tools/jmh/ • JMH samples: http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/ org/openjdk/jmh/samples/ • Collection of JMH resources by Nitsan Wakart (@nitsanw) • Aleksey Shipilëv’s (@shipilev) blog: shipilev.net/ • JMH Gradle plugin: https://github.com/melix/jmh-gradle-plugin