Slide 1

Slide 1 text

1 Using JMH in a real world project by Dmitry Vyazelenko @ JavaLand 2016

Slide 2

Slide 2 text

2 About me • Senior Software Engineer @ Sowatec AG, Switzerland • Disorganizer at JCrete Unconference, www.jcrete.org • Contacts: vyazelenko.com, @DVyazelenko

Slide 3

Slide 3 text

3 Agenda • Benchmarking - what’s in it for me? • Intro to JMH • JMH in Arregulo project • Optimization war stories

Slide 4

Slide 4 text

Benchmarking - what’s in it for me? 4

Slide 5

Slide 5 text

Why benchmark? 5 • Assess performance of a software system or a single component • Compare performance of different algorithms/implementations • Investigate impact of configuration and/or environment changes on a performance of your system

Slide 6

Slide 6 text

–Dr. Cliff Click “Without exception every microbenchmark I've seen has had serious flaws” http://www.azulsystems.com/presentations/art-of-java-benchmarking 6

Slide 7

Slide 7 text

public class ListGetBenchmarkBroken { private static final int ITERATIONS = 1_000_000_000; public static void main(String[] args) { List list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } 7

Slide 8

Slide 8 text

public class ListGetBenchmarkBroken { private static final int ITERATIONS = 1_000_000_000; public static void main(String[] args) { List list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } Executed in 6ms, 0.006736898 ns/op 7

Slide 9

Slide 9 text

public class ListGetBenchmarkBroken { private static final int ITERATIONS = 1_000_000_000; public static void main(String[] args) { List list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } Executed in 6ms, 0.006736898 ns/op 7

Slide 10

Slide 10 text

8

Slide 11

Slide 11 text

Intro to JMH 9

Slide 12

Slide 12 text

10 JMH • OpenJDK project: http://openjdk.java.net/projects/code-tools/jmh/ • Java harness for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM • De-facto standard for writing benchmarks on JVM • JEP 230: Microbenchmark Suite (Status: Proposed to Drop)

Slide 13

Slide 13 text

11 JMH features • Benchmark code generation and runtime: Annotation-based benchmarks Parameters and state Support for multi-threaded benchmarks Blackhole, compiler control etc… • Command line and API for running benchmarks • Built-in profilers

Slide 14

Slide 14 text

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List list; private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark
 public void baseline() {
 } @Benchmark public Integer get() { return list.get(index); } } 12

Slide 15

Slide 15 text

Benchmark Mode Cnt Score Error Units ListGetBenchmarkJMH.baseline avgt 200 0.282 ± 0.003 ns/op ListGetBenchmarkJMH.get avgt 200 3.985 ± 0.024 ns/op @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List list; private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark
 public void baseline() {
 } @Benchmark public Integer get() { return list.get(index); } } 12

Slide 16

Slide 16 text

13 https://jaxenter.de/aus-der-java-trickkiste-microbenchmarking-24155

Slide 17

Slide 17 text

13 https://jaxenter.de/aus-der-java-trickkiste-microbenchmarking-24155 Prone to Dead-Code Elimination (DCE)

Slide 18

Slide 18 text

14 Benchmark modes • Mode.Throughput - measures the raw throughput (operations per unit of time) • Mode.AverageTime - measures the average execution time (time per operation) • Mode.SampleTime - samples the execution time. Instead of measuring the total time, JMH measure the time spent in *some* of the benchmark method calls (can infer the distributions, percentiles, etc.) • Mode.SingleShotTime - measures the single method invocation time

Slide 19

Slide 19 text

15 Profilers • cl - Classloader profiling via standard MBeans • comp - JIT compiler profiling via standard MBeans • gc - GC profiling via standard MBeans • stack - Simple and naive Java stack profiler • perf - Linux perf Statistics • perfnorm - Linux perf statistics, normalized by operation count • perfasm - Linux perf + PrintAssembly Profiler • xperfasm - Windows xperf + PrintAssembly Profiler

Slide 20

Slide 20 text

16 JMH in Arregulo

Slide 21

Slide 21 text

17 How we (should) approach performance optimizations • Identify a use case for which performance is unsatisfactory • Set a performance goal • Write and validate benchmark(s) for this use case • Figure out what to fix • Apply performance fix and see if results improve • If goal is achieved stop, otherwise keep optimizing if cost is not prohibitive

Slide 22

Slide 22 text

18 Our current setup • Grade build with JMH plugin to build benchmarks • Dedicated machine for performance runs

Slide 23

Slide 23 text

Gradle build 19 apply plugin: "com.github.johnrengelman.shadow" apply plugin: "me.champeau.gradle.jmh" jmhJar { append("META-INF/spring.handlers") append("META-INF/spring.schemas") } jmh { jmhVersion = "1.11.3" resultFormat = "TEXT" resultsFile = file("${buildDir}/reports/jmh/benchmark_result_${project.version}.txt") humanOutputFile = file("${buildDir}/reports/jmh/output_${project.version}.txt") failOnError = true timeout = "1 day" jvmArgs = "-Xmx8g -XX:+PrintCommandLineFlags -Darregulo.path=${projectDir}" }

Slide 24

Slide 24 text

20 Results processing Currently we store them as plain text in Confluence

Slide 25

Slide 25 text

20 Results processing Currently we store them as plain text in Confluence

Slide 26

Slide 26 text

21 Issue 1: default timeout • Default iteration timeout is set 10 minutes • Can be set via -to command line switch or timeout(TimveValue) API call • Includes time spent in @Setup/@TearDown

Slide 27

Slide 27 text

22 Issue 2: @Setup/@TearDown and threads • JMH does not provide any guarantees which thread will invoke @Setup/@TearDown methods • In single-threaded benchmarks it is always the same thread but in multi-threaded benchmarks it is undefined

Slide 28

Slide 28 text

23 Issue 3: fail on error is not on by default • Fail on error can only be set command line (-foe true) or via API (setFailOnError(true)) • It is not on by default • There is no annotation to set it

Slide 29

Slide 29 text

24 JMH wish-list • Unified annotations and command line options (i.e. -gc, -foe and others) • Unified @Threads and @Params • More control/determinism over @Setup/@TearDown • JFR profiler

Slide 30

Slide 30 text

25 Optimization war stories

Slide 31

Slide 31 text

Story 1: O(N) algorithm 26 public Object[] getValues() { Object[] values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } protected final void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }

Slide 32

Slide 32 text

Story 1: O(N) algorithm 26 public Object[] getValues() { Object[] values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } protected final void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }

Slide 33

Slide 33 text

Story 1: O(N) algorithm public Object[] getValues() { TypeDefinition type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } protected final void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27

Slide 34

Slide 34 text

Story 1: O(N) algorithm public Object[] getValues() { TypeDefinition type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } protected final void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27 Build time down by 40 minutes (25% of total time)

Slide 35

Slide 35 text

28 Story 2: Get type info faster

Slide 36

Slide 36 text

28 Story 2: Get type info faster public class LongKeyMap { private ConcurrentHashMap chm; private NonBlockingHashMapLong nbhm; private long key; @Setup public void setUp() { // generate 50K data array and populate maps ... } @Benchmark public Double get_ConcurrentHashMap() { return chm.get(key); } @Benchmark public Double get_NonBlockingHashMapLong() { return nbhm.get(key); } }

Slide 37

Slide 37 text

Story 2: Get type info faster 29 Benchmark 1.7.0_80 1.8.0_74 9-ea+108 LongKeyMap.get_ConcurrentHashMap 13.737 ± 0.067 7.464 ± 0.037 8.004 ± 0.050 LongKeyMap.get_NonBlockingHashMapLong 5.939 ± 0.009 5.896 ± 0.007 6.370 ± 0.008

Slide 38

Slide 38 text

Story 2: Get type info faster 29 Benchmark 1.7.0_80 1.8.0_74 9-ea+108 LongKeyMap.get_ConcurrentHashMap 13.737 ± 0.067 7.464 ± 0.037 8.004 ± 0.050 LongKeyMap.get_NonBlockingHashMapLong 5.939 ± 0.009 5.896 ± 0.007 6.370 ± 0.008 WARNING: Use water.nbhm.NonBlockingHashMapLong (from https://github.com/h2oai/h2o-3) instead of org.cliffc.high_scale_lib.NonBlockingHashMapLong (from http://sourceforge.net/projects/high-scale-lib/). The latter is not maintained and has bugs!

Slide 39

Slide 39 text

Story 2: Get type info faster 29 Benchmark 1.7.0_80 1.8.0_74 9-ea+108 LongKeyMap.get_ConcurrentHashMap 13.737 ± 0.067 7.464 ± 0.037 8.004 ± 0.050 LongKeyMap.get_NonBlockingHashMapLong 5.939 ± 0.009 5.896 ± 0.007 6.370 ± 0.008 WARNING: Use water.nbhm.NonBlockingHashMapLong (from https://github.com/h2oai/h2o-3) instead of org.cliffc.high_scale_lib.NonBlockingHashMapLong (from http://sourceforge.net/projects/high-scale-lib/). The latter is not maintained and has bugs! UPDATE: NBHM and friends will soon move to JCTools project (watch JCTools issue 87 and JCTools issue 88)!

Slide 40

Slide 40 text

30 Story 3: Get property by name ignore case • TreeMap(String.CASE_INSENSITIVE_ORDER) • HashMap + String.toLowerCase() on a key • Wrapper around HashMap

Slide 41

Slide 41 text

31 Story 3: Get property by name ignore case public class MapGet { @Benchmark public Integer HashMap_toLowerCase() { return hashMap.get(key.toLowerCase()); } @Benchmark public Integer TreeMap() { return treeMap.get(key); } @Benchmark public Integer IgnoreCaseMap() { Integer val = ignoreCaseMap.get(key); if (val == null) { val = ignoreCaseMap.get(key.toLowerCase()); if (val != null) { ignoreCaseMap.put(key, val); } } return val; } }

Slide 42

Slide 42 text

Story 3: Get property by name ignore case 32 Benchmark 1.7.0_80 1.8.0_74 9-ea+108 MapGet.HashMap_toLowerCase 61.571 ± 0.180 58.809 ± 0.118 48.846 ± 0.523 MapGet.TreeMap 90.545 ± 0.465 79.836 ± 0.372 73.915 ± 0.350 MapGet.IgnoreCaseMap 12.587 ± 0.043 12.556 ± 0.027 12.299 ± 0.045

Slide 43

Slide 43 text

33 Story 4: Accessing reflective information public class GetFields { private Class> klass; private ConcurrentMap, Field[]> fieldsCache; @Benchmark public Field[] getDeclaredFields() { return klass.getDeclaredFields(); } @Benchmark public Field[] getDeclaredFields_cached() { Field[] result = fieldsCache.get(klass); if (result == null) { result = klass.getDeclaredFields(); Field[] tmp = fieldsCache.putIfAbsent(klass, result); if (tmp != null) { return tmp; } } return result; } }

Slide 44

Slide 44 text

34 Story 4: Accessing reflective information Benchmark 1.7.0_80 1.8.0_74 9-ea+108 GetFields.getDeclaredFields 272.256 ± 3.424 60.627 ± 0.239 62.320 ± 0.213 GetFields.getDeclaredFields_cached 11.312 ± 0.049 6.990 ± 0.008 7.206 ± 0.039

Slide 45

Slide 45 text

35 Story 5: Creating instances public class CreateNewInstance { @Benchmark public Object load_class_find_constructor_allocate() throws Exception { Class> clazz = Class.forName(className); Constructor> ctr = clazz.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object find_constructor_allocate() throws Exception { Constructor> ctr = klass.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object conctructor_allocate() throws Exception { return contructor.newInstance(arg1, arg2); } @Benchmark public Object new_allocate() throws Exception { return new C(arg1, arg2); } }

Slide 46

Slide 46 text

36 Story 5: Creating instances Benchmark 1.7.0_80 1.8.0_74 9-ea+108 CreateNewInstance.load_class_find_constructor_allocat e 1577.633 ± 15.640 783.205 ± 28.265 808.638 ± 10.804 CreateNewInstance.find_constructor_allocate 599.053 ± 6.375 86.418 ± 0.360 104.467 ± 1.213 CreateNewInstance.conctructor_allocate 238.814 ± 5.166 19.328 ± 0.087 20.996 ± 0.118 CreateNewInstance.new_allocate 13.637 ± 0.028 13.131 ± 0.040 12.719 ± 0.036

Slide 47

Slide 47 text

Story 6: Collection to List 37 public class CollectionToList { @Setup public void setUp() { … setup 10K data collection … } @Benchmark public ArrayList new_ArrayList_from_Collection() { return new ArrayList<>(data); } @Benchmark public ArrayList new_ArrayList_addAll() { ArrayList result = new ArrayList<>(data.size()); result.addAll(data); return result; } @Benchmark public ArrayList new_ArrayList_add() { ArrayList result = new ArrayList<>(data.size()); for (Object o : data) { result.add(o); } return result; } }

Slide 48

Slide 48 text

ArrayList Benchmark 1.7.0_80 1.8.0_74 9-ea+108 CollectionToList.new_ArrayList_add 45.803 ± 0.149 55.164 ± 0.678 78.195 ± 0.548 CollectionToList.new_ArrayList_addAll 8.366 ± 0.021 8.386 ± 0.030 8.287 ± 0.014 CollectionToList.new_ArrayList_from_Collection 4.310 ± 0.006 4.159 ± 0.019 4.143 ± 0.006 Story 6: Collection to List 38

Slide 49

Slide 49 text

Story 6: Collection to List 39 HashSet Benchmark 1.7.0_80 1.8.0_74 9-ea+108 CollectionToList.new_ArrayList_add 128.984 ± 0.770 117.425 ± 0.417 197.154 ± 0.989 CollectionToList.new_ArrayList_addAll 108.706 ± 1.976 101.570 ± 1.651 174.148 ± 0.989 CollectionToList.new_ArrayList_from_Collection 116.028 ± 0.461 100.884 ± 0.667 168.948 ± 0.699

Slide 50

Slide 50 text

Story 6: Collection to List 39 HINT: Check out blog post Arrays of Wisdom of the Ancients by Aleksey Shipilëv. HashSet Benchmark 1.7.0_80 1.8.0_74 9-ea+108 CollectionToList.new_ArrayList_add 128.984 ± 0.770 117.425 ± 0.417 197.154 ± 0.989 CollectionToList.new_ArrayList_addAll 108.706 ± 1.976 101.570 ± 1.651 174.148 ± 0.989 CollectionToList.new_ArrayList_from_Collection 116.028 ± 0.461 100.884 ± 0.667 168.948 ± 0.699

Slide 51

Slide 51 text

Story 7: API design - Batching is king 40 startTransaction(); try { for(DomainObject object : list) { manager.writeObject(object); } commitTransaction(); } catch(RuntimeException e) { rollbackTransaction(); } void writeObject(DomainObject object)

Slide 52

Slide 52 text

Story 7: API design - Batching is king 40 startTransaction(); try { for(DomainObject object : list) { manager.writeObject(object); } commitTransaction(); } catch(RuntimeException e) { rollbackTransaction(); } void writeObject(DomainObject object) void writeObjects(Stream extends DomainObject> objects) manager.writeObjects(list.stream());

Slide 53

Slide 53 text

Story 7: API design - Batching is king 40 startTransaction(); try { for(DomainObject object : list) { manager.writeObject(object); } commitTransaction(); } catch(RuntimeException e) { rollbackTransaction(); } void writeObject(DomainObject object) void writeObjects(Stream extends DomainObject> objects) manager.writeObjects(list.stream()); Brings 5x performance improvement using JDBC batching.

Slide 54

Slide 54 text

Story 8: N+1 SQL we meet again • Originally master/detail objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41

Slide 55

Slide 55 text

Story 8: N+1 SQL we meet again • Originally master/detail objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41 • Now master/detail objects loaded together Single SELECT to load all masters + single SELECT to load all details for all masters

Slide 56

Slide 56 text

42 Where we stand • Build time down from 3 hours to 45 minutes • At least one big performance regression caught • On average 10-15% improvements across all use-cases. Specific use cases get 2-3x performance gains out of the box • Using new API brings even more gains (up to 5-6x times improvements)

Slide 57

Slide 57 text

43 References • Slides: https://speakerdeck.com/vyazelenko/using-jmh-in-a-real-world-project-javaland-2016 • Benchmarks: https://github.com/vyazelenko/jmh_talk_JavaLand_2016 • JMH: http://openjdk.java.net/projects/code-tools/jmh/ • JMH samples: http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/ org/openjdk/jmh/samples/ • Collection of JMH resources by Nitsan Wakart (@nitsanw) • Aleksey Shipilëv’s (@shipilev) blog: shipilev.net/ • JMH Gradle plugin: https://github.com/melix/jmh-gradle-plugin

Slide 58

Slide 58 text

44 Questions? @DVyazelenko