Using JMH in a real world project

Slide 1

Slide 1 text

1 Using JMH in a real world project by Dmitry Vyazelenko @ Joker 2015

Slide 2

Slide 2 text

2 About me • Senior Software Engineer @ Sowatec AG, Switzerland • Disorganizer at JCrete Unconference, www.jcrete.org • Contacts: vyazelenko.com, @DVyazelenko

Slide 3

Slide 3 text

3 Agenda • Benchmarking - what’s in it for me? • Intro to JMH • JMH in Arregulo project • Optimization war stories

Slide 4

Slide 4 text

Benchmarking - what’s in it for me? 4

Slide 5

Slide 5 text

Why benchmark? 5 • Assess performance of a software system or a single component • Compare performance of different algorithms/implementations • Investigate impact of conﬁguration and/or environment changes on a performance of your system

Slide 6

Slide 6 text

–Dr. Cliff Click “Without exception every microbenchmark I've seen has had serious ﬂaws” http://www.azulsystems.com/presentations/art-of-java-benchmarking 6

Slide 7

Slide 7 text

public class ListGetBenchmarkBroken { private static final int ITERATIONS = 1_000_000_000; public static void main(String[] args) { List list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } 7

Slide 8

Slide 8 text

Slide 9

Slide 9 text

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Intro to JMH 9

Slide 12

Slide 12 text

10 JMH • OpenJDK project: http://openjdk.java.net/projects/ code-tools/jmh/ • Java harness for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM • De-facto standard for writing benchmarks on JVM

Slide 13

Slide 13 text

11 JMH features • Benchmark code generation and runtime: Annotation-based benchmarks Parameters and state Support for multi-threaded benchmarks Blackhole, compiler control etc… • Command line and API for running benchmarks • Built-in proﬁlers

Slide 14

Slide 14 text

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List list; private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark public Integer get() { return list.get(index); } } 12

Slide 15

Slide 15 text

Benchmark Score Units ListGetBenchmarkJMH.get 4.244 ns/op @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List list; private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark public Integer get() { return list.get(index); } } 12

Slide 16

Slide 16 text

13 Benchmark modes • Mode.Throughput measures the raw throughput (operations per unit of time) • Mode.AverageTime measures the average execution time (time per operation) • Mode.SampleTime samples the execution time. Instead of measuring the total time, JMH measure the time spent in *some* of the benchmark method calls (can infer the distributions, percentiles, etc.) • Mode.SingleShotTime measures the single method invocation time

Slide 17

Slide 17 text

14 Profilers • cl - Classloader profiling via standard MBeans • comp - JIT compiler profiling via standard MBeans • gc - GC profiling via standard MBeans • stack - Simple and naive Java stack profiler • perf - Linux perf Statistics • perfnorm - Linux perf statistics, normalized by operation count • perfasm - Linux perf + PrintAssembly Profiler • xperfasm - Windows xperf + PrintAssembly Profiler

Slide 18

Slide 18 text

15 JMH in Arregulo

Slide 19

Slide 19 text

16 How we approach performance optimization • Identify a use case for which performance is unsatisfactory • Set a performance goal • Write and validate benchmark(s) for this use case • Figure out what to ﬁx • Apply performance ﬁx and see if results improve • If goal is achieved stop, otherwise keep optimizing if cost is not prohibitive

Slide 20

Slide 20 text

17 Our setup • Grade build with JMH plugin to build benchmarks • Custom runner to execute benchmarks • Dedicated machine for performance runs

Slide 21

Slide 21 text

JMH Gradle plugin 18 plugins { id 'java' id 'com.github.johnrengelman.shadow' version '1.2.2' id 'me.champeau.gradle.jmh' version '0.2.0' } dependencies { jmh 'ai.h2o:h2o-core:3.2.0.5' } jmhJar { append('META-INF/spring.handlers') append('META-INF/spring.schemas') } jmh { jmhVersion = '1.11.1' }

Slide 22

Slide 22 text

19 Custom runner public class BenchmarksRunner { public static void main(String[] args) throws Exception { String appPath = ... Options options = new OptionsBuilder().parent(new CommandLineOptions(args)) .shouldFailOnError(true).timeout(TimeValue.days(1)) .jvmArgs("-D" + APP_PATH_PROPERTY + "=" + appPath, "-Xmx8g") .resultFormat(ResultFormatType.TEXT) .result(appPath + "/benchmark_result_" + getVersion() + ".txt") .build(); new Runner(options).run(); } }

Slide 23

Slide 23 text

20 Results processing • Currently we store them as plain text in Conﬂuence • Need to do better. Perhaps use Jenkins JMH plugin (https://github.com/brianfromoregon/jmh-plugin)

Slide 24

Slide 24 text

21 Issue 1: default timeout • Default iteration timeout is set 10 minutes • Can be set via -to command line switch or timeout(TimeValue) API call • Includes time spent in @Setup/@TearDown

Slide 25

Slide 25 text

22 Issue 2: @Setup/@TearDown and threads • JMH does not provide any guarantees which threads will invoke @Setup/@TearDown • In single-threaded benchmarks it is the same thread but in multi-threaded benchmarks it is undeﬁned

Slide 26

Slide 26 text

23 Issue 3: fail on error is not on by default • Fail on error can only be set command line (-foe true) or via API (setFailOnError(true)) • It is not on by default • There is no annotation to set it

Slide 27

Slide 27 text

24 JMH wish-list • Unified annotations and command line options (i.e. -gc, -foe and others) • Unified @Threads and @Params • More control/determinism over @Setup/ @TearDown • JFR profiler

Slide 28

Slide 28 text

25 Optimization war stories

Slide 29

Slide 29 text

Story 1: Small things matter 26 public Object[] getValues() { Object[] values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } private void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }

Slide 30

Slide 30 text

Slide 31

Slide 31 text

Story 1: Small things matter public Object[] getValues() { TypeDefinition type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } private void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27

Slide 32

Slide 32 text

Slide 33

Slide 33 text

Slide 34

Slide 34 text

28 Story 2: Get type info faster

Slide 35

Slide 35 text

28 Story 2: Get type info faster public class LongKeyMap { private ConcurrentHashMap chm; private NonBlockingHashMapLong nbhm; private long key; @Setup public void setUp() { // generate 50K data array and populate maps ... } @Benchmark public Double get_ConcurrentHashMap() { return chm.get(key); } @Benchmark public Double get_NonBlockingHashMapLong() { return nbhm.get(key); } }

Slide 36

Slide 36 text

Story 2: Get type info faster 29 JDK 1.7.0_80 ns/op LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657

Slide 37

Slide 37 text

Slide 38

Slide 38 text

Story 2: Get type info faster JDK 1.8.0_60 ns/op LongKeyMap.get_ConcurrentHashMap 8.343 LongKeyMap.get_NonBlockingHashMapLong 6.508 29 JDK 1.7.0_80 ns/op LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657 WARNING: Use water.nbhm.NonBlockingHashMapLong (from https://github.com/h2oai/ h2o-3) instead of org.cliffc.high_scale_lib.NonBlockingHashMapLong (from http://sourceforge.net/projects/high-scale-lib/). The latter is not maintained and has bugs!

Slide 39

Slide 39 text

30 Story 3: Get property by name ignore case • TreeMap(String.CASE_INSENSITIVE_ORDER) • HashMap + String.toLowerCase() on a key • Wrapper around HashMap

Slide 40

Slide 40 text

31 Story 3: Get property by name ignore case public class MapGet { @Benchmark public Integer HashMap_toLowerCase() { return hashMap.get(key.toLowerCase()); } @Benchmark public Integer TreeMap() { return treeMap.get(key); } @Benchmark public Integer IgnoreCaseMap() { Integer val = ignoreCaseMap.get(key); if (val == null) { val = ignoreCaseMap.get(key.toLowerCase()); if (val != null) { ignoreCaseMap.put(key, val); } } return val; } }

Slide 41

Slide 41 text

Story 3: Get property by name ignore case 32 JDK 1.7.0_80 ns/op MapGet.TreeMap GoodKey 95.142 MapGet.TreeMap BAD_KEY 105.230 MapGet.HashMap_toLowerCase GoodKey 67.986 MapGet.HashMap_toLowerCase BAD_KEY 69.942 MapGet.IgnoreCaseMap GoodKey 13.083 MapGet.IgnoreCaseMap BAD_KEY 7.583 JDK 1.8.0_60 ns/op MapGet.TreeMap GoodKey 84.018 MapGet.TreeMap BAD_KEY 94.577 MapGet.HashMap_toLowerCase GoodKey 66.122 MapGet.HashMap_toLowerCase BAD_KEY 69.527 MapGet.IgnoreCaseMap GoodKey 13.809 MapGet.IgnoreCaseMap BAD_KEY 7.802

Slide 42

Slide 42 text

33 Story 4: Accessing reﬂective information

Slide 43

Slide 43 text

33 Story 4: Accessing reﬂective information public class GetFields { private Class klass; private ConcurrentMap, Field[]> fieldsCache; @Benchmark public Field[] getDeclaredFields() { return klass.getDeclaredFields(); } @Benchmark public Field[] getDeclaredFields_cached() { Field[] result = fieldsCache.get(klass); if (result == null) { result = klass.getDeclaredFields(); Field[] tmp = fieldsCache.putIfAbsent(klass, result); if (tmp != null) { return tmp; } } return result; } }

Slide 44

Slide 44 text

34 Story 4: Accessing reﬂective information JDK 1.7.0_80 ns/op GetFields.getDeclaredFields 284.233 GetFields.getDeclaredFields_cached 11.249

Slide 45

Slide 45 text

34 Story 4: Accessing reﬂective information JDK 1.7.0_80 ns/op GetFields.getDeclaredFields 284.233 GetFields.getDeclaredFields_cached 11.249 JDK 1.8.0_60 ns/op GetFields.getDeclaredFields 67.078 GetFields.getDeclaredFields_cached 7.143

Slide 46

Slide 46 text

35 Story 5: Creating instances

Slide 47

Slide 47 text

35 Story 5: Creating instances public class CreateNewInstance { @Benchmark public Object load_class_find_constructor_allocate() throws Exception { Class clazz = Class.forName(className); Constructor ctr = clazz.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object find_constructor_allocate() throws Exception { Constructor ctr = klass.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object conctructor_allocate() throws Exception { return contructor.newInstance(arg1, arg2); } @Benchmark public Object new_allocate() throws Exception { return new C(arg1, arg2); } }

Slide 48

Slide 48 text

Slide 49

Slide 49 text

36 Story 5: Creating instances JDK 1.7.0_80 ns/op CreateNewInstance.new_allocate 14.453 CreateNewInstance.conctructor_allocate 238.031 CreateNewInstance.find_constructor_allocate 583.546 CreateNewInstance.load_class_find_constructor_allocate 1498.030 JDK 1.8.0_60 ns/op CreateNewInstance.new_allocate 15.462 CreateNewInstance.conctructor_allocate 21.077 CreateNewInstance.find_constructor_allocate 95.389 CreateNewInstance.load_class_find_constructor_allocate 821.017

Slide 50

Slide 50 text

37 Story 6: Pitfall of “I know better”

Slide 51

Slide 51 text

37 Story 6: Pitfall of “I know better” public class CollectionToList { @Benchmark public ArrayList new_ArrayList_from_Collection() { return new ArrayList<>(data); } @Benchmark public ArrayList new_ArrayList_addAll() { ArrayList result = new ArrayList<>(data.size()); result.addAll(data); return result; } @Benchmark public ArrayList new_ArrayList_add() { ArrayList result = new ArrayList<>(data.size()); for (Object o : data) { result.add(o); } return result; } }

Slide 52

Slide 52 text

38 Story 6: Pitfall of “I know better” JDK 1.8.0_60 CollectionToList.new_ArrayList_from_Collection ArrayList 4.242 CollectionToList.new_ArrayList_addAll ArrayList 8.437 CollectionToList.new_ArrayList_add ArrayList 59.440 CollectionToList.new_ArrayList_from_Collection HashSet 120.230 CollectionToList.new_ArrayList_addAll HashSet 106.024 CollectionToList.new_ArrayList_add HashSet 128.250

Slide 53

Slide 53 text

Story 7: Batching is king 39

Slide 54

Slide 54 text

Story 7: Batching is king 39 void writeObject(DomainObject object) Users are forced to implement bulk operations themselves. Don’t have access to low-level features.

Slide 55

Slide 55 text

Story 7: Batching is king 39 void writeObject(DomainObject object) Users are forced to implement bulk operations themselves. Don’t have access to low-level features. void writeObjects(List objects) Framework implementation of bulk operation. Brings 5x performance improvement using JDBC batching.

Slide 56

Slide 56 text

Story 8: Know your use case 40

Slide 57

Slide 57 text

• Pivot grid compute in memory • by Product and Account: 24.5 seconds • by Product: 31.5 seconds Story 8: Know your use case 40

Slide 58

Slide 58 text

• Pivot grid compute in DB • by Product and Account: 0.5 seconds • by Product and Account: 4 seconds • Pivot grid compute in memory • by Product and Account: 24.5 seconds • by Product: 31.5 seconds Story 8: Know your use case 40

Slide 59

Slide 59 text

Slide 60

Slide 60 text

Story 9: N+1 SQL we meet again • Originally master/detail objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41

Slide 61

Slide 61 text

Story 9: N+1 SQL we meet again • Originally master/detail objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41 • Now master/detail objects loaded together Single SELECT to load all masters + single SELECT to load all details for all masters

Slide 62

Slide 62 text

42 Where we stand • Build time down from 3 hours to 45 minutes • At least one big performance regression caught • On average 10-15% improvements across all use- cases. Speciﬁc use cases get 2-3x performance gains out of the box • Using new API brings even more gains (up to 5-6x times improvements)

Slide 63

Slide 63 text

43 References • Slides: https://speakerdeck.com/vyazelenko/using-jmh-in-a-real- world-project • Benchmarks: https://github.com/vyazelenko/jmh_talk_joker2015 • JMH: http://openjdk.java.net/projects/code-tools/jmh/ • JMH samples: http://hg.openjdk.java.net/code-tools/jmh/ﬁle/tip/ jmh-samples/src/main/java/org/openjdk/jmh/samples/ • JMH Gradle plugin: https://github.com/melix/jmh-gradle-plugin • JMH Jenkins plugin: https://github.com/brianfromoregon/jmh- plugin

Slide 64

Slide 64 text

44 Questions? @DVyazelenko