Using JMH in a real world project

1 Using JMH in a real world project by Dmitry
Vyazelenko @ Joker 2015

2 About me • Senior Software Engineer @ Sowatec AG,
Switzerland • Disorganizer at JCrete Unconference, www.jcrete.org • Contacts: vyazelenko.com, @DVyazelenko

3 Agenda • Benchmarking - what’s in it for me?
• Intro to JMH • JMH in Arregulo project • Optimization war stories

Benchmarking - what’s in it for me? 4

Why benchmark? 5 • Assess performance of a software system
or a single component • Compare performance of different algorithms/implementations • Investigate impact of conﬁguration and/or environment changes on a performance of your system

–Dr. Cliff Click “Without exception every microbenchmark I've seen has
had serious ﬂaws” http://www.azulsystems.com/presentations/art-of-java-benchmarking 6

public class ListGetBenchmarkBroken { private static final int ITERATIONS =
1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } 7

public class ListGetBenchmarkBroken { private static final int ITERATIONS =
1_000_000_000; public static void main(String[] args) { List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); long startTime = System.nanoTime(); for (int i = 0; i < ITERATIONS; i++) { list.get(3); } long endTime = System.nanoTime(); long duration = endTime - startTime; System.out.println("Executed in " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms, " + +((double) duration / ITERATIONS) + " ns/op"); } } Executed in 2ms, 0.002808 ns/op 7

Intro to JMH 9

10 JMH • OpenJDK project: http://openjdk.java.net/projects/ code-tools/jmh/ • Java harness
for building, running, and analyzing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM • De-facto standard for writing benchmarks on JVM

11 JMH features • Benchmark code generation and runtime: Annotation-based
benchmarks Parameters and state Support for multi-threaded benchmarks Blackhole, compiler control etc… • Command line and API for running benchmarks • Built-in proﬁlers

@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public class ListGetBenchmarkJMH { private List<Integer> list;
private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark public Integer get() { return list.get(index); } } 12

Benchmark Score Units ListGetBenchmarkJMH.get 4.244 ns/op @BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) @State(Scope.Benchmark) public
class ListGetBenchmarkJMH { private List<Integer> list; private int index; @Setup public void setUp() { list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10); index = 3; } @Benchmark public Integer get() { return list.get(index); } } 12

13 Benchmark modes • Mode.Throughput measures the raw throughput (operations
per unit of time) • Mode.AverageTime measures the average execution time (time per operation) • Mode.SampleTime samples the execution time. Instead of measuring the total time, JMH measure the time spent in *some* of the benchmark method calls (can infer the distributions, percentiles, etc.) • Mode.SingleShotTime measures the single method invocation time

14 Profilers • cl - Classloader profiling via standard MBeans
• comp - JIT compiler profiling via standard MBeans • gc - GC profiling via standard MBeans • stack - Simple and naive Java stack profiler • perf - Linux perf Statistics • perfnorm - Linux perf statistics, normalized by operation count • perfasm - Linux perf + PrintAssembly Profiler • xperfasm - Windows xperf + PrintAssembly Profiler

15 JMH in Arregulo

16 How we approach performance optimization • Identify a use
case for which performance is unsatisfactory • Set a performance goal • Write and validate benchmark(s) for this use case • Figure out what to ﬁx • Apply performance ﬁx and see if results improve • If goal is achieved stop, otherwise keep optimizing if cost is not prohibitive

17 Our setup • Grade build with JMH plugin to
build benchmarks • Custom runner to execute benchmarks • Dedicated machine for performance runs

JMH Gradle plugin 18 plugins { id 'java' id 'com.github.johnrengelman.shadow'
version '1.2.2' id 'me.champeau.gradle.jmh' version '0.2.0' } dependencies { jmh 'ai.h2o:h2o-core:3.2.0.5' } jmhJar { append('META-INF/spring.handlers') append('META-INF/spring.schemas') } jmh { jmhVersion = '1.11.1' }

19 Custom runner public class BenchmarksRunner { public static void
main(String[] args) throws Exception { String appPath = ... Options options = new OptionsBuilder().parent(new CommandLineOptions(args)) .shouldFailOnError(true).timeout(TimeValue.days(1)) .jvmArgs("-D" + APP_PATH_PROPERTY + "=" + appPath, "-Xmx8g") .resultFormat(ResultFormatType.TEXT) .result(appPath + "/benchmark_result_" + getVersion() + ".txt") .build(); new Runner(options).run(); } }

20 Results processing • Currently we store them as plain
text in Conﬂuence • Need to do better. Perhaps use Jenkins JMH plugin (https://github.com/brianfromoregon/jmh-plugin)

21 Issue 1: default timeout • Default iteration timeout is
set 10 minutes • Can be set via -to command line switch or timeout(TimeValue) API call • Includes time spent in @Setup/@TearDown

22 Issue 2: @Setup/@TearDown and threads • JMH does not
provide any guarantees which threads will invoke @Setup/@TearDown • In single-threaded benchmarks it is the same thread but in multi-threaded benchmarks it is undeﬁned

23 Issue 3: fail on error is not on by
default • Fail on error can only be set command line (-foe true) or via API (setFailOnError(true)) • It is not on by default • There is no annotation to set it

24 JMH wish-list • Unified annotations and command line options
(i.e. -gc, -foe and others) • Unified @Threads and @Params • More control/determinism over @Setup/ @TearDown • JFR profiler

25 Optimization war stories

Story 1: Small things matter 26 public Object[] getValues() {
Object[] values = new Object[N]; setValue(values, Property1, getValue1()); setValue(values, Property2, getValue2()); ... return values; } private void setValue(Object[] values, String propName, Object propValue) { TypeDefinition type = getType(); int pos = type.getPropertyPosition(propName); values[pos] = propValue; }

Story 1: Small things matter public Object[] getValues() { TypeDefinition
type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } private void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27

Story 1: Small things matter public Object[] getValues() { TypeDefinition
type = getType(); Object[] values = new Object[N]; setValue(type, values, Property1, getValue1()); setValue(type, values, Property2, getValue2()); ... return values; } private void setValue(TypeDefinition type, Object[] values, String propName, Object propValue) { int pos = type.getPropertyPosition(propName); values[pos] = propValue; } 27 Build time down by 40 minutes (25% of total time)

28 Story 2: Get type info faster

28 Story 2: Get type info faster public class LongKeyMap
{ private ConcurrentHashMap<Long, Double> chm; private NonBlockingHashMapLong<Double> nbhm; private long key; @Setup public void setUp() { // generate 50K data array and populate maps ... } @Benchmark public Double get_ConcurrentHashMap() { return chm.get(key); } @Benchmark public Double get_NonBlockingHashMapLong() { return nbhm.get(key); } }

Story 2: Get type info faster 29 JDK 1.7.0_80 ns/op
LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657

Story 2: Get type info faster JDK 1.8.0_60 ns/op LongKeyMap.get_ConcurrentHashMap
8.343 LongKeyMap.get_NonBlockingHashMapLong 6.508 29 JDK 1.7.0_80 ns/op LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657

Story 2: Get type info faster JDK 1.8.0_60 ns/op LongKeyMap.get_ConcurrentHashMap
8.343 LongKeyMap.get_NonBlockingHashMapLong 6.508 29 JDK 1.7.0_80 ns/op LongKeyMap.get_ConcurrentHashMap 14.837 LongKeyMap.get_NonBlockingHashMapLong 6.657 WARNING: Use water.nbhm.NonBlockingHashMapLong (from https://github.com/h2oai/ h2o-3) instead of org.cliffc.high_scale_lib.NonBlockingHashMapLong (from http://sourceforge.net/projects/high-scale-lib/). The latter is not maintained and has bugs!

30 Story 3: Get property by name ignore case •
TreeMap(String.CASE_INSENSITIVE_ORDER) • HashMap + String.toLowerCase() on a key • Wrapper around HashMap

31 Story 3: Get property by name ignore case public
class MapGet { @Benchmark public Integer HashMap_toLowerCase() { return hashMap.get(key.toLowerCase()); } @Benchmark public Integer TreeMap() { return treeMap.get(key); } @Benchmark public Integer IgnoreCaseMap() { Integer val = ignoreCaseMap.get(key); if (val == null) { val = ignoreCaseMap.get(key.toLowerCase()); if (val != null) { ignoreCaseMap.put(key, val); } } return val; } }

Story 3: Get property by name ignore case 32 JDK
1.7.0_80 ns/op MapGet.TreeMap GoodKey 95.142 MapGet.TreeMap BAD_KEY 105.230 MapGet.HashMap_toLowerCase GoodKey 67.986 MapGet.HashMap_toLowerCase BAD_KEY 69.942 MapGet.IgnoreCaseMap GoodKey 13.083 MapGet.IgnoreCaseMap BAD_KEY 7.583 JDK 1.8.0_60 ns/op MapGet.TreeMap GoodKey 84.018 MapGet.TreeMap BAD_KEY 94.577 MapGet.HashMap_toLowerCase GoodKey 66.122 MapGet.HashMap_toLowerCase BAD_KEY 69.527 MapGet.IgnoreCaseMap GoodKey 13.809 MapGet.IgnoreCaseMap BAD_KEY 7.802

33 Story 4: Accessing reﬂective information

33 Story 4: Accessing reﬂective information public class GetFields {
private Class<?> klass; private ConcurrentMap<Class<?>, Field[]> fieldsCache; @Benchmark public Field[] getDeclaredFields() { return klass.getDeclaredFields(); } @Benchmark public Field[] getDeclaredFields_cached() { Field[] result = fieldsCache.get(klass); if (result == null) { result = klass.getDeclaredFields(); Field[] tmp = fieldsCache.putIfAbsent(klass, result); if (tmp != null) { return tmp; } } return result; } }

34 Story 4: Accessing reﬂective information JDK 1.7.0_80 ns/op GetFields.getDeclaredFields
284.233 GetFields.getDeclaredFields_cached 11.249

34 Story 4: Accessing reﬂective information JDK 1.7.0_80 ns/op GetFields.getDeclaredFields
284.233 GetFields.getDeclaredFields_cached 11.249 JDK 1.8.0_60 ns/op GetFields.getDeclaredFields 67.078 GetFields.getDeclaredFields_cached 7.143

35 Story 5: Creating instances

35 Story 5: Creating instances public class CreateNewInstance { @Benchmark
public Object load_class_find_constructor_allocate() throws Exception { Class<?> clazz = Class.forName(className); Constructor<?> ctr = clazz.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object find_constructor_allocate() throws Exception { Constructor<?> ctr = klass.getConstructor(Date.class, String.class); return ctr.newInstance(arg1, arg2); } @Benchmark public Object conctructor_allocate() throws Exception { return contructor.newInstance(arg1, arg2); } @Benchmark public Object new_allocate() throws Exception { return new C(arg1, arg2); } }

36 Story 5: Creating instances JDK 1.7.0_80 ns/op CreateNewInstance.new_allocate 14.453
CreateNewInstance.conctructor_allocate 238.031 CreateNewInstance.find_constructor_allocate 583.546 CreateNewInstance.load_class_find_constructor_allocate 1498.030

36 Story 5: Creating instances JDK 1.7.0_80 ns/op CreateNewInstance.new_allocate 14.453
CreateNewInstance.conctructor_allocate 238.031 CreateNewInstance.find_constructor_allocate 583.546 CreateNewInstance.load_class_find_constructor_allocate 1498.030 JDK 1.8.0_60 ns/op CreateNewInstance.new_allocate 15.462 CreateNewInstance.conctructor_allocate 21.077 CreateNewInstance.find_constructor_allocate 95.389 CreateNewInstance.load_class_find_constructor_allocate 821.017

37 Story 6: Pitfall of “I know better”

37 Story 6: Pitfall of “I know better” public class
CollectionToList { @Benchmark public ArrayList<Object> new_ArrayList_from_Collection() { return new ArrayList<>(data); } @Benchmark public ArrayList<Object> new_ArrayList_addAll() { ArrayList<Object> result = new ArrayList<>(data.size()); result.addAll(data); return result; } @Benchmark public ArrayList<Object> new_ArrayList_add() { ArrayList<Object> result = new ArrayList<>(data.size()); for (Object o : data) { result.add(o); } return result; } }

38 Story 6: Pitfall of “I know better” JDK 1.8.0_60
CollectionToList.new_ArrayList_from_Collection ArrayList 4.242 CollectionToList.new_ArrayList_addAll ArrayList 8.437 CollectionToList.new_ArrayList_add ArrayList 59.440 CollectionToList.new_ArrayList_from_Collection HashSet 120.230 CollectionToList.new_ArrayList_addAll HashSet 106.024 CollectionToList.new_ArrayList_add HashSet 128.250

Story 7: Batching is king 39

Story 7: Batching is king 39 void writeObject(DomainObject object) Users
are forced to implement bulk operations themselves. Don’t have access to low-level features.

Story 7: Batching is king 39 void writeObject(DomainObject object) Users
are forced to implement bulk operations themselves. Don’t have access to low-level features. void writeObjects(List<DomainObject> objects) Framework implementation of bulk operation. Brings 5x performance improvement using JDBC batching.

Story 8: Know your use case 40

• Pivot grid compute in memory • by Product and
Account: 24.5 seconds • by Product: 31.5 seconds Story 8: Know your use case 40

• Pivot grid compute in DB • by Product and
Account: 0.5 seconds • by Product and Account: 4 seconds • Pivot grid compute in memory • by Product and Account: 24.5 seconds • by Product: 31.5 seconds Story 8: Know your use case 40

Story 9: N+1 SQL we meet again • Originally master/detail
objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41

Story 9: N+1 SQL we meet again • Originally master/detail
objects loaded separately Single SELECT to load masters + SELECT per master object to load details for that master 41 • Now master/detail objects loaded together Single SELECT to load all masters + single SELECT to load all details for all masters

42 Where we stand • Build time down from 3
hours to 45 minutes • At least one big performance regression caught • On average 10-15% improvements across all use- cases. Speciﬁc use cases get 2-3x performance gains out of the box • Using new API brings even more gains (up to 5-6x times improvements)

43 References • Slides: https://speakerdeck.com/vyazelenko/using-jmh-in-a-real- world-project • Benchmarks: https://github.com/vyazelenko/jmh_talk_joker2015 •
JMH: http://openjdk.java.net/projects/code-tools/jmh/ • JMH samples: http://hg.openjdk.java.net/code-tools/jmh/ﬁle/tip/ jmh-samples/src/main/java/org/openjdk/jmh/samples/ • JMH Gradle plugin: https://github.com/melix/jmh-gradle-plugin • JMH Jenkins plugin: https://github.com/brianfromoregon/jmh- plugin

44 Questions? @DVyazelenko

Using JMH in a real world project

Using JMH in a real world project

More Decks by Dmitry Vyazelenko

Other Decks in Programming

Featured

Transcript