On Value Types or Why Reference Locality Matters

Ladislav Thon
February 06, 2016

The Valhalla project in OpenJDK has been exploring adding value types to Java and the JVM. This will hopefully come to fruition in one of the upcoming versions of the Java platform (though definitely not 9). In this talk, I will describe what value types and generic specialization are and show them live using the Valhalla prototype. I will also briefly touch on another related topic, ObjectLayout.org. During these explanations, I will repeatedly stress the importance of reference locality to application performance and illustrate the difference using a couple of small JMH benchmarks.

Transcript

  1. One of the barriers to doing the natural implementation of (for example) complex numbers as classes is that class objects are less efficient than primitive types like double and int.
     James Gosling: The Evolution of Numerical Computing in Java, 1997
     https://web.archive.org/web/20031203204137/http://java.sun.com/people/jag/FP.html
  2. ArrayList<Complex> complexNumbers;

     @Benchmark
     public double sum() {
         double re = 0;
         double im = 0;
         for (int i = 0; i < size; i++) {
             Complex complexNumber = complexNumbers.get(i);
             re += complexNumber.re;
             im += complexNumber.im;
         }
         return re + im;
     }
  3. Complex[] complexNumbers;

     @Benchmark
     public double sum() {
         double re = 0;
         double im = 0;
         for (int i = 0; i < size; i++) {
             Complex complexNumber = complexNumbers[i];
             re += complexNumber.re;
             im += complexNumber.im;
         }
         return re + im;
     }
  4. // even: real part, odd: imaginary part
     double[] complexNumbers;

     @Benchmark
     public double sum() {
         double re = 0;
         double im = 0;
         for (int i = 0; i < size; i += 2) {
             // elements are plain doubles, so there are no .re/.im fields to read
             re += complexNumbers[i];
             im += complexNumbers[i + 1];
         }
         return re + im;
     }
  5. .----------------.
     | complexNumbers |  ArrayList<Complex>
     '----------------'
             |   .------.    .----.----.----.----.----.-----.
             '-->| List |--->| A  | r  | r  | a  | y  | ... |
                 '------'    '----'----'----'----'----'-----'
                               |    |    |    |    |
                               |    v    |    v    |
                               |  .----. |  .----. |
                               |  | C2 | |  | C4 | |
                               v  '----' v  '----' v
                             .----.    .----.    .----.
                             | C1 |    | C3 |    | C5 |
                             '----'    '----'    '----'
  6. .----------------.
     | complexNumbers |  Complex[]
     '----------------'
             |   .----.----.----.----.----.-----.
             '-->| A  | r  | r  | a  | y  | ... |
                 '----'----'----'----'----'-----'
                   |    |    |    |    |
                   |    v    |    v    |
                   |  .----. |  .----. |
                   |  | C2 | |  | C4 | |
                   v  '----' v  '----' v
                 .----.    .----.    .----.
                 | C1 |    | C3 |    | C5 |
                 '----'    '----'    '----'
  7. .----------------.
     | complexNumbers |  double[]
     '----------------'
             |   .-------.-------.-------.-------.-------.-----.
             '-->| C1.re | C1.im | C2.re | C2.im | C3.re | ... |
                 '-------'-------'-------'-------'-------'-----'

     Array of structs in C
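
The three layouts on the slides above can be put side by side in plain Java, without JMH. This is a minimal sketch, not the benchmark code from the talk: the class name `LayoutSums`, the nested `Complex` stand-in and the `flatten` helper are assumptions made for illustration. It only checks that all three traversals compute the same sum; it does not measure anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LayoutSums {
    // Hypothetical stand-in for the Complex class used on the slides.
    static final class Complex {
        final double re;
        final double im;

        Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }
    }

    // Reference chain per element: list, backing array, Complex object.
    static double sumList(List<Complex> numbers) {
        double re = 0;
        double im = 0;
        for (int i = 0; i < numbers.size(); i++) {
            Complex c = numbers.get(i);
            re += c.re;
            im += c.im;
        }
        return re + im;
    }

    // Reference chain per element: array slot, Complex object.
    static double sumArray(Complex[] numbers) {
        double re = 0;
        double im = 0;
        for (int i = 0; i < numbers.length; i++) {
            re += numbers[i].re;
            im += numbers[i].im;
        }
        return re + im;
    }

    // No references: even index holds the real part, odd index the imaginary part.
    static double sumFlat(double[] numbers) {
        double re = 0;
        double im = 0;
        for (int i = 0; i + 1 < numbers.length; i += 2) {
            re += numbers[i];
            im += numbers[i + 1];
        }
        return re + im;
    }

    // Copies the fields of each object into one contiguous double[].
    static double[] flatten(Complex[] numbers) {
        double[] flat = new double[2 * numbers.length];
        for (int i = 0; i < numbers.length; i++) {
            flat[2 * i] = numbers[i].re;
            flat[2 * i + 1] = numbers[i].im;
        }
        return flat;
    }

    public static void main(String[] args) {
        Complex[] array = { new Complex(1, 2), new Complex(3, 4), new Complex(5, 6) };
        List<Complex> list = new ArrayList<>(Arrays.asList(array));
        System.out.println(sumList(list));
        System.out.println(sumArray(array));
        System.out.println(sumFlat(flatten(array)));
    }
}
```

All three calls print the same value; only the number of references followed per element differs.
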
  8. int[] numbers; // varying sizes

     @Benchmark
     public void predictableTraversal() {
         int steps = 64 * 1024 * 1024;
         int lengthMod = numbers.length - 1;
         for (int i = 0; i < steps; i++) {
             numbers[(i * 16) & lengthMod]++;
         }
     }

     // if numbers.length is power of 2,
     // x & lengthMod == x % numbers.length
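
The masking trick in the comment on the slide above can be checked directly. The following is a small sketch; the class name `PowerOfTwoMask` and its helper methods are hypothetical and not part of the benchmark. It verifies the identity for non-negative `x` and shows that it breaks for a length that is not a power of 2.

```java
class PowerOfTwoMask {
    // A positive int is a power of 2 exactly when it has a single bit set.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Wraps x into [0, length) with a bit mask; valid only for power-of-2 lengths.
    static int wrap(int x, int length) {
        if (!isPowerOfTwo(length)) {
            throw new IllegalArgumentException("length must be a power of 2: " + length);
        }
        return x & (length - 1);
    }

    public static void main(String[] args) {
        int length = 1024;
        for (int x = 0; x < 100_000; x++) {
            if (wrap(x, length) != x % length) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        // For a length that is not a power of 2, mask and remainder disagree.
        System.out.println((13 & (10 - 1)) + " vs " + (13 % 10)); // prints "9 vs 3"
    }
}
```

For negative `x` the mask still yields a valid index (it agrees with `Math.floorMod`), whereas `%` would return a negative remainder.
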
  9. $ lstopo --output-format txt --no-io --no-legend
     ┌────────────────────────────────────────┐
     │ Machine (7868MB)                       │
     │                                        │
     │ ┌────────────────────────────────────┐ │
     │ │ Socket P#0                         │ │
     │ │                                    │ │
     │ │ ┌────────────────────────────────┐ │ │
     │ │ │ L3 (4096KB)                    │ │ │
     │ │ └────────────────────────────────┘ │ │
     │ │                                    │ │
     │ │ ┌──────────────┐  ┌──────────────┐ │ │
     │ │ │ L2 (256KB)   │  │ L2 (256KB)   │ │ │
     │ │ └──────────────┘  └──────────────┘ │ │
     │ │                                    │ │
     │ │ ┌──────────────┐  ┌──────────────┐ │ │
     │ │ │ L1 (32KB)    │  │ L1 (32KB)    │ │ │
     │ │ └──────────────┘  └──────────────┘ │ │
     │ │                                    │ │
     │ │ ┌──────────────┐  ┌──────────────┐ │ │
     │ │ │ Core P#0     │  │ Core P#1     │ │ │
     │ │ │              │  │              │ │ │
     │ │ │ ┌──────────┐ │  │ ┌──────────┐ │ │ │
     │ │ │ │ PU P#0   │ │  │ │ PU P#2   │ │ │ │
     │ │ │ └──────────┘ │  │ └──────────┘ │ │ │
     │ │ │ ┌──────────┐ │  │ ┌──────────┐ │ │ │
     │ │ │ │ PU P#1   │ │  │ │ PU P#3   │ │ │ │
     │ │ │ └──────────┘ │  │ └──────────┘ │ │ │
     │ │ └──────────────┘  └──────────────┘ │ │
     │ └────────────────────────────────────┘ │
     └────────────────────────────────────────┘
  10. @Benchmark
      public void randomTraversal() {
          int steps = 64 * 1024 * 1024;
          int lengthMod = numbers.length - 1;
          long rnd = System.nanoTime(); // seed
          int next = 0;
          for (int i = 0; i < steps; i++) {
              numbers[next]++;
              // xorshift* random number generator
              rnd ^= (rnd << 21);
              rnd ^= (rnd >>> 35);
              rnd ^= (rnd << 4);
              next = (int) (rnd & lengthMod);
          }
      }
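
The index generation in the random traversal above can be isolated into a standalone sketch. The class name `XorShiftIndices`, the `visit` helper and the fixed seed are assumptions for illustration; the slide seeds from `System.nanoTime()` instead. The three shift steps are copied from the slide. One property worth knowing: a state of 0 stays 0 forever, so the seed must be non-zero.

```java
class XorShiftIndices {
    // One step of the 64-bit shift-and-xor sequence with the shifts from the slide.
    static long next(long rnd) {
        rnd ^= (rnd << 21);
        rnd ^= (rnd >>> 35);
        rnd ^= (rnd << 4);
        return rnd;
    }

    // Counts how often each slot is visited; length must be a power of 2.
    static int[] visit(int length, int steps, long seed) {
        int[] numbers = new int[length];
        int lengthMod = length - 1;
        long rnd = seed;
        int next = 0;
        for (int i = 0; i < steps; i++) {
            numbers[next]++;
            rnd = next(rnd);
            next = (int) (rnd & lengthMod);
        }
        return numbers;
    }

    public static void main(String[] args) {
        int[] counts = visit(1024, 1_000_000, 42L); // fixed seed for repeatable runs
        long total = 0;
        for (int count : counts) {
            total += count;
        }
        System.out.println(total); // every step lands on a valid index
    }
}
```

Because the mask keeps only the low bits of the state, every generated index is in range regardless of the sign of `rnd`.
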
  11. $ perf stat -d java -version
      java version "1.6.0_45"
      Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
      Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

       Performance counter stats for 'java -version':

            45,523871 task-clock (msec)         #    1,005 CPUs utilized
                   98 context-switches          #    0,002 M/sec
                   22 cpu-migrations            #    0,483 K/sec
                2 888 page-faults               #    0,063 M/sec
          120 537 883 cycles                    #    2,648 GHz                      [51,00%]
           70 889 708 stalled-cycles-frontend   #   58,81% frontend cycles idle     [56,13%]
           45 829 284 stalled-cycles-backend    #   38,02% backend cycles idle      [56,43%]
          106 611 221 instructions              #    0,88  insns per cycle
                                                #    0,66  stalled cycles per insn  [65,25%]
           16 168 899 branches                  #  355,174 M/sec                    [65,19%]
              391 063 branch-misses             #    2,42% of all branches          [55,16%]
           29 134 230 L1-dcache-loads           #  639,977 M/sec                    [44,98%]
              774 586 L1-dcache-load-misses     #    2,66% of all L1-dcache hits    [72,20%]
              421 404 LLC-loads                 #    9,257 M/sec                    [50,77%]
      <not supported> LLC-load-misses:HG

          0,045308292 seconds time elapsed
  12. cz.ladicek.valuetypes.complex.Complex object internals:
       OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
            0    12        (object header)                N/A
           12     4        (alignment/padding gap)        N/A
           16     8 double Complex.re                     N/A
           24     8 double Complex.im                     N/A
      Instance size: 32 bytes (estimated, the sample instance is not available)
      Space losses: 4 bytes internal + 0 bytes external = 4 bytes total

      Overhead of 16 bytes per Complex instance
  13. [D object internals:
       OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
            0    16        (object header)                N/A
           16     0 double [D.<elements>                  N/A
      Instance size: 16 bytes (estimated, the sample instance is not available)
      Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

      Overhead of 16 bytes for the entire array
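
The two object layouts above translate into a rough footprint estimate. The sketch below takes the 32-byte `Complex` instance size and the 16-byte array header from the listings and additionally assumes 4-byte compressed references and 8-byte object alignment, which the listings suggest but do not state. The class name `FootprintEstimate` and its methods are hypothetical; real numbers depend on the JVM and its flags.

```java
class FootprintEstimate {
    static final long ARRAY_HEADER = 16;     // from the [D listing
    static final long COMPLEX_INSTANCE = 32; // from the Complex listing
    static final long REFERENCE = 4;         // assumption: compressed references
    static final long DOUBLE = 8;

    // Rounds up to the next multiple of 8 (assumed object alignment).
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Complex[] with n elements: the reference array plus n separate objects.
    static long objectArrayBytes(long n) {
        return align8(ARRAY_HEADER + n * REFERENCE) + n * COMPLEX_INSTANCE;
    }

    // double[] holding n complex numbers as 2n interleaved doubles.
    static long flatArrayBytes(long n) {
        return ARRAY_HEADER + 2 * n * DOUBLE;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println(objectArrayBytes(n)); // prints 36000016
        System.out.println(flatArrayBytes(n));   // prints 16000016
    }
}
```

Under these assumptions the array of objects takes a bit more than twice the memory of the flat array, before even considering that the objects may be scattered across the heap.
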
  14. You have to ignore anything related to identity: reference equality, identity hash, synchronization, serialization, ...
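
In today's Java, ignoring identity is a discipline rather than a language feature. A minimal sketch of a class written in that style follows; the name `ComplexValue` and its `plus` method are assumptions for illustration and not code from the talk. The class is final and immutable, and equality and hash code are derived from state only, so callers never need `==`, `System.identityHashCode` or `synchronized` on its instances.

```java
final class ComplexValue {
    private final double re;
    private final double im;

    ComplexValue(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Operations return new instances instead of mutating this one.
    ComplexValue plus(ComplexValue other) {
        return new ComplexValue(re + other.re, im + other.im);
    }

    // Equality is defined by state, never by identity.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComplexValue)) {
            return false;
        }
        ComplexValue other = (ComplexValue) o;
        return Double.compare(re, other.re) == 0 && Double.compare(im, other.im) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    @Override
    public String toString() {
        return re + " + " + im + "i";
    }

    public static void main(String[] args) {
        ComplexValue a = new ComplexValue(1, 2);
        ComplexValue b = new ComplexValue(1, 2);
        System.out.println(a.equals(b)); // prints "true"
        System.out.println(a == b);      // prints "false": identity still exists and must be ignored
    }
}
```
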
  15. Provide JVM infrastructure for working with immutable and reference-free objects, in support of efficient by-value computation with non-primitive types.
      JEP 169: Value Objects, 2012
      http://openjdk.java.net/jeps/169
  16. Extend generic types to support the specialization of generic classes and interfaces over primitive types.
      JEP 218: Generics over Primitive Types, 2014
      http://openjdk.java.net/jeps/218
  17. hg clone http://hg.openjdk.java.net/jdk8u/jdk8u
      cd jdk8u
      chmod +x get_source.sh
      chmod +x configure
      ./get_source.sh
      ./configure --with-boot-jdk=/usr/lib/jvm/java-7-oracle --with-debug-level=slowdebug
      make clean images

      ./build/linux-x86_64-normal-server-release/images/j2sdk-image/
  18. hg clone http://hg.openjdk.java.net/valhalla/valhalla
      cd valhalla
      chmod +x get_source.sh
      chmod +x configure
      ./get_source.sh
      ./configure --with-boot-jdk=/usr/lib/jvm/java-8-oracle --with-debug-level=slowdebug
      make clean images

      ./build/linux-x86_64-normal-server-release/images/jdk/
  19. Value types
      "Codes like a class, works like an int"
      http://cr.openjdk.java.net/~jrose/values/values.html
  20. public final __ByValue class Complex {
          private final double re, im;

          public Complex(double re, double im) {
              this.re = re;
              this.im = im;
          }

          public String toString() {
              return re + " + " + im + "i";
          }

          public static void main(String[] args) {
              Complex c = __Make Complex(0, 0);
              System.out.println(c);
          }
      }
  21. public final class Box<any T> {
          private final T value;

          public Box(T value) {
              this.value = value;
          }

          public String toString() {
              __WhereVal(T) {
                  return "val " + value.toString();
              }
              __WhereRef(T) {
                  return "ref " + value;
              }
          }

          public static void main(String[] args) {
              System.out.println(new Box<int>(1));
              System.out.println(new Box<Integer>(1));
          }
      }
  22. import java.anyutil.List;
      import java.anyutil.ArrayList;

      public class Lists {
          public static void main(String[] args) {
              List<int> l1 = new ArrayList<int>();
              List<Integer> l2 = new ArrayList<Integer>();

              l1.add(1); l2.add(1);
              l1.add(2); l2.add(2);
              l1.add(3); l2.add(3);

              System.out.println(l1);
              System.out.println(l2);
          }
      }
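
Without generic specialization, a `List<Integer>` in standard Java stores references to boxed objects, and the only way to get contiguous `int` storage is to write the specialized class by hand. The sketch below shows roughly what such a hand-written specialization looks like; `IntArrayList` is a hypothetical name and is not part of the JDK or of the Valhalla prototype, which generates specializations itself.

```java
import java.util.Arrays;

class IntArrayList {
    private int[] data = new int[4]; // values stored inline, no boxing
    private int size;

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2); // grow by doubling
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    @Override
    public String toString() {
        return Arrays.toString(Arrays.copyOf(data, size));
    }

    public static void main(String[] args) {
        IntArrayList list = new IntArrayList();
        list.add(1);
        list.add(2);
        list.add(3);
        System.out.println(list); // prints "[1, 2, 3]"
    }
}
```

Writing one such class per primitive type and per collection is exactly the duplication that specialization over primitives aims to remove.
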
  23. Just a periodic reminder that performance of all the prototypes is going to be pathologically awful for quite a while; attempting to measure performance at this early stage [is] counterproductive.
      Brian Goetz, 2016-01-29
      http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-January/001809.html
  24. The org.ObjectLayout package provides a set of data structure classes designed with optimised memory layout in mind. These classes are aimed at matching the natural speed benefits similar constructs enable in most C-style languages [...]
      http://objectlayout.org/
  25. Random random = new Random();
      CtorAndArgsProvider<Complex> p = context ->
          new CtorAndArgs<>(
              Complex.class,
              new Class[] {double.class, double.class},
              random.nextInt(1000), random.nextInt(1000)
          );
      StructuredArray<Complex> complexNumbers =
          StructuredArray.newInstance(Complex.class, p, size);