
On Value Types or Why Reference Locality Matters

Ladislav Thon
February 06, 2016


The Valhalla project in OpenJDK has been exploring adding value types to Java and the JVM. This will hopefully come to fruition in one of the upcoming versions of the Java platform (though definitely not 9). In this talk, I will describe what value types and generic specialization are and show them live using the Valhalla prototype. I will also briefly touch on another related topic, ObjectLayout.org. During these explanations, I will repeatedly stress the importance of reference locality to application performance and illustrate the difference using a couple of small JMH benchmarks.


Transcript

  1. On Value Types or Why Reference Locality Matters ladicek@gmail.com

  2. "One of the barriers to doing the natural implementation of (for example) complex numbers as classes is that class objects are less efficient than primitive types like double and int." James Gosling: The Evolution of Numerical Computing in Java, 1997, https://web.archive.org/web/20031203204137/http://java.sun.com/people/jag/FP.html
  3. WHY?

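  4.

```java
public final class Complex {
    public final double re, im;

    // constructor added; the slide shows only the fields
    public Complex(double re, double im) { this.re = re; this.im = im; }
}
```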
  5.

```java
ArrayList<Complex> complexNumbers;

@Benchmark
public double sum() {
    double re = 0;
    double im = 0;
    for (int i = 0; i < size; i++) {
        Complex complexNumber = complexNumbers.get(i);
        re += complexNumber.re;
        im += complexNumber.im;
    }
    return re + im;
}
```
  6.

```java
Complex[] complexNumbers;

@Benchmark
public double sum() {
    double re = 0;
    double im = 0;
    for (int i = 0; i < size; i++) {
        Complex complexNumber = complexNumbers[i];
        re += complexNumber.re;
        im += complexNumber.im;
    }
    return re + im;
}
```
  7.

```java
// even index: real part, odd index: imaginary part
double[] complexNumbers; // holds 2 * size doubles

@Benchmark
public double sum() {
    double re = 0;
    double im = 0;
    for (int i = 0; i < 2 * size; i += 2) {
        re += complexNumbers[i];     // plain doubles, no .re/.im fields
        im += complexNumbers[i + 1];
    }
    return re + im;
}
```
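To compare the three variants above outside JMH, here is a minimal plain-Java sketch, assuming a `Complex` with two `final double` fields as on slide 4 (nested here so the file is self-contained). The `Layouts` class and its method names are hypothetical. It only checks that the three layouts compute the same sum; it says nothing about speed, which is what the JMH benchmarks measure.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no JMH) of the three layouts from the benchmarks above.
// Layouts and its method names are hypothetical; only the loops mirror the slides.
final class Layouts {

    // Same shape as the slide's Complex: two final double fields.
    static final class Complex {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
    }

    // ArrayList<Complex>: list -> backing array -> one object per element.
    static double sumList(List<Complex> xs) {
        double re = 0, im = 0;
        for (int i = 0; i < xs.size(); i++) {
            Complex c = xs.get(i);
            re += c.re;
            im += c.im;
        }
        return re + im;
    }

    // Complex[]: array of references, still one separate object per element.
    static double sumArray(Complex[] xs) {
        double re = 0, im = 0;
        for (Complex c : xs) {
            re += c.re;
            im += c.im;
        }
        return re + im;
    }

    // double[]: interleaved values, even index = re, odd index = im.
    static double sumFlat(double[] xs) {
        double re = 0, im = 0;
        for (int i = 0; i < xs.length; i += 2) {
            re += xs[i];
            im += xs[i + 1];
        }
        return re + im;
    }

    public static void main(String[] args) {
        int n = 1_000;
        List<Complex> list = new ArrayList<>(n);
        Complex[] array = new Complex[n];
        double[] flat = new double[2 * n];
        for (int i = 0; i < n; i++) {
            Complex c = new Complex(i, -0.5 * i);
            list.add(c);
            array[i] = c;
            flat[2 * i] = c.re;
            flat[2 * i + 1] = c.im;
        }
        System.out.println(sumList(list) + " " + sumArray(array) + " " + sumFlat(flat));
    }
}
```

All three loops do the same arithmetic; what differs is how many pointer dereferences, and how many distinct memory locations, each element costs.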
  10.

```
.----------------.
| complexNumbers |  ArrayList<Complex>
'----------------'
  |
  |   .------.    .----.----.----.----.----.-----.
  '-->| List |--->| A  | r  | r  | a  | y  | ... |
      '------'    '----'----'----'----'----'-----'
                    |    |    |    |    |
                    |    v    |    v    |
                    |  .----. |  .----. |
                    |  | C2 | |  | C4 | |
                    v  '----' v  '----' v
                  .----.    .----.    .----.
                  | C1 |    | C3 |    | C5 |
                  '----'    '----'    '----'
```
  11.

```
.----------------.
| complexNumbers |  Complex[]
'----------------'
  |
  |   .----.----.----.----.----.-----.
  '-->| A  | r  | r  | a  | y  | ... |
      '----'----'----'----'----'-----'
        |    |    |    |    |
        |    v    |    v    |
        |  .----. |  .----. |
        |  | C2 | |  | C4 | |
        v  '----' v  '----' v
      .----.    .----.    .----.
      | C1 |    | C3 |    | C5 |
      '----'    '----'    '----'
```
  12.

```
.----------------.
| complexNumbers |  double[]
'----------------'
  |
  |   .-------.-------.-------.-------.-------.-----.
  '-->| C1.re | C1.im | C2.re | C2.im | C3.re | ... |
      '-------'-------'-------'-------'-------'-----'
```

Like an array of structs in C.
  13. Among other things... reference locality! https://en.wikipedia.org/wiki/Locality_of_reference

  14. That is, mostly caching and prefetching: http://igoro.com/archive/gallery-of-processor-cache-effects/

  15.

```java
int[] numbers; // varying sizes

@Benchmark
public void predictableTraversal() {
    int steps = 64 * 1024 * 1024;
    int lengthMod = numbers.length - 1;
    for (int i = 0; i < steps; i++) {
        numbers[(i * 16) & lengthMod]++;
    }
}
// if numbers.length is a power of 2 and x >= 0,
// x & lengthMod == x % numbers.length
```
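The traversal above rests on two small facts, sketched here under stated assumptions. First, masking with `length - 1` equals `%` only when the length is a power of two and the index is non-negative; here `i * 16` stays below 2^30, so it never overflows into negative values. Second, a stride of 16 ints is 64 bytes, which is one cache line assuming the common 64-byte line size on x86 (Java itself does not expose the line size). The `StrideMath` helper is hypothetical.

```java
// Sketch checking the two assumptions behind predictableTraversal().
// StrideMath is a hypothetical helper, not part of the benchmark code.
final class StrideMath {

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Wraps a non-negative index into [0, n) without a division.
    static int wrap(int x, int n) {
        if (!isPowerOfTwo(n)) throw new IllegalArgumentException("n must be a power of two");
        if (x < 0) throw new IllegalArgumentException("x must be non-negative");
        return x & (n - 1);
    }

    // Bytes between consecutive accesses for a stride measured in ints.
    static int strideBytes(int strideInInts) {
        return strideInInts * Integer.BYTES;
    }

    public static void main(String[] args) {
        int n = 1024;
        for (int x = 0; x < 10 * n; x++) {
            if (wrap(x, n) != x % n) throw new AssertionError("mismatch at " + x);
        }
        System.out.println("stride of 16 ints = " + strideBytes(16) + " bytes");
    }
}
```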
  22.

```
$ lstopo --output-format txt --no-io --no-legend
┌────────────────────────────────────────┐
│ Machine (7868MB)                       │
│                                        │
│ ┌────────────────────────────────────┐ │
│ │ Socket P#0                         │ │
│ │                                    │ │
│ │ ┌────────────────────────────────┐ │ │
│ │ │ L3 (4096KB)                    │ │ │
│ │ └────────────────────────────────┘ │ │
│ │                                    │ │
│ │ ┌──────────────┐ ┌──────────────┐  │ │
│ │ │ L2 (256KB)   │ │ L2 (256KB)   │  │ │
│ │ └──────────────┘ └──────────────┘  │ │
│ │                                    │ │
│ │ ┌──────────────┐ ┌──────────────┐  │ │
│ │ │ L1 (32KB)    │ │ L1 (32KB)    │  │ │
│ │ └──────────────┘ └──────────────┘  │ │
│ │                                    │ │
│ │ ┌──────────────┐ ┌──────────────┐  │ │
│ │ │ Core P#0     │ │ Core P#1     │  │ │
│ │ │              │ │              │  │ │
│ │ │ ┌──────────┐ │ │ ┌──────────┐ │  │ │
│ │ │ │ PU P#0   │ │ │ │ PU P#2   │ │  │ │
│ │ │ └──────────┘ │ │ └──────────┘ │  │ │
│ │ │ ┌──────────┐ │ │ ┌──────────┐ │  │ │
│ │ │ │ PU P#1   │ │ │ │ PU P#3   │ │  │ │
│ │ │ └──────────┘ │ │ └──────────┘ │  │ │
│ │ └──────────────┘ └──────────────┘  │ │
│ └────────────────────────────────────┘ │
└────────────────────────────────────────┘
```
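Reading the topology above as numbers: a rough sketch of how many 4-byte ints fit in each cache level, taking the lstopo sizes at face value (KB here meaning 1024 bytes). The `CacheCapacity` class is hypothetical, and the usable capacity is lower in practice because the caches also hold other data and code.

```java
// Back-of-the-envelope sketch: how many ints fit into each cache level
// of the machine shown by lstopo (L1 32 KB, L2 256 KB per core, L3 4096 KB shared).
final class CacheCapacity {

    // Number of int elements that fit into a cache of the given size in KiB.
    static int intsThatFit(int cacheKiB) {
        return cacheKiB * 1024 / Integer.BYTES;
    }

    public static void main(String[] args) {
        int[] sizesKiB = {32, 256, 4096};
        String[] names = {"L1", "L2", "L3"};
        for (int i = 0; i < sizesKiB.length; i++) {
            System.out.println(names[i] + ": " + intsThatFit(sizesKiB[i]) + " ints");
        }
    }
}
```

So on this machine an `int[]` of 8,192 elements still fits in L1, 65,536 in L2 and 1,048,576 in L3; arrays beyond those sizes are where the traversal benchmarks would be expected to slow down.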
  23.

```java
@Benchmark
public void randomTraversal() {
    int steps = 64 * 1024 * 1024;
    int lengthMod = numbers.length - 1;
    long rnd = System.nanoTime(); // seed
    int next = 0;
    for (int i = 0; i < steps; i++) {
        numbers[next]++;
        // xorshift random number generator
        rnd ^= (rnd << 21);
        rnd ^= (rnd >>> 35);
        rnd ^= (rnd << 4);
        next = (int) (rnd & lengthMod);
    }
}
```
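The random-number step can be pulled out and inspected on its own. The sketch below wraps the same three shifts from the slide in a hypothetical `XorShiftIndices` class; with no final multiplication step, it is the plain xorshift variant. Because a zero state would stay zero forever, the sketch rejects a zero seed.

```java
// Sketch of the index generator used in randomTraversal(), as a standalone class.
// XorShiftIndices is hypothetical; the shift constants 21, 35, 4 are the slide's.
final class XorShiftIndices {
    private long state;
    private final int lengthMod;

    XorShiftIndices(long seed, int length) {
        if (seed == 0) throw new IllegalArgumentException("xorshift state must be non-zero");
        if (length <= 0 || (length & (length - 1)) != 0) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        this.state = seed;
        this.lengthMod = length - 1;
    }

    // Advances the generator and maps the new state to an index in [0, length).
    int nextIndex() {
        state ^= (state << 21);
        state ^= (state >>> 35);
        state ^= (state << 4);
        return (int) (state & lengthMod); // lengthMod >= 0, so the result is non-negative
    }

    public static void main(String[] args) {
        // "| 1" guarantees a non-zero seed
        XorShiftIndices gen = new XorShiftIndices(System.nanoTime() | 1, 1 << 20);
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.nextIndex());
        }
    }
}
```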
  26.

```
$ perf stat -d java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

 Performance counter stats for 'java -version':

      45,523871 task-clock (msec)         #    1,005 CPUs utilized
             98 context-switches          #    0,002 M/sec
             22 cpu-migrations            #    0,483 K/sec
          2 888 page-faults               #    0,063 M/sec
    120 537 883 cycles                    #    2,648 GHz                     [51,00%]
     70 889 708 stalled-cycles-frontend   #   58,81% frontend cycles idle    [56,13%]
     45 829 284 stalled-cycles-backend    #   38,02% backend cycles idle     [56,43%]
    106 611 221 instructions              #    0,88  insns per cycle
                                          #    0,66  stalled cycles per insn [65,25%]
     16 168 899 branches                  #  355,174 M/sec                   [65,19%]
        391 063 branch-misses             #    2,42% of all branches         [55,16%]
     29 134 230 L1-dcache-loads           #  639,977 M/sec                   [44,98%]
        774 586 L1-dcache-load-misses     #    2,66% of all L1-dcache hits   [72,20%]
        421 404 LLC-loads                 #    9,257 M/sec                   [50,77%]
<not supported> LLC-load-misses:HG

    0,045308292 seconds time elapsed
```
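Several of the figures after `#` in the perf output are simple ratios of the raw counters. A small sketch recomputing three of them from the numbers on the slide; the `PerfRatios` class is hypothetical. The bracketed percentages indicate that perf multiplexed the counters, so the raw counts are themselves scaled estimates.

```java
import java.util.Locale;

// Sketch recomputing three of perf's derived ratios from raw counter values
// copied from the output above. PerfRatios is a hypothetical helper.
final class PerfRatios {

    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        long cycles = 120_537_883L;
        long instructions = 106_611_221L;
        long branches = 16_168_899L;
        long branchMisses = 391_063L;
        long l1Loads = 29_134_230L;
        long l1Misses = 774_586L;

        // Locale.ROOT prints '.' as the decimal separator (the slide uses a comma locale).
        System.out.printf(Locale.ROOT, "insns per cycle:     %.2f%n", ratio(instructions, cycles));
        System.out.printf(Locale.ROOT, "branch-miss rate:    %.2f%%%n", 100 * ratio(branchMisses, branches));
        System.out.printf(Locale.ROOT, "L1-dcache miss rate: %.2f%%%n", 100 * ratio(l1Misses, l1Loads));
    }
}
```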
  29.

```
cz.ladicek.valuetypes.complex.Complex object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
      0    12        (object header)                N/A
     12     4        (alignment/padding gap)        N/A
     16     8 double Complex.re                     N/A
     24     8 double Complex.im                     N/A
Instance size: 32 bytes (estimated, the sample instance is not available)
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total
```

Overhead of 16 bytes per Complex instance
  30.

```
[D object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
      0    16        (object header)                N/A
     16     0 double [D.<elements>                  N/A
Instance size: 16 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
```

Overhead of 16 bytes for the entire array
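Putting the two JOL layouts together gives a back-of-the-envelope footprint for `n` complex numbers. The sketch assumes what the output above shows for a 64-bit HotSpot JVM with compressed references: a 12-byte object header, a 16-byte array header, 4-byte references and 8-byte alignment. Other JVM configurations will give different numbers, and `Footprint` is a hypothetical helper.

```java
// Rough footprint sketch for n complex numbers, assuming 64-bit HotSpot with
// compressed references, as in the JOL output above.
final class Footprint {
    static final int OBJECT_HEADER = 12; // bytes, per JOL output for Complex
    static final int ARRAY_HEADER = 16;  // bytes, per JOL output for double[]
    static final int REFERENCE = 4;      // compressed oop
    static final int ALIGNMENT = 8;      // object alignment

    // Rounds a size up to the next multiple of ALIGNMENT.
    static long align(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Complex[]: one array of references plus one Complex object per element.
    static long complexArrayBytes(int n) {
        long complexObject = align(OBJECT_HEADER + 2L * Double.BYTES); // 28 -> 32
        long referenceArray = align(ARRAY_HEADER + (long) n * REFERENCE);
        return referenceArray + n * complexObject;
    }

    // double[]: re and im stored inline, two doubles per complex number.
    static long flatArrayBytes(int n) {
        return align(ARRAY_HEADER + 2L * n * Double.BYTES);
    }

    public static void main(String[] args) {
        int n = 1_000;
        System.out.println("Complex[]: " + complexArrayBytes(n) + " bytes");
        System.out.println("double[]:  " + flatArrayBytes(n) + " bytes");
    }
}
```

Under these assumptions, 1,000 complex numbers take 36,016 bytes as `Complex[]` versus 16,016 bytes as `double[]`, and the `Complex[]` bytes are spread over 1,001 separate objects.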
  31. Proposed in 1997:

```
immutable class Complex { ... }
```

  32. Maybe in Java 10/11/? http://openjdk.java.net/projects/valhalla/

  33. But first: Java 8 introduced value-based classes (java.util.Optional, java.time.LocalDateTime, ...). https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
  34. They are final, immutable, ... and generally value-like

  35. You have to ignore anything related to identity: reference equality, identity hash, synchronization, serialization, ...
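A minimal sketch of what "value-based" means in practice, assuming nothing beyond Java 8: a final, immutable class whose `equals` and `hashCode` depend only on state and whose instances come from a factory, so callers have no reason to rely on identity. `ComplexValue` is a hypothetical name; `LocalDate` and `Optional` are real JDK value-based classes.

```java
import java.time.LocalDate;
import java.util.Objects;
import java.util.Optional;

// Sketch of a value-based class in Java 8 terms: final, immutable,
// state-based equality, instances obtained through a factory method.
final class ComplexValue {
    private final double re, im;

    private ComplexValue(double re, double im) { this.re = re; this.im = im; }

    // Factory instead of a public constructor: callers must not assume fresh identities.
    static ComplexValue of(double re, double im) { return new ComplexValue(re, im); }

    double re() { return re; }
    double im() { return im; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ComplexValue)) return false;
        ComplexValue other = (ComplexValue) o;
        return Double.compare(re, other.re) == 0 && Double.compare(im, other.im) == 0;
    }

    @Override
    public int hashCode() { return Objects.hash(re, im); }

    @Override
    public String toString() { return re + " + " + im + "i"; }

    public static void main(String[] args) {
        // Compare value-based instances with equals(), never with ==.
        System.out.println(ComplexValue.of(1, 2).equals(ComplexValue.of(1, 2)));
        System.out.println(LocalDate.of(2016, 2, 6).equals(LocalDate.of(2016, 2, 6)));
        System.out.println(Optional.of("x").equals(Optional.of("x")));
    }
}
```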
  36. "Provide JVM infrastructure for working with immutable and reference-free objects, in support of efficient by-value computation with non-primitive types." JEP 169: Value Objects, 2012, http://openjdk.java.net/jeps/169
  37. "Extend generic types to support the specialization of generic classes and interfaces over primitive types." JEP 218: Generics over Primitive Types, 2014, http://openjdk.java.net/jeps/218
  38.

```shell
hg clone http://hg.openjdk.java.net/jdk8u/jdk8u
cd jdk8u
chmod +x get_source.sh
chmod +x configure
./get_source.sh
./configure --with-boot-jdk=/usr/lib/jvm/java-7-oracle --with-debug-level=slowdebug
make clean images
# resulting JDK image (the directory name includes the debug level):
./build/linux-x86_64-normal-server-slowdebug/images/j2sdk-image/
```
  39.

```shell
hg clone http://hg.openjdk.java.net/valhalla/valhalla
cd valhalla
chmod +x get_source.sh
chmod +x configure
./get_source.sh
./configure --with-boot-jdk=/usr/lib/jvm/java-8-oracle --with-debug-level=slowdebug
make clean images
# resulting JDK image (the directory name includes the debug level):
./build/linux-x86_64-normal-server-slowdebug/images/jdk/
```
  40. Please don’t comment on the syntax. By no means final!

  41. Value types: "Codes like a class, works like an int" http://cr.openjdk.java.net/~jrose/values/values.html
  42.

```java
public final __ByValue class Complex {
    private final double re, im;

    public Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    public String toString() {
        return re + " + " + im + "i";
    }

    public static void main(String[] args) {
        Complex c = __Make Complex(0, 0);
        System.out.println(c);
    }
}
```
  43. Also... Generic specialization http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html

  44.

```java
public final class Box<any T> {
    private final T value;

    public Box(T value) {
        this.value = value;
    }

    public String toString() {
        __WhereVal(T) {
            return "val " + value.toString();
        }
        __WhereRef(T) {
            return "ref " + value;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Box<int>(1));
        System.out.println(new Box<Integer>(1));
    }
}
```
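Valhalla syntax aside, the cost that specialization targets is already visible with today's erased generics. A sketch in current Java: `ErasedBox<Integer>` holds a reference to a separate `Integer` object, while the hand-written `IntBox` is roughly what a specialized `Box<int>` would amount to. Both class names are hypothetical and not part of the prototype.

```java
// ErasedBox and IntBox are hypothetical names used only for this sketch.
final class ErasedBox<T> {
    private final T value; // always a reference; for T = Integer, a separate boxed object
    ErasedBox(T value) { this.value = value; }
    T get() { return value; }
}

final class IntBox {
    private final int value; // the int is stored directly in the object
    IntBox(int value) { this.value = value; }
    int get() { return value; }
}

final class SpecializationDemo {
    public static void main(String[] args) {
        ErasedBox<Integer> boxed = new ErasedBox<>(1000); // 1000 is autoboxed to an Integer
        IntBox specialized = new IntBox(1000);            // no boxing involved
        System.out.println(boxed.get() + " " + specialized.get());

        // Erasure: every parameterization shares a single runtime class.
        System.out.println(new ErasedBox<Integer>(1).getClass() == new ErasedBox<String>("a").getClass());
    }
}
```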
  45.

```java
import java.anyutil.List;
import java.anyutil.ArrayList;

public class Lists {
    public static void main(String[] args) {
        List<int> l1 = new ArrayList<int>();
        List<Integer> l2 = new ArrayList<Integer>();
        l1.add(1); l2.add(1);
        l1.add(2); l2.add(2);
        l1.add(3); l2.add(3);
        System.out.println(l1);
        System.out.println(l2);
    }
}
```
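Along the same lines, a sketch of what `List<int>` buys over `List<Integer>`: a hypothetical `IntList` keeps its elements inline in an `int[]` and doubles the array when it is full, so a scan over it touches contiguous memory instead of chasing references to `Integer` objects. This is a hand-written stand-in, not the prototype's `java.anyutil.ArrayList`.

```java
import java.util.Arrays;

// Sketch of a growable list of primitive ints. IntList is hypothetical.
final class IntList {
    private int[] elements = new int[8];
    private int size;

    void add(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // grow by doubling
        }
        elements[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return elements[index];
    }

    int size() { return size; }

    @Override
    public String toString() {
        return Arrays.toString(Arrays.copyOf(elements, size));
    }

    public static void main(String[] args) {
        IntList l = new IntList();
        l.add(1);
        l.add(2);
        l.add(3);
        System.out.println(l); // prints [1, 2, 3]
    }
}
```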
  46. "Just a periodic reminder that performance of all the prototypes is going to be pathologically awful for quite a while; attempting to measure performance at this early stage [is] counterproductive." Brian Goetz, 2016-01-29, http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-January/001809.html
  47. Slightly related: ObjectLayout.org

  48. "The org.ObjectLayout package provides a set of data structure classes designed with optimised memory layout in mind. These classes are aimed at matching the natural speed benefits similar constructs enable in most C-style languages [...]" http://objectlayout.org/
  49.

```java
Random random = new Random();
CtorAndArgsProvider<Complex> p = context ->
    new CtorAndArgs<>(
        Complex.class,
        new Class[] {double.class, double.class},
        random.nextInt(1000), random.nextInt(1000)
    );
StructuredArray<Complex> complexNumbers =
    StructuredArray.newInstance(Complex.class, p, size);
```
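Without ObjectLayout, a similar locality can be approximated by hand. A sketch that stores `n` complex numbers in one `double[]` (re at `2 * i`, im at `2 * i + 1`) behind small accessors; `FlatComplexArray` is hypothetical and is not the ObjectLayout API.

```java
import java.util.Random;

// Hand-flattened array of complex numbers: one double[] instead of n objects.
// FlatComplexArray is a hypothetical class, not part of org.ObjectLayout.
final class FlatComplexArray {
    private final double[] data;

    FlatComplexArray(int size) { data = new double[2 * size]; }

    int size() { return data.length / 2; }

    double re(int i) { return data[2 * i]; }
    double im(int i) { return data[2 * i + 1]; }

    void set(int i, double re, double im) {
        data[2 * i] = re;
        data[2 * i + 1] = im;
    }

    // Same computation as the sum() benchmarks: a sequential scan over one array.
    double sum() {
        double sumRe = 0, sumIm = 0;
        for (int i = 0; i < size(); i++) {
            sumRe += re(i);
            sumIm += im(i);
        }
        return sumRe + sumIm;
    }

    public static void main(String[] args) {
        Random random = new Random();
        FlatComplexArray xs = new FlatComplexArray(1_000);
        for (int i = 0; i < xs.size(); i++) {
            xs.set(i, random.nextInt(1000), random.nextInt(1000));
        }
        System.out.println(xs.sum());
    }
}
```

The trade-off is that there are no `Complex` objects anymore, only index arithmetic, whereas StructuredArray keeps working with real `Complex` instances.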
  50. Other resources:
      - https://www.youtube.com/watch?v=uNgAFSUXuwc
      - http://blog.codefx.org/java/value-based-classes/
      - http://blog.codefx.org/java/dev/the-road-to-valhalla/
      - https://channel9.msdn.com/Events/Build/2014/2-661

  51. Q’s? A’s http://www.devconf.cz/f/326 ladicek@gmail.com