Slide 1

Slide 1 text

On Value Types or Why Reference Locality Matters [email protected]

Slide 2

Slide 2 text

One of the barriers to doing the natural implementation of (for example) complex numbers as classes is that class objects are less efficient than primitive types like double and int. James Gosling: The Evolution of Numerical Computing in Java, 1997 https://web.archive.org/web/20031203204137/http://java.sun.com/people/jag/FP.html

Slide 3

Slide 3 text

WHY?

Slide 4

Slide 4 text

public final class Complex { public final double re, im; }

Slide 5

Slide 5 text

ArrayList<Complex> complexNumbers;

@Benchmark
public double sum() {
    double re = 0;
    double im = 0;
    for (int i = 0; i < size; i++) {
        Complex complexNumber = complexNumbers.get(i);
        re += complexNumber.re;
        im += complexNumber.im;
    }
    return re + im;
}

Slide 6

Slide 6 text

Complex[] complexNumbers;

@Benchmark
public double sum() {
    double re = 0;
    double im = 0;
    for (int i = 0; i < size; i++) {
        Complex complexNumber = complexNumbers[i];
        re += complexNumber.re;
        im += complexNumber.im;
    }
    return re + im;
}

Slide 7

Slide 7 text

// even: real part, odd: imaginary part
double[] complexNumbers;

@Benchmark
public double sum() {
    double re = 0;
    double im = 0;
    for (int i = 0; i < size; i += 2) {
        re += complexNumbers[i];
        im += complexNumbers[i + 1];
    }
    return re + im;
}
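The three benchmark variants on the preceding slides differ only in where the doubles live. As a rough sketch (plain Java without the JMH harness; the class name `LayoutSumSketch`, the method names, and the added constructor are illustrative assumptions, not part of the talk), the same sum can be written against all three layouts:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; the class and method names are hypothetical.
class LayoutSumSketch {

    // Same shape as the Complex class on slide 4, with a constructor added.
    static final class Complex {
        final double re, im;

        Complex(double re, double im) {
            this.re = re;
            this.im = im;
        }
    }

    // List object, then its backing array, then each separately allocated Complex.
    static double sumList(List<Complex> numbers) {
        double re = 0, im = 0;
        for (int i = 0; i < numbers.size(); i++) {
            Complex c = numbers.get(i);
            re += c.re;
            im += c.im;
        }
        return re + im;
    }

    // Array of references, then each separately allocated Complex.
    static double sumArray(Complex[] numbers) {
        double re = 0, im = 0;
        for (Complex c : numbers) {
            re += c.re;
            im += c.im;
        }
        return re + im;
    }

    // Flat array: even index holds the real part, odd index the imaginary part.
    static double sumFlat(double[] numbers) {
        double re = 0, im = 0;
        for (int i = 0; i + 1 < numbers.length; i += 2) {
            re += numbers[i];
            im += numbers[i + 1];
        }
        return re + im;
    }

    public static void main(String[] args) {
        int size = 1_000;
        List<Complex> list = new ArrayList<>(size);
        Complex[] array = new Complex[size];
        double[] flat = new double[2 * size];
        for (int i = 0; i < size; i++) {
            Complex c = new Complex(i, 2.0 * i);
            list.add(c);
            array[i] = c;
            flat[2 * i] = c.re;
            flat[2 * i + 1] = c.im;
        }
        // All three layouts hold the same values, so the sums agree.
        System.out.println(sumList(list) + " " + sumArray(array) + " " + sumFlat(flat));
    }
}
```

Only the flat variant reads its data from one contiguous block of memory; the other two follow a reference for every element.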

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

.----------------.
| complexNumbers |  ArrayList
'----------------'
     |   .------.    .----.----.----.----.----.-----.
     '-->| List |--->| A  | r  | r  | a  | y  | ... |
         '------'    '----'----'----'----'----'-----'
                       |    |    |    |    |
                       |    v    |    v    |
                       |  .----. |  .----. |
                       |  | C2 | |  | C4 | |
                       v  '----' v  '----' v
                     .----.    .----.    .----.
                     | C1 |    | C3 |    | C5 |
                     '----'    '----'    '----'

Slide 11

Slide 11 text

.----------------.
| complexNumbers |  Complex[]
'----------------'
     |   .----.----.----.----.----.-----.
     '-->| A  | r  | r  | a  | y  | ... |
         '----'----'----'----'----'-----'
           |    |    |    |    |
           |    v    |    v    |
           |  .----. |  .----. |
           |  | C2 | |  | C4 | |
           v  '----' v  '----' v
         .----.    .----.    .----.
         | C1 |    | C3 |    | C5 |
         '----'    '----'    '----'

Slide 12

Slide 12 text

.----------------.
| complexNumbers |  double[]
'----------------'
     |   .-------.-------.-------.-------.-------.-----.
     '-->| C1.re | C1.im | C2.re | C2.im | C3.re | ... |
         '-------'-------'-------'-------'-------'-----'

Array of structs in C

Slide 13

Slide 13 text

Among other things... Reference locality!
https://en.wikipedia.org/wiki/Locality_of_reference

Slide 14

Slide 14 text

That is, mostly caching and prefetching
http://igoro.com/archive/gallery-of-processor-cache-effects/

Slide 15

Slide 15 text

int[] numbers; // varying sizes

@Benchmark
public void predictableTraversal() {
    int steps = 64 * 1024 * 1024;
    int lengthMod = numbers.length - 1;
    for (int i = 0; i < steps; i++) {
        numbers[(i * 16) & lengthMod]++;
    }
}

// if numbers.length is a power of 2,
// x & lengthMod == x % numbers.length
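The masking trick in the closing comment holds only for power-of-two lengths and non-negative `x`. A small sketch that checks this (the class and method names are hypothetical; the stride of 16 `int`s comes from the slide and equals 64 bytes, a common cache-line size on x86):

```java
// Illustrative sketch; class and method names are hypothetical.
class PowerOfTwoMaskSketch {

    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Index touched at step i of the strided loop: stride of 16 ints, wrapped by masking.
    static int stridedIndex(int i, int length) {
        if (!isPowerOfTwo(length)) {
            throw new IllegalArgumentException("length must be a power of two: " + length);
        }
        return (i * 16) & (length - 1);
    }

    // True when masking and the remainder operator agree for every x in [0, limit).
    static boolean maskMatchesModulo(int length, int limit) {
        for (int x = 0; x < limit; x++) {
            if ((x & (length - 1)) != x % length) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("length 1024: " + maskMatchesModulo(1024, 100_000));
        System.out.println("length 1000: " + maskMatchesModulo(1000, 100_000));
        System.out.println("step 65 in 1024 ints hits index " + stridedIndex(65, 1024));
    }
}
```

Masking avoids a division in the hot loop, which keeps the benchmark focused on memory access rather than arithmetic.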

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

$ lstopo --output-format txt --no-io --no-legend
┌────────────────────────────────────────┐
│ Machine (7868MB)                       │
│                                        │
│ ┌────────────────────────────────────┐ │
│ │ Socket P#0                         │ │
│ │                                    │ │
│ │ ┌────────────────────────────────┐ │ │
│ │ │ L3 (4096KB)                    │ │ │
│ │ └────────────────────────────────┘ │ │
│ │                                    │ │
│ │ ┌──────────────┐  ┌──────────────┐ │ │
│ │ │ L2 (256KB)   │  │ L2 (256KB)   │ │ │
│ │ └──────────────┘  └──────────────┘ │ │
│ │                                    │ │
│ │ ┌──────────────┐  ┌──────────────┐ │ │
│ │ │ L1 (32KB)    │  │ L1 (32KB)    │ │ │
│ │ └──────────────┘  └──────────────┘ │ │
│ │                                    │ │
│ │ ┌──────────────┐  ┌──────────────┐ │ │
│ │ │ Core P#0     │  │ Core P#1     │ │ │
│ │ │              │  │              │ │ │
│ │ │ ┌──────────┐ │  │ ┌──────────┐ │ │ │
│ │ │ │ PU P#0   │ │  │ │ PU P#2   │ │ │ │
│ │ │ └──────────┘ │  │ └──────────┘ │ │ │
│ │ │ ┌──────────┐ │  │ ┌──────────┐ │ │ │
│ │ │ │ PU P#1   │ │  │ │ PU P#3   │ │ │ │
│ │ │ └──────────┘ │  │ └──────────┘ │ │ │
│ │ └──────────────┘  └──────────────┘ │ │
│ └────────────────────────────────────┘ │
└────────────────────────────────────────┘

Slide 23

Slide 23 text

@Benchmark
public void randomTraversal() {
    int steps = 64 * 1024 * 1024;
    int lengthMod = numbers.length - 1;
    long rnd = System.nanoTime(); // seed
    int next = 0;
    for (int i = 0; i < steps; i++) {
        numbers[next]++;
        // xorshift* random number generator
        rnd ^= (rnd << 21);
        rnd ^= (rnd >>> 35);
        rnd ^= (rnd << 4);
        next = (int) (rnd & lengthMod);
    }
}
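To see what the generator inside the loop does on its own, it can be pulled out into a helper. This is a sketch only: the names `XorshiftIndexSketch`, `nextState`, and `toIndex` are hypothetical, and forcing the seed to be odd is an added assumption. A zero state would stay zero forever, because every step is a shift followed by an xor.

```java
// Illustrative sketch; class and method names are hypothetical.
class XorshiftIndexSketch {

    // One step of the generator from the slide: three shift-and-xor operations.
    static long nextState(long rnd) {
        rnd ^= (rnd << 21);
        rnd ^= (rnd >>> 35);
        rnd ^= (rnd << 4);
        return rnd;
    }

    // Maps a generator state to an array index; length is assumed to be a power of two.
    static int toIndex(long rnd, int length) {
        return (int) (rnd & (length - 1));
    }

    public static void main(String[] args) {
        int length = 1 << 20;
        int[] numbers = new int[length];
        long rnd = System.nanoTime() | 1L; // ASSUMPTION: force an odd, hence non-zero, seed
        for (int i = 0; i < 1_000_000; i++) {
            rnd = nextState(rnd);
            numbers[toIndex(rnd, length)]++;
        }
        int touched = 0;
        for (int n : numbers) {
            if (n != 0) {
                touched++;
            }
        }
        System.out.println("distinct slots touched: " + touched + " of " + length);
    }
}
```

The generator costs only a handful of register operations per step, so the random index is cheap to compute while the access pattern stays hard for a hardware prefetcher to predict.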

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

$ perf stat -d java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

 Performance counter stats for 'java -version':

         45,523871 task-clock (msec)         #    1,005 CPUs utilized
                98 context-switches          #    0,002 M/sec
                22 cpu-migrations            #    0,483 K/sec
             2 888 page-faults               #    0,063 M/sec
       120 537 883 cycles                    #    2,648 GHz                     [51,00%]
        70 889 708 stalled-cycles-frontend   #   58,81% frontend cycles idle    [56,13%]
        45 829 284 stalled-cycles-backend    #   38,02% backend cycles idle     [56,43%]
       106 611 221 instructions              #    0,88  insns per cycle
                                             #    0,66  stalled cycles per insn [65,25%]
        16 168 899 branches                  #  355,174 M/sec                   [65,19%]
           391 063 branch-misses             #    2,42% of all branches         [55,16%]
        29 134 230 L1-dcache-loads           #  639,977 M/sec                   [44,98%]
           774 586 L1-dcache-load-misses     #    2,66% of all L1-dcache hits   [72,20%]
           421 404 LLC-loads                 #    9,257 M/sec                   [50,77%]
                   LLC-load-misses:HG

       0,045308292 seconds time elapsed

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

No content

Slide 29

Slide 29 text

cz.ladicek.valuetypes.complex.Complex object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
      0    12        (object header)                N/A
     12     4        (alignment/padding gap)        N/A
     16     8 double Complex.re                     N/A
     24     8 double Complex.im                     N/A
Instance size: 32 bytes (estimated, the sample instance is not available)
Space losses: 4 bytes internal + 0 bytes external = 4 bytes total

Overhead of 16 bytes for each Complex

Slide 30

Slide 30 text

[D object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
      0    16        (object header)                N/A
     16     0 double [D.<elements>                  N/A
Instance size: 16 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total

Overhead of 16 bytes for the entire array
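The two layouts above translate into a simple footprint estimate. The sketch below uses the 32-byte instance size and the 16-byte array header from the slides; the 4-byte reference size (compressed references) and the 8-byte alignment are assumptions about the JVM configuration, and the class and method names are hypothetical.

```java
// Illustrative arithmetic only; class and method names are hypothetical.
class FootprintSketch {

    static final long ARRAY_HEADER_BYTES = 16;     // from the [D layout on the slide
    static final long COMPLEX_INSTANCE_BYTES = 32; // from the Complex layout on the slide
    static final long REFERENCE_BYTES = 4;         // ASSUMPTION: compressed references
    static final long DOUBLE_BYTES = 8;

    // Rounds up to the next multiple of 8 (ASSUMPTION: 8-byte object alignment).
    static long alignTo8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Complex[]: one array of references plus n separately allocated instances.
    static long referenceArrayBytes(long n) {
        return alignTo8(ARRAY_HEADER_BYTES + REFERENCE_BYTES * n) + COMPLEX_INSTANCE_BYTES * n;
    }

    // double[]: one array holding re and im of each number side by side.
    static long flatArrayBytes(long n) {
        return ARRAY_HEADER_BYTES + 2 * DOUBLE_BYTES * n;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("Complex[]: " + referenceArrayBytes(n) + " bytes");
        System.out.println("double[]:  " + flatArrayBytes(n) + " bytes");
    }
}
```

Under these assumptions a million complex numbers take about 36 MB as `Complex[]` and about 16 MB as `double[]`, so the same cache capacity holds more than twice as many numbers in the flat layout.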

Slide 31

Slide 31 text

immutable class Complex { ... } Proposed in 1997

Slide 32

Slide 32 text

Maybe in Java 10/11/?
http://openjdk.java.net/projects/valhalla/

Slide 33

Slide 33 text

But first: Java 8 introduced value-based classes
java.util.Optional, java.time.LocalDateTime, ...
https://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html

Slide 34

Slide 34 text

They are final, immutable, ... and generally value-like

Slide 35

Slide 35 text

You have to ignore anything related to identity: reference equality, identity hash, synchronization, serialization, ...
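In practice this means comparing value-based instances with `equals` and never with `==`. A minimal sketch using two of the classes named on slide 33 (the class and helper names are hypothetical):

```java
import java.time.LocalDateTime;
import java.util.Optional;

// Illustrative sketch; class and method names are hypothetical.
class ValueBasedSketch {

    // Compares by value, the only comparison value-based classes promise to support.
    static boolean sameMoment(LocalDateTime a, LocalDateTime b) {
        return a.equals(b);
    }

    static boolean sameContent(Optional<?> a, Optional<?> b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        LocalDateTime first = LocalDateTime.of(2016, 2, 5, 10, 0);
        LocalDateTime second = LocalDateTime.of(2016, 2, 5, 10, 0);

        // Equal by value, regardless of whether the two are the same object.
        System.out.println(sameMoment(first, second));
        System.out.println(sameContent(Optional.of("re"), Optional.of("re")));

        // Deliberately not shown: first == second, synchronized (first) { ... },
        // or System.identityHashCode(first). Their results are unspecified
        // for value-based classes.
    }
}
```

Code that sticks to `equals` keeps working if such classes ever lose their identity altogether.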

Slide 36

Slide 36 text

Provide JVM infrastructure for working with immutable and reference-free objects, in support of efficient by-value computation with non-primitive types. JEP 169: Value Objects, 2012 http://openjdk.java.net/jeps/169

Slide 37

Slide 37 text

Extend generic types to support the specialization of generic classes and interfaces over primitive types. JEP 218: Generics over Primitive Types, 2014 http://openjdk.java.net/jeps/218

Slide 38

Slide 38 text

hg clone http://hg.openjdk.java.net/jdk8u/jdk8u
cd jdk8u
chmod +x get_source.sh
chmod +x configure
./get_source.sh
./configure --with-boot-jdk=/usr/lib/jvm/java-7-oracle --with-debug-level=slowdebug
make clean images
./build/linux-x86_64-normal-server-release/images/j2sdk-image/

Slide 39

Slide 39 text

hg clone http://hg.openjdk.java.net/valhalla/valhalla
cd valhalla
chmod +x get_source.sh
chmod +x configure
./get_source.sh
./configure --with-boot-jdk=/usr/lib/jvm/java-8-oracle --with-debug-level=slowdebug
make clean images
./build/linux-x86_64-normal-server-release/images/jdk/

Slide 40

Slide 40 text

Please don’t comment on syntax By no means final!

Slide 41

Slide 41 text

Value types "Codes like a class, works like an int" http://cr.openjdk.java.net/~jrose/values/values.html

Slide 42

Slide 42 text

public final __ByValue class Complex {
    private final double re, im;

    public Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    public String toString() {
        return re + " + " + im + "i";
    }

    public static void main(String[] args) {
        Complex c = __Make Complex(0, 0);
        System.out.println(c);
    }
}

Slide 43

Slide 43 text

Also... Generic specialization http://cr.openjdk.java.net/~briangoetz/valhalla/specialization.html

Slide 44

Slide 44 text

public final class Box<any T> {
    private final T value;

    public Box(T value) {
        this.value = value;
    }

    public String toString() {
        __WhereVal(T) {
            return "val " + value.toString();
        }
        __WhereRef(T) {
            return "ref " + value;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Box<int>(1));
        System.out.println(new Box<Integer>(1));
    }
}

Slide 45

Slide 45 text

import java.anyutil.List;
import java.anyutil.ArrayList;

public class Lists {
    public static void main(String[] args) {
        List<int> l1 = new ArrayList<int>();
        List<Integer> l2 = new ArrayList<Integer>();
        l1.add(1); l2.add(1);
        l1.add(2); l2.add(2);
        l1.add(3); l2.add(3);
        System.out.println(l1);
        System.out.println(l2);
    }
}

Slide 46

Slide 46 text

Just a periodic reminder that performance of all the prototypes is going to be pathologically awful for quite a while; attempting to measure performance at this early stage [is] counterproductive. Brian Goetz, 2016-01-29 http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-January/001809.html

Slide 47

Slide 47 text

Slightly related: ObjectLayout.org

Slide 48

Slide 48 text

The org.ObjectLayout package provides a set of data structure classes designed with optimised memory layout in mind. These classes are aimed at matching the natural speed benefits similar constructs enable in most C-style languages [...] http://objectlayout.org/

Slide 49

Slide 49 text

Random random = new Random();
CtorAndArgsProvider<Complex> p = context -> new CtorAndArgs<>(
        Complex.class,
        new Class[] {double.class, double.class},
        random.nextInt(1000), random.nextInt(1000)
);
StructuredArray<Complex> complexNumbers = StructuredArray.newInstance(
        Complex.class, p, size);

Slide 50

Slide 50 text

Other resources
https://www.youtube.com/watch?v=uNgAFSUXuwc
http://blog.codefx.org/java/value-based-classes/
http://blog.codefx.org/java/dev/the-road-to-valhalla/
https://channel9.msdn.com/Events/Build/2014/2-661

Slide 51

Slide 51 text

Q’s? A’s http://www.devconf.cz/f/326 [email protected]