Slide 1

Slide 1 text

HotSpot Under the Hood
Alex Blewitt @alblue
JavaOne 2016
Copyright (c) 2016, Alex Blewitt, Bandlem Ltd

Slide 3

Slide 3 text

Overview
[Diagram labels: javac, Source, Byte code, JVM, Interpreter, Native, JIT]

Slide 4

Slide 4 text

Overview
[Diagram labels: javac, Source, Byte code, JVM; levels 0, 3, 4, 1, 2]

    int thing[] = new int[10];

Slide 5

Slide 5 text

int[] thing
• Arrays are variable† sized objects on the heap
• Objects are multiples of 8‡: 16, 24, 32, 40, 48, 56, 64 …
• Types may also have padding for data alignment
[Diagram: array layout of mark (64 bits), klass (4 or 8 bytes), length (4 bytes), elements 1 to 10 (32 bits each, 40 bytes in total), then padding (?)]
† all other objects are fixed size
‡ when object alignment is 8
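The layout on this slide can be turned into simple arithmetic. The sketch below assumes an 8-byte mark word, a 4-byte (compressed) klass field and a 4-byte length field; the class and method names are hypothetical and only illustrate the rounding rule.

```java
final class ArrayFootprint {
    // Assumed header layout: 8-byte mark, 4-byte compressed klass, 4-byte length.
    static final int MARK_BYTES = 8;
    static final int KLASS_BYTES = 4;
    static final int LENGTH_BYTES = 4;

    // Round size up to the next multiple of the object alignment.
    static long alignUp(long size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Heap bytes taken by an int[] of the given length under the assumed layout.
    static long intArrayBytes(int length, int alignment) {
        long raw = MARK_BYTES + KLASS_BYTES + LENGTH_BYTES + 4L * length;
        return alignUp(raw, alignment);
    }

    public static void main(String[] args) {
        System.out.println(intArrayBytes(10, 8)); // 16 header + 40 data = 56
        System.out.println(intArrayBytes(1, 8));  // 16 header + 4 data = 20, padded to 24
    }
}
```

With an 8-byte klass field the header would be 4 bytes larger, so the totals depend on the JVM flags in use.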

Slide 6

Slide 6 text

Java Object Layout
• jol is an OpenJDK tool that can show
  • internals – how the object is laid out
  • footprint – how much memory it takes
• Written by JVM superstar Aleksey Shipilëv @shipilev

    $ java -jar jol-cli.jar internals java.lang.String
    java.lang.String object internals:
     OFFSET  SIZE    TYPE DESCRIPTION      VALUE
          0     4         (object header)  01 00 00 00
          4     4         (object header)  00 00 00 00
          8     4         (object header)  c2 02 00 f8
         12     4  char[] String.value     []
         16     4     int String.hash      0
         20     4         (loss due to the next object alignment)

Slide 7

Slide 7 text

Java Object Layout
• footprint – how much memory it takes

    $ java -jar jol-cli.jar footprint java.lang.String
    java.lang.String@6842775dd footprint:
     COUNT   AVG   SUM  DESCRIPTION
         1    16    16  [C
         1    24    24  java.lang.String
         2          40  (total)

Slide 9

Slide 9 text

Java Object Layout
• jol uses the default constructor to create the instance
• For non-default constructors
  • Create a wrapper class
  • In default constructor, instantiate/store type
  • Run jol footprint on wrapper class

    import java.net.URI;

    public class Wrapper {
      private URI uri;
      public Wrapper() throws Exception {
        // Build the object of interest inside the default constructor
        uri = new URI("https://go.java");
      }
    }

    $ java -cp jol-cli.jar:. org.openjdk.jol.Main footprint Wrapper
    Wrapper@574caa3fd footprint:
     COUNT   AVG   SUM  DESCRIPTION
         1    16    16  Wrapper
         6    33   200  [C
         6    24   144  java.lang.String
         1    80    80  java.net.URI
        14         440  (total)

Slide 11

Slide 11 text

Klass field
• The klass field is a pointer to the object's type
  • Think getClass() in Java …
• Present for every object/array instance
• Can be 4 or 8 bytes wide
  • 32 bit JVM – 4 bytes
  • 64 bit JVM – 4 bytes or 8 bytes
• Klass field can be a compressed OOP

Slide 12

Slide 12 text

Compressed OOPS
• Compressed Ordinary Object Pointers
• Store an object reference in 32 bits
• Decoding depends on heap size
  • < 4G: zero extend
  • < ~30G: shift 3, extend
  • < 32G: shift 3 + base
  • < 64G: with -XX:ObjectAlignmentInBytes=16
• Flags
  • -XX:+/-UseCompressedOops
  • -XX:+/-UseCompressedClassPointers
  • -XX:ObjectAlignmentInBytes=8
[Diagram: hex and binary worked examples of each decoding mode]
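The decoding modes on this slide all fit one formula: address = (narrow << shift) + base. The sketch below is a Java model of that arithmetic only; the class name, method names and sample values are hypothetical, and the real decoding is done in generated machine code.

```java
final class CompressedOops {
    // Expand a 32-bit compressed reference into a 64-bit address.
    // shift is 0 for zero-extend mode, 3 when object alignment is 8.
    static long decode(int narrow, int shift, long base) {
        long unsigned = narrow & 0xFFFF_FFFFL; // treat the 32 bits as unsigned
        return (unsigned << shift) + base;
    }

    // Largest heap addressable with 32-bit references at a given alignment.
    static long maxHeapBytes(int objectAlignment) {
        return (1L << 32) * objectAlignment;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(decode(0x2F, 0, 0))); // 2f
        System.out.println(Long.toHexString(decode(0x2F, 3, 0))); // 178
        System.out.println(maxHeapBytes(8) >> 30);  // 32 (GiB)
        System.out.println(maxHeapBytes(16) >> 30); // 64 (GiB)
    }
}
```

Doubling the object alignment doubles the addressable heap, at the cost of more padding per object.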

Slide 13

Slide 13 text

Compressed OOPS
• Handled efficiently by generated code
• In many cases, don't need to expand
• Uses addressing modes to pack/unpack

    mov 0xc(%r12,%r10,8),%r11d
    r11 = *(r10 * 8‡ + r12 + 12)

• r10 is the compressed OOP
• r12 is the heap base
• 12 (0xc) is the field offset
• The sum is the address in memory
‡ when object alignment is 8

Slide 14

Slide 14 text

Array length
• Getting the length of an array

    address = coop << 3‡
    length  = *( address + 12 )
    length  = *( r10 * 8‡ + 12 )

• Base not used for small (<~30G) heaps or uncompressed oops
[Diagram: array layout of mark, klass, length, then elements 1 to 10; length sits at address +12 (0xc); address is the compressed oop << shift]
‡ when object alignment is 8

Slide 15

Slide 15 text

Array length
• Getting the length of an array

    address = coop << 3‡ + base
    length  = *( address + 12 )
    length  = *( r10 * 8‡ + r12 + 12 )

• Base used for large (>~30G) heaps with compressed oops
[Diagram: array layout of mark, klass, length, then elements 1 to 10; length sits at address +12 (0xc); address is the compressed oop << shift + base]
‡ when object alignment is 8
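The two address calculations for the array length can be modelled with a ByteBuffer standing in for the heap. This is a toy simulation under assumed values (a 256-byte "heap", hypothetical addresses and names), not how HotSpot is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ArrayLengthLoad {
    // Offset of the length field, as on the slide: 8-byte mark + 4-byte klass.
    static final int LENGTH_OFFSET = 12;

    // length = *( (coop << shift) + base + 12 )
    static int arrayLength(ByteBuffer heap, int coop, int shift, int base) {
        int address = (coop << shift) + base;
        return heap.getInt(address + LENGTH_OFFSET);
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(256).order(ByteOrder.LITTLE_ENDIAN);

        // Small heap: no base, object at the 8-byte aligned address 64.
        heap.putInt(64 + LENGTH_OFFSET, 10);
        System.out.println(arrayLength(heap, 64 >>> 3, 3, 0)); // 10

        // Large heap: same compressed oop, but the heap starts at base 128.
        heap.putInt(128 + 64 + LENGTH_OFFSET, 7);
        System.out.println(arrayLength(heap, 64 >>> 3, 3, 128)); // 7
    }
}
```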

Slide 16

Slide 16 text

Overview
[Diagram labels: javac, Source, Byte code, JVM; levels 0, 3, 4, 1, 2]

    int thing[] = new int[10];

    int sum(int[] thing) {
      int total = 0;
      for(int t : thing) {
        total += t;
      }
      return total;
    }

Slide 17

Slide 17 text

Bytecode
• javac translates Java to bytecode
• Stack-based byte oriented code
  • Local vars: istore_1
  • Object loads: aload_2
  • Array length: arraylength

    int sum(int[] thing) {
      int total = 0;
      for(int t : thing) {
        total += t;
      }
      return total;
    }

Slide 18

Slide 18 text

Bytecode
• javac translates Java to bytecode
• Stack-based byte oriented code
  • Local vars: istore_1
  • Object loads: aload_2
  • Array length: arraylength

     0: iconst_0
     1: istore_1
     2: aload_0
     3: astore_2
     4: aload_2
     5: arraylength
     6: istore_3
     7: iconst_0
     8: istore 4
    10: iload 4
    12: iload_3
    13: if_icmpge 33
    16: aload_2
    17: iload 4
    19: iaload
    20: istore 5
    22: iload_1
    23: iload 5
    25: iadd
    26: istore_1
    27: iinc 4,1
    30: goto 10
    33: iload_1
    34: ireturn
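The bytecode listing shows how javac expands the for-each loop into an indexed loop over a copy of the array reference. The sketch below writes that expansion out by hand, with each statement annotated with the bytecodes it corresponds to; the class name and the local variable names are hypothetical. Slot 0 holds thing in the listing, which implies the method was compiled as static.

```java
final class SumDesugared {
    static int sum(int[] thing) {
        int total = 0;                     // iconst_0, istore_1
        int[] copy = thing;                // aload_0, astore_2
        int length = copy.length;          // aload_2, arraylength, istore_3
        for (int i = 0; i < length; i++) { // iconst_0, istore 4 / iload 4, iload_3, if_icmpge 33 / iinc 4,1, goto 10
            int t = copy[i];               // aload_2, iload 4, iaload, istore 5
            total += t;                    // iload_1, iload 5, iadd, istore_1
        }
        return total;                      // iload_1, ireturn
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3})); // 6
    }
}
```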

Slide 19

Slide 19 text

Compilation levels
• HotSpot uses -XX:+TieredCompilation
  • 0 – interpreter (execution starts off interpreted)
  • 1 – pure C1
  • 2 – C1 with some profiling
  • 3 – C1 with full profiling
  • 4 – C2 (full optimisation)
• Levels 1 to 3 are the '-client' JIT levels; level 4 is the '-server' JIT level
[Diagram: transitions between levels 0, 3, 4, 1, 2]

Slide 20

Slide 20 text

(Re)compilation
• Methods get recompiled frequently
• Use -XX:+PrintCompilation to show them
  • % = on stack replacement, n = native, ! = exception handler
• Numeric columns: time since JVM start, compile ID, compile level
• Or use JITWatch by Chris Newland @chriswhcodes

    70  1  n  0  java.lang.System::arraycopy (native) (static)
    71  2     3  java.lang.Object::<init> (1 bytes)
    73  3     3  java.lang.String::hashCode (55 bytes)
    75  5     3  java.lang.String::charAt (29 bytes)
    76  6     3  java.lang.String::length (6 bytes)
    76  7     3  java.lang.String::indexOf (70 bytes)
    76  4     3  java.lang.Math::min (11 bytes)
    76  9     1  java.lang.Object::<init> (1 bytes)
    76  2     3  java.lang.Object::<init> (1 bytes) made not entrant
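A small program with a hot method is enough to see entries like these. The sketch below is a hypothetical example (class and method names are made up); run it with `java -XX:+PrintCompilation HotLoop` and look for lines mentioning HotLoop::square. The exact lines, IDs and levels will vary by JVM version and machine.

```java
final class HotLoop {
    // Small method called often enough to be picked up by the JIT.
    static int square(int x) {
        return x * x;
    }

    // Sum of squares of (i % 100) for i in [0, iterations).
    static long run(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += square(i % 100);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```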

Slide 21

Slide 21 text

Interpreter
• Writing an interpreter sounds simple …

    switch(bytecode) {
      case nop: break;
      case aconst_null: push(null); break;
      case iconst_m1: push(-1); break;
      case iconst_0: push(0); break;
      case iconst_1: push(1); break;
      …
    }

Slide 22

Slide 22 text

Template Interpreter
• HotSpot uses‡ a template interpreter

    Runnable[] templateTable = new Runnable[] {
      () -> {},          // nop
      () -> push(null),  // aconst_null
      () -> push(-1),    // iconst_m1
      () -> push(0),     // iconst_0
      () -> push(1),     // iconst_1
      …
    }
    templateTable[bytecodeIndex++].run()

‡ this is a Java approximation only
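The approximation on this slide can be made runnable as a tiny table-driven interpreter. This is a sketch only: the class name is hypothetical, only a handful of opcodes are filled in (using their real JVM opcode numbers), and any other opcode hits an empty table slot and fails with a NullPointerException.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TableInterpreter {
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final Runnable[] table = new Runnable[256]; // one slot per opcode

    TableInterpreter() {
        table[0] = () -> {};                                  // nop
        table[2] = () -> stack.push(-1);                      // iconst_m1
        table[3] = () -> stack.push(0);                       // iconst_0
        table[4] = () -> stack.push(1);                       // iconst_1
        table[96] = () -> stack.push(stack.pop() + stack.pop()); // iadd
    }

    // Dispatch each bytecode through the table, then return the top of stack.
    int run(byte[] code) {
        for (byte b : code) {
            table[b & 0xFF].run();
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // iconst_1, iconst_1, iadd
        System.out.println(new TableInterpreter().run(new byte[] {4, 4, 96})); // 2
    }
}
```

HotSpot's table holds addresses of generated machine code rather than Java lambdas, so there is no per-bytecode call overhead.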

Slide 23

Slide 23 text

HotSpot Disassembler
• HotSpot has an hsdis plugin to allow disassembly of code
• In jdk8u/jdk8u/hotspot repository in src/share/tools/hsdis/
  • Four files to download (don't need to clone whole repository)
  • hsdis.c, hsdis.h, Makefile, README
• Requires binutils 2.17+ from FSF.org
  • Look for warning about missing 'makeinfo'
  • Cygwin needs ${MINGW}-ar for AR
• Cannot be redistributed ${COPYRIGHT}
• hsdis-amd64.dylib, hsdis-i386.dylib
• https://alblue.bandlem.com/2016/09/javaone-hotspot.html

Slide 24

Slide 24 text

Template Interpreter
• Assembly‡ from -XX:+PrintInterpreter

    arraylength  190
    0x00000001068fe9a0: pop %rax
    0x00000001068fe9a1: mov 0xc(%rax),%eax
    0x00000001068fe9a4: movzbl 0x1(%r13),%ebx
    0x00000001068fe9a9: inc %r13
    0x00000001068fe9ac: movabs $0x106293760,%r10
    0x00000001068fe9b6: jmpq *(%r10,%rbx,8)
    0x00000001068fe9ba: nopw 0x0(%rax,%rax,1)

‡ requires hsdis to be built/installed

Slide 25

Slide 25 text

Template Interpreter
• Get address of array into 64-bit rax register

    arraylength  190
    0x00000001068fe9a0: pop %rax

Slide 26

Slide 26 text

Template Interpreter
• Load *(address + 12) into 32-bit eax

    arraylength  190
    0x00000001068fe9a1: mov 0xc(%rax),%eax

Slide 28

Slide 28 text

Template Interpreter
• Load byte *(r13 + 1) into 32-bit ebx; r13++

    arraylength  190
    0x00000001068fe9a4: movzbl 0x1(%r13),%ebx
    0x00000001068fe9a9: inc %r13

• r13 is the bytecode index pointer
• Logically equivalent to:

    inc %r13            ; %r13++
    movzbl (%r13), %ebx

• HotSpot's approach is faster, since the naïve implementation would cause a data dependency on %r13 between the prior instruction and the subsequent one

Slide 29

Slide 29 text

Template Interpreter
• Load table address 0x10…60 into 64-bit r10

    arraylength  190
    0x00000001068fe9ac: movabs $0x106293760,%r10

• 0x106293760 is the start of the template table

Slide 30

Slide 30 text

Template Interpreter
• Jump to the address stored at r10 + rbx * 8

    arraylength  190
    0x00000001068fe9b6: jmpq *(%r10,%rbx,8)

• rbx is the next bytecode loaded earlier

Slide 31

Slide 31 text

Template Interpreter
• Nop instruction (slightly bigger nop)

    arraylength  190
    0x00000001068fe9ba: nopw 0x0(%rax,%rax,1)

• Fills gap until next alignment

Slide 34

Slide 34 text

Template Interpreter
• Array length = *(address of object + 0xc)

    arraylength  190
    0x00000001068fe9a0: pop %rax
    0x00000001068fe9a1: mov 0xc(%rax),%eax

• This is the key part of the arraylength bytecode
[Diagram: array layout of mark, klass, length, then elements 1 to 10; length sits at address +12 (0xc)]

Slide 35

Slide 35 text

Null Checks
• Null checks are automatically handled
• The assembly code is generated from:

    void TemplateTable::arraylength() {
      transition(atos, itos);
      __ null_check(rax, arrayOopDesc::length_offset_in_bytes());
      __ movl(rax, Address(rax, arrayOopDesc::length_offset_in_bytes()));
    }

    0x00000001068fe9a1: mov 0xc(%rax),%eax

• If rax is null, *(0+0xc) is a dereference of a zero page memory location, which causes SIGSEGV
• JVM SIGSEGV handler translates this to NullPointerException
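From the Java side, the only visible effect of this mechanism is the exception. The sketch below (hypothetical class and method names) reads the length of a null array, which compiles to an arraylength bytecode with no explicit null test in the source.

```java
final class ImplicitNullCheck {
    // Compiles to aload_0, arraylength, ireturn: no explicit null test.
    static int lengthOf(int[] array) {
        return array.length;
    }

    // Reports whether reading the length raised NullPointerException.
    static boolean throwsNpe(int[] array) {
        try {
            lengthOf(array);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNpe(new int[3])); // false
        System.out.println(throwsNpe(null));       // true
    }
}
```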

Slide 36

Slide 36 text

In-line assembler
• HotSpot code uses __ as a prefix for 'assemble this instruction'
  • #define __ _masm, which is MacroAssembler
• Generates instructions for current architecture
• Allows code to vary depending on runtime architecture
  • Take advantage of vector instructions if available
  • Optimised OOP decoding and null handling

Slide 37

Slide 37 text

Top of Stack
• It's a little more complicated than that …
• HotSpot caches top-of-stack in a register
  • Faster access
• Different register based on type
  • rax – long/int/short/char/byte/boolean
  • xmm0 – double/float
• Different implementations needed for pop

Slide 38

Slide 38 text

Popping off
• Entry points for different types

    pop  87 pop
    0x00000001068f5440: push %rax                  ; object entry
    0x00000001068f5441: jmpq 0x00000001068f5470
    0x00000001068f5446: sub $0x8,%rsp              ; float entry
    0x00000001068f544a: vmovss %xmm0,(%rsp)
    0x00000001068f544f: jmpq 0x00000001068f5470
    0x00000001068f5454: sub $0x10,%rsp             ; double entry
    0x00000001068f5458: vmovsd %xmm0,(%rsp)
    0x00000001068f545d: jmpq 0x00000001068f5470
    0x00000001068f5462: sub $0x10,%rsp             ; long entry
    0x00000001068f5466: mov %rax,(%rsp)
    0x00000001068f546a: jmpq 0x00000001068f5470
    0x00000001068f546f: push %rax                  ; int entry
    0x00000001068f5470: add $0x8,%rsp

Slide 39

Slide 39 text

TemplateTable
• The type of value on the top affects entry point
• Entry points by top of stack state:

    Byte code     Byte   Bool   Char   Short  Int    Long   Float  Double Object Void
    array length  X      X      X      X      X      X      X      X      fe9a0  fe9a0
    pop           f546f  f546f  f546f  f546f  f546f  f546f  f5446  f5454  f5440  f5440
    iadd          f5920  X      f5920  f5920  f5920  X      X      X      X      f5920
    ladd          X      X      X      X      X      f5980  X      X      X      f5908

Slide 40

Slide 40 text

Wide and safepoint
• Wide extends certain instructions
  • load i -> load ii, fstore i -> fstore ii
  • iinc i -> iinc ii
• Different table when interpreting 'wide mode'
  • _template_table, _template_table_wide
• Can be used to implement safepoint
  • Update entry points to use safepoint handler

Slide 41

Slide 41 text

Fast bytecodes
• Some bytecodes are re-written on the fly
  • getfield -> fast_agetfield, fast_igetfield etc.
  • putfield -> fast_aputfield, fast_iputfield etc.
  • iload -> fast_iload
  • aload_0 -> fast_aload_0
• aload_0 loads local variable 0, which holds this in instance methods

Slide 42

Slide 42 text

Calling Native Methods
• Code looks up method through vtable
• klazz.vtable[id].method -> native code
• nmethod states
  • in use
  • not entrant
  • zombie
  • unloaded
[Diagram: klass and (sub) klass, each with a vtable of Methods; each Method is interpreted (0) or points to nmethods (3, 4, 1) in one of the states above]

Slide 43

Slide 43 text

Speeding up
• 90% of call sites have a single typed target
• Don't need to go through lookup each time

    if (klass == ) {
      nmethod();                            // verified entry point
    } else {
      klass.vtable[id].method.nmethod();    // fallback
    }

• Monomorphic dispatch optimisation; also applies to bimorphic dispatch
[Diagram: klass -> Method -> nmethod, with entry point and verified entry point]
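The guard-then-fallback shape of a monomorphic call site can be sketched in plain Java. Everything here is illustrative: InlineCache, expectedKlass and cachedTarget are hypothetical names, and the real check is emitted as machine code at the call site rather than written as a Java object.

```java
import java.util.function.Function;

final class InlineCache {
    private final Class<?> expectedKlass;                // type seen at this call site
    private final Function<Object, String> cachedTarget; // direct target for that type
    int fastCalls;
    int slowCalls;

    InlineCache(Class<?> expectedKlass, Function<Object, String> cachedTarget) {
        this.expectedKlass = expectedKlass;
        this.cachedTarget = cachedTarget;
    }

    String describe(Object receiver) {
        if (receiver.getClass() == expectedKlass) {
            fastCalls++;
            return cachedTarget.apply(receiver); // guard passed: call the cached target
        }
        slowCalls++;
        return receiver.toString();              // guard failed: normal virtual dispatch
    }

    public static void main(String[] args) {
        InlineCache site = new InlineCache(String.class, o -> (String) o);
        site.describe("a");
        site.describe("b");
        site.describe(42);
        System.out.println(site.fastCalls + " fast, " + site.slowCalls + " slow"); // 2 fast, 1 slow
    }
}
```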

Slide 44

Slide 44 text

Verified Entry Point
• Code has an entry point and verified entry point
• Entry point is where code starts

    Code:
    [Entry Point]
      # {method} {0x000000011a54b000} 'hashCode' in 'java/lang/String'
      #           [sp+0x40]  (sp of caller)
      0x00000001067dac80: mov 0x8(%rsi),%r10d
      0x00000001067dac84: shl $0x3,%r10
      0x00000001067dac88: cmp %rax,%r10
      0x00000001067dac8b: jne 0x000000010671fb60  ; {runtime_call}
      0x00000001067dac91: data32 data32 nopw 0x0(%rax,%rax,1)
      0x00000001067dac9c: data32 data32 xchg %ax,%ax
    [Verified Entry Point]

• rsi is the String instance; rsi+8 is klass
• shl 3 == *8: expanding compressed klass oop
• rax is the expected type (String)
• Fall back (to interpreter) if not valid

Slide 45

Slide 45 text

Verified Entry Point
• Code has an entry point and verified entry point
• Verified Entry point is where type holds

    Code:
    [Entry Point]
      …
    [Verified Entry Point]
      0x00000001067daca0: mov %eax,-0x14000(%rsp)
      0x00000001067daca7: push %rbp
      0x00000001067daca8: sub $0x30,%rsp
      0x00000001067dacac: movabs $0x11a70ccb0,%rax  ; {metadata(method data for {method}
                                                    ; {0x000000011a54b000} 'hashCode' '()I' in 'java/lang/String')}
      0x00000001067dacb6: mov 0xdc(%rax),%edi
      0x00000001067dacbc: add $0x8,%edi
      0x00000001067dacbf: mov %edi,0xdc(%rax)

• mov %eax,-0x14000(%rsp): stack banging/StackOverflowError
• movabs: profiling data for String's hashCode
• add $0x8: adding 1 to the number of times called; bottom 3 bits are used for flags
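Adding 8 to bump a count by 1 follows from keeping flags in the bottom 3 bits of the same word. The sketch below models that packing in Java; the class and method names are hypothetical and the flag values are arbitrary.

```java
final class InvocationCounter {
    static final int FLAG_BITS = 3;              // bottom 3 bits hold flags
    static final int INCREMENT = 1 << FLAG_BITS; // 8: adds 1 to the count part

    // Same operation as "add $0x8" on the raw counter word.
    static int increment(int raw) {
        return raw + INCREMENT;
    }

    // Number of calls recorded in the word.
    static int count(int raw) {
        return raw >>> FLAG_BITS;
    }

    // Flag bits, untouched by increments.
    static int flags(int raw) {
        return raw & (INCREMENT - 1);
    }

    public static void main(String[] args) {
        int raw = 0b101;                // count 0, flags 101
        raw = increment(increment(raw));
        System.out.println(count(raw)); // 2
        System.out.println(flags(raw)); // 5
    }
}
```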

Slide 46

Slide 46 text

Safepoint and return
• Stack is reset to its original position
• Safepoint test of protected page
• Function returns

    …
    0x00000001067f1997: callq 0x0000000110b27d60  ; {optimized virtual_call}
    0x00000001067f199c: movslq %eax,%rax
    0x00000001067f199f: add $0x30,%rsp
    0x00000001067f19a3: pop %rbp
    0x00000001067f19a4: test %eax,-0x20738aa(%rip)  ; {poll_return}
    0x00000001067f19aa: retq

• Static functions can be called directly
• Safepoint poll: when the JVM wants to do e.g. GC, it marks this page as non-readable, which causes a fault that is handled by the JVM

Slide 47

Slide 47 text

Intrinsics
• Thread.currentThread() (r15 is used for the thread instance)
• System.arraycopy()
• Object.clone()
• System.nanoTime(), currentTimeMillis()
• String.indexOf()
• Math.*
Notes:
• Multiple implementations based on generic type
• Call out to OS provided native libraries

Slide 50

Slide 50 text

Thread.currentThread()
• No method invocation needed

    import java.net.URI;

    public class Test {
      private URI uri;
      public static void main(String[] args) {
        long t = 0;
        for (int i = 0; i < 20_000; i++) {
          t += getId();
        }
        System.out.println("Total: " + t);
      }
      public static long getId() {
        return Thread.currentThread().getId();
      }
    }

    [Entry Point]
    [Verified Entry Point]
      # {method} {0x0000000126ea2468} 'getId' '()J'
      0x00010e24de40: sub $0x18,%rsp
      0x00010e24de47: mov %rbp,0x10(%rsp)    ; - Test::getId@-1 (line 10)
      0x00010e24de4c: mov 0x1b8(%r15),%r10   ;*invokestatic currentThread
                                             ; - Test::getId@0 (line 10)
      0x00010e24de53: mov 0x28(%r10),%rax    ;*getfield tid
                                             ; - java.lang.Thread::getId@1 (line 1702)
                                             ; - Test::getId@3 (line 10)
      0x00010e24de57: add $0x10,%rsp
      0x00010e24de5b: pop %rbp
      0x00010e24de5c: test %eax,-0x2051e62(%rip)
      0x00010e24de62: retq

• Current thread is stored in r15
• Field at position 0x28 (40) is tid; 0x28 == 40

    java.lang.Thread object internals:
     OFFSET  SIZE    TYPE DESCRIPTION                            VALUE
          0     4         (object header)                        01 00 00 00
          4     4         (object header)                        00 00 00 00
          8     4         (object header)                        d6 0c 00 f8
         12     4     int Thread.priority                        5
         16     8    long Thread.eetop                           0
         24     8    long Thread.stackSize                       0
         32     8    long Thread.nativeParkEventPointer          0
         40     8    long Thread.tid                             13
         48     4     int Thread.threadStatus                    0
         52     1 boolean Thread.single_step                     false
         53     1 boolean Thread.daemon                          false
         54     1 boolean Thread.stillborn                       false
         55     1         (alignment/padding gap)                N/A
         56     4  String Thread.name                            (object)
         ..     .     ... ...........
        244     4     int Thread.threadLocalRandomSecondarySeed  0
        248   128         (loss due to the next object alignment)
    Instance size: 376 bytes
    Space losses: 129 bytes internal + 128 bytes external = 257 bytes total

    // The following three initially uninitialized fields are exclusively
    // managed by class java.util.concurrent.ThreadLocalRandom. These
    // fields are used to build the high-performance PRNGs in the
    // concurrent code, and we can not risk accidental false sharing.
    // Hence, the fields are isolated with @Contended.

    /** The current seed for a ThreadLocalRandom */
    @jdk.internal.vm.annotation.Contended("tlr")
    long threadLocalRandomSeed;

    /** Probe hash value; nonzero if threadLocalRandomSeed initialized */
    @jdk.internal.vm.annotation.Contended("tlr")
    int threadLocalRandomProbe;

    /** Secondary seed isolated from public ThreadLocalRandom sequence */
    @jdk.internal.vm.annotation.Contended("tlr")
    int threadLocalRandomSecondarySeed;

• jdk.internal.vm.annotation.Contended was sun.misc.Contended

Slide 51

Slide 51 text

Optimisations
• HotSpot can optimise code by
  • Method inlining
  • Dead code/path elimination
  • Heuristics for optimising call sites
  • Constant folding
• C2 performs additional optimisations (escape analysis etc)

Slide 52

Slide 52 text

Summary
• Objects on heap have a mark word and a klass pointer
• Compressed OOPS result in a smaller memory footprint for <32 GB heaps
• Objects may have internal or external padding waste due to alignment
• Execution begins with template interpreter, level 0
• JIT compilation triggered with levels 1-4; native methods are retired
• Monomorphic and bimorphic methods have optimised call sites
• Intrinsic methods don't pay method call costs

Slide 53

Slide 53 text

HotSpot Under the Hood
Alex Blewitt @alblue
https://speakerdeck.com/alblue/
https://alblue.bandlem.com/2016/09/javaone-hotspot.html