
JavaOne 2016: HotSpot Under the Hood

alblue
September 19, 2016

Have you ever wondered how the JVM works under the covers? How the JVM is able to JIT-optimize the bytecode classes and what the generated output looks like? This session shows how a compiled Java class is loaded in memory, when the JIT optimizations occur, and what the generated assembly looks like for hot code in the JVM. The presentation also looks at current object layouts, how the memory settings affect how objects are stored, and what effects this can have for high-performance Java code. Presented at JavaOne 2016.

More information about the presentation, including how to build hsdis, can be found at https://alblue.bandlem.com/2016/09/javaone-hotspot.html


Transcript

  1. Copyright (c) 2016, Alex Blewitt, Bandlem Ltd JavaOne 2016 Overview

    [Diagram: Source → javac → Byte code → JVM (Interpreter, JIT, Native)]
  2. Overview

    [Diagram: Source → javac → Byte code → JVM, with boxes labelled 0, 3, 4, 1, 2]

        int thing[] = new int[10];
  3. int[] thing

    • Arrays are variable† sized objects on the heap
    • Objects are multiples of 8‡: 16, 24, 32, 40, 48, 56, 64 …
    • Types may also have padding for data alignment

    [Diagram: array layout of mark (64 bits), klass (4 or 8 bytes), length (4 bytes), elements 1–10 (40 bytes) and possible padding; a plain object has only mark (64 bits) and klass (4 or 8 bytes)]

    † all other objects are fixed size
    ‡ when object alignment is 8
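The size arithmetic on this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions: an 8-byte mark word, a 4-byte (compressed) klass field, a 4-byte length field and 8-byte object alignment. The class and method names are hypothetical and not part of HotSpot or jol.

```java
class ArraySizeSketch {
    static final int MARK_BYTES = 8;    // 64-bit mark word
    static final int KLASS_BYTES = 4;   // assumption: compressed class pointer
    static final int LENGTH_BYTES = 4;  // array length field
    static final int ALIGNMENT = 8;     // assumption: object alignment is 8

    // Round size up to the next multiple of alignment.
    static long alignUp(long size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Estimated heap size of an int[] with the given number of elements.
    static long intArrayBytes(int length) {
        long unpadded = MARK_BYTES + KLASS_BYTES + LENGTH_BYTES + 4L * length;
        return alignUp(unpadded, ALIGNMENT);
    }

    public static void main(String[] args) {
        System.out.println(intArrayBytes(10)); // 16 header bytes + 40 element bytes
        System.out.println(intArrayBytes(1));  // 20 bytes, padded up to the next multiple of 8
    }
}
```

With an 8-byte klass field or a different alignment the numbers change, so a tool such as jol remains the reliable way to check a real JVM.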
  4. Java Object Layout

    • jol is an OpenJDK tool that can show
      • internals – how the object is laid out
      • footprint – how much memory it takes
    • Written by JVM superstar Aleksey Shipilëv @shipilev

        $ java -jar jol-cli.jar internals java.lang.String
        java.lang.String object internals:
         OFFSET  SIZE    TYPE DESCRIPTION         VALUE
              0     4         (object header)     01 00 00 00
              4     4         (object header)     00 00 00 00
              8     4         (object header)     c2 02 00 f8
             12     4  char[] String.value        []
             16     4     int String.hash         0
             20     4         (loss due to the next object alignment)
  5. Java Object Layout

    • jol is an OpenJDK tool that can show
      • internals – how the object is laid out
      • footprint – how much memory it takes

        $ java -jar jol-cli.jar footprint java.lang.String
        java.lang.String@6842775dd footprint:
         COUNT   AVG   SUM   DESCRIPTION
             1    16    16   [C
             1    24    24   java.lang.String
             2          40   (total)
  6. Java Object Layout

    • Uses the default constructor for instance
    • For non-default constructors
      • Create a wrapper class
      • In default constructor, instantiate/store type
      • Run jol footprint on wrapper class

  7. Java Object Layout

        import java.net.URI;

        public class Wrapper {
          private URI uri;
          public Wrapper() throws Exception {
            uri = new URI("https://go.java");
          }
        }

        $ java -cp jol-cli.jar:. org.openjdk.jol.Main footprint Wrapper
        Wrapper@574caa3fd footprint:
         COUNT   AVG   SUM   DESCRIPTION
             1    16    16   Wrapper
             6    33   200   [C
             6    24   144   java.lang.String
             1    80    80   java.net.URI
            14         440   (total)
  9. Klass field

    • The klass field is a pointer to the object's type
      • Think getClass() in Java …
    • Present for every object/array instance
    • Can be 4 or 8 bytes wide
      • 32 bit JVM – 4 bytes
      • 64 bit JVM – 4 bytes or 8 bytes
    • Klass field can be a compressed OOP
  10. Compressed OOPS

    • Compressed Ordinary Object Pointers
    • Store an object reference in 32 bits

    [Diagram: expanding a 32-bit compressed reference into a 64-bit address]

    • Heap < 4G: zero extend
    • Heap < ~30G: shift 3
    • Heap < 32G: shift 3 + base
    • Heap < 64G: -XX:ObjectAlignmentInBytes=16

        -XX:+/-UseCompressedOops
        -XX:+/-UseCompressedClassPointers
        -XX:ObjectAlignmentInBytes=8
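The decoding modes on this slide (zero extend, shift, shift plus base) can be sketched as one function. This is a rough illustration rather than HotSpot's implementation; the class name, method names and the sample value 0x2F are chosen here only for demonstration.

```java
class CompressedOopSketch {
    // Expand a 32-bit compressed reference into a 64-bit address.
    // shift = 0 and heapBase = 0 models plain zero extension;
    // shift = 3 models 8-byte object alignment.
    static long decode(int compressed, long heapBase, int shift) {
        long zeroExtended = compressed & 0xFFFF_FFFFL;
        return heapBase + (zeroExtended << shift);
    }

    // Largest range a 32-bit reference can cover for a given object alignment.
    static long maxAddressableBytes(int alignmentBytes) {
        return (1L << 32) * alignmentBytes;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(decode(0x2F, 0L, 3)));  // 0x2F * 8
        System.out.println(maxAddressableBytes(8) >> 30);           // size in GiB for alignment 8
        System.out.println(maxAddressableBytes(16) >> 30);          // size in GiB for alignment 16
    }
}
```

The 32G and 64G limits fall out of the arithmetic: 2^32 references multiplied by the object alignment.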
  11. Compressed OOPS

    • Handled efficiently by generated code
    • In many cases, don't need to expand
    • Uses addressing modes to pack/unpack

        mov 0xc(%r12,%r10,8),%r11d
        r11 = *(r10 * 8‡ + r12 + 12)

    • r10 is the compressed OOP
    • r12 is the heap base
    • 0xc (12) is the field offset
    • The sum is the address in memory

    ‡ when object alignment is 8
  12. Array length

    • Getting the length of an array

    [Diagram: array layout of mark, klass, length and elements 1–10; the length field is at address + 12 (0xc)]

        address = coop << 3‡
        length = *( address + 12 )
        length = *( r10 * 8‡ + 12 )

    • Base not used for small (<~30G) heaps or uncompressed oops

    ‡ when object alignment is 8
  13. Array length

    • Getting the length of an array

    [Diagram: array layout of mark, klass, length and elements 1–10; the length field is at address + 12 (0xc)]

        address = coop << 3‡ + base
        length = *( address + 12 )
        length = *( r10 * 8‡ + r12 + 12 )

    • Base used for large (>~30G) heaps with compressed oops

    ‡ when object alignment is 8
  14. Overview

    [Diagram: Source → javac → Byte code → JVM, with boxes labelled 0, 3, 4, 1, 2]

        int thing[] = new int[10];

        int sum(int[] thing) {
          int total = 0;
          for(int t : thing) {
            total += t;
          }
          return total;
        }
  15. Bytecode

    • javac translates Java to bytecode
    • Stack-based byte oriented code
      • Local vars: istore_1
      • Object loads: aload_2
      • Array length: arraylength

        int sum(int[] thing) {
          int total = 0;
          for(int t : thing) {
            total += t;
          }
          return total;
        }
  16. Bytecode

    • javac translates Java to bytecode
    • Stack-based byte oriented code
      • Local vars: istore_1
      • Object loads: aload_2
      • Array length: arraylength

         0: iconst_0
         1: istore_1
         2: aload_0
         3: astore_2
         4: aload_2
         5: arraylength
         6: istore_3
         7: iconst_0
         8: istore 4
        10: iload 4
        12: iload_3
        13: if_icmpge 33
        16: aload_2
        17: iload 4
        19: iaload
        20: istore 5
        22: iload_1
        23: iload 5
        25: iadd
        26: istore_1
        27: iinc 4,1
        30: goto 10
        33: iload_1
        34: ireturn
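To make the listing easier to follow, the enhanced for loop can be written out by hand in the shape the bytecode takes. This is a reading aid, not compiler output: the local names array, length and i are invented here, and the slot numbers in the comments are taken from the listing above.

```java
class SumDesugared {
    // The method from the slides, made static so it can be called directly.
    static int sum(int[] thing) {
        int total = 0;
        for (int t : thing) {
            total += t;
        }
        return total;
    }

    // The same loop with every local variable slot written explicitly.
    static int sumDesugared(int[] thing) {
        int total = 0;                       // slot 1: iconst_0, istore_1
        int[] array = thing;                 // slot 2: aload_0, astore_2
        int length = array.length;           // slot 3: aload_2, arraylength, istore_3
        for (int i = 0; i < length; i++) {   // slot 4: if_icmpge 33, iinc 4,1, goto 10
            int t = array[i];                // slot 5: aload_2, iload 4, iaload, istore 5
            total += t;                      // iload_1, iload 5, iadd, istore_1
        }
        return total;                        // iload_1, ireturn
    }

    public static void main(String[] args) {
        int[] thing = {1, 2, 3, 4, 5};
        System.out.println(sum(thing) == sumDesugared(thing));
    }
}
```

Note that the array length is read once, before the loop, which is why arraylength appears only at offset 5.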
  17. Compilation levels

    • HotSpot uses -XX:+TieredCompilation
      • 0 – interpreter
      • 1 – pure C1
      • 2 – C1 with some profiling
      • 3 – C1 with full profiling
      • 4 – C2 (full optimisation)
    • Starts off interpreted (level 0)
    • Levels 1 to 3 are the '-client' JIT levels
    • Level 4 is the '-server' JIT level
  18. (Re)compilation

    • Methods get recompiled frequently
    • Use -XX:+PrintCompilation to show them
      • % = on stack replacement, n = native, ! = exception handler
    • Columns: time since JVM start, compile ID, flags, compile level, method
    • Or use JITWatch by Chris Newland @chriswhcodes

        70    1     n 0   java.lang.System::arraycopy (native)   (static)
        71    2       3   java.lang.Object::<init> (1 bytes)
        73    3       3   java.lang.String::hashCode (55 bytes)
        75    5       3   java.lang.String::charAt (29 bytes)
        76    6       3   java.lang.String::length (6 bytes)
        76    7       3   java.lang.String::indexOf (70 bytes)
        76    4       3   java.lang.Math::min (11 bytes)
        76    9       1   java.lang.Object::<init> (1 bytes)
        76    2       3   java.lang.Object::<init> (1 bytes)   made not entrant
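A small program with one hot method is enough to watch this happen. The sketch below is an assumed test harness, not code from the talk: the class name HotLoop and the iteration count are arbitrary. Running it with java -XX:+PrintCompilation HotLoop should list HotLoop::sum among the compiled methods, though the levels, timings and ordering depend on the JVM version and machine.

```java
class HotLoop {
    // The sum method from the earlier slides, used here as the hot method.
    static int sum(int[] thing) {
        int total = 0;
        for (int t : thing) {
            total += t;
        }
        return total;
    }

    // Call sum() often enough that the JIT is likely to compile it.
    static long run(int iterations) {
        int[] thing = new int[10];
        for (int i = 0; i < thing.length; i++) {
            thing[i] = i + 1; // fill with 1..10
        }
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += sum(thing);
        }
        return total;
    }

    public static void main(String[] args) {
        // Printing the result keeps the loop from being optimised away entirely.
        System.out.println("Total: " + run(20_000));
    }
}
```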
  19. Interpreter

    • Writing an interpreter sounds simple …

        switch(bytecode) {
          case nop: break;
          case aconst_null: push(null); break;
          case iconst_m1: push(-1); break;
          case iconst_0: push(0); break;
          case iconst_1: push(1); break;
          …
        }
  20. Template Interpreter

    • HotSpot uses‡ a template interpreter

        Runnable[] templateTable = new Runnable[] {
          () -> {},         // nop
          () -> push(null), // aconst_null
          () -> push(-1),   // iconst_m1
          () -> push(0),    // iconst_0
          () -> push(1),    // iconst_1
          …
        }
        templateTable[bytecodeIndex++].run()

    ‡ this is a Java approximation only
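The approximation on this slide can be filled out into something that runs. As on the slide, this is only a Java analogy for table-driven dispatch: the operand stack is modelled as a List (chosen because it accepts null), and the execute helper is invented for the example. The table indices match the real opcode values of these five bytecodes.

```java
import java.util.ArrayList;
import java.util.List;

class TemplateTableSketch {
    // Operand stack; a List is used because it can hold null for aconst_null.
    final List<Object> stack = new ArrayList<>();

    // One entry per opcode, indexed by the opcode value.
    final Runnable[] templateTable = {
        () -> {},               // 0: nop
        () -> stack.add(null),  // 1: aconst_null
        () -> stack.add(-1),    // 2: iconst_m1
        () -> stack.add(0),     // 3: iconst_0
        () -> stack.add(1),     // 4: iconst_1
    };

    // Dispatch each bytecode through the table instead of a switch.
    void run(byte[] bytecode) {
        int bytecodeIndex = 0;
        while (bytecodeIndex < bytecode.length) {
            templateTable[bytecode[bytecodeIndex++]].run();
        }
    }

    // Hypothetical helper: interpret a sequence and return the final stack.
    static List<Object> execute(byte... bytecode) {
        TemplateTableSketch interpreter = new TemplateTableSketch();
        interpreter.run(bytecode);
        return interpreter.stack;
    }

    public static void main(String[] args) {
        // iconst_1, nop, iconst_m1, aconst_null
        System.out.println(execute((byte) 4, (byte) 0, (byte) 2, (byte) 1)); // prints [1, -1, null]
    }
}
```

In HotSpot the table entries are addresses of generated machine code rather than objects, so the dispatch is an indirect jump instead of a virtual call.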
  21. HotSpot Disassembler

    • HotSpot has an hsdis plugin to allow disassembly of code
    • In jdk8u/jdk8u/hotspot repository in src/share/tools/hsdis/
    • Four files to download (don't need to clone whole repository)
      • hsdis.c, hsdis.h, Makefile, README
    • Requires binutils 2.17+ from FSF.org
      • Look for warning about missing 'makeinfo'
      • Cygwin needs ${MINGW}-ar for AR
    • Builds hsdis-amd64.dylib and hsdis-i386.dylib
    • Cannot be redistributed ${COPYRIGHT}

    https://alblue.bandlem.com/2016/09/javaone-hotspot.html
  22. Template Interpreter

    • Assembly‡ from -XX:+PrintInterpreter

        arraylength  190
          0x00000001068fe9a0: pop    %rax
          0x00000001068fe9a1: mov    0xc(%rax),%eax
          0x00000001068fe9a4: movzbl 0x1(%r13),%ebx
          0x00000001068fe9a9: inc    %r13
          0x00000001068fe9ac: movabs $0x106293760,%r10
          0x00000001068fe9b6: jmpq   *(%r10,%rbx,8)
          0x00000001068fe9ba: nopw   0x0(%rax,%rax,1)

    ‡ requires hsdis to be built/installed
  23. Template Interpreter

    • Get address of array into 64-bit rax register

          0x00000001068fe9a0: pop    %rax

  24. Template Interpreter

    • Load *(address + 12) into 32-bit eax

          0x00000001068fe9a1: mov    0xc(%rax),%eax

  25. Template Interpreter

    • Load byte *(r13 + 1) into 32-bit ebx; r13++
    • r13 is the bytecode index pointer

          0x00000001068fe9a4: movzbl 0x1(%r13),%ebx
          0x00000001068fe9a9: inc    %r13

  26. Template Interpreter

    • Load byte *(r13 + 1) into 32-bit ebx; r13++
    • Logically equivalent to:

          inc    %r13            ; %r13++
          movzbl (%r13),%ebx

    • HotSpot's approach is faster, since the naïve implementation would cause a data dependency on %r13 between the prior instruction and the subsequent one

  27. Template Interpreter

    • Load table address 0x10…60 into 64-bit r10
    • 0x106293760 is the start of the template table

          0x00000001068fe9ac: movabs $0x106293760,%r10

  28. Template Interpreter

    • Jump to the address stored at r10 + rbx * 8
    • rbx is the next bytecode loaded earlier

          0x00000001068fe9b6: jmpq   *(%r10,%rbx,8)

  29. Template Interpreter

    • Nop instruction (slightly bigger nop)
    • Fills gap until next alignment

          0x00000001068fe9ba: nopw   0x0(%rax,%rax,1)

  30. Template Interpreter

    • Array length = *(address of object + 0xc)

          0x00000001068fe9a1: mov    0xc(%rax),%eax

  31. Template Interpreter

    • Array length = *(address of object + 0xc)
    • This is the key part of the arraylength bytecode

  32. Template Interpreter

    • Array length = *(address of object + 0xc)
    • This is the key part of the arraylength bytecode

    [Diagram: array layout of mark, klass, length and elements 1–10; the length field is at address + 12 (0xc)]
  33. Null Checks

    • Null checks are automatically handled
    • The assembly code is generated from:

        void TemplateTable::arraylength() {
          transition(atos, itos);
          __ null_check(rax, arrayOopDesc::length_offset_in_bytes());
          __ movl(rax, Address(rax, arrayOopDesc::length_offset_in_bytes()));
        }

          0x00000001068fe9a1: mov    0xc(%rax),%eax

    • If rax is null, *(0+0xc) is a deref of a zero page memory location, which causes SIGSEGV
    • JVM SIGSEGV handler translates this to NullPointerException
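From the Java side, the only visible effect of this mechanism is the exception. The sketch below is a minimal demonstration with an invented helper name; it does not show whether a given JVM used a signal handler or an explicit check, since that choice is invisible to Java code.

```java
class NullCheckSketch {
    // Reads array.length (the arraylength bytecode); returns -1 if the array is null.
    static int lengthOrMinusOne(int[] array) {
        try {
            return array.length;
        } catch (NullPointerException e) {
            // No explicit null test was written above; the JVM raised this itself.
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOrMinusOne(new int[10]));
        System.out.println(lengthOrMinusOne(null));
    }
}
```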
  34. In-line assembler

    • HotSpot code uses __ as a prefix for 'assemble this instruction'
      • #define __ _masm, which is MacroAssembler
    • Generates instructions for current architecture
    • Allows code to vary depending on runtime architecture
      • Take advantage of vector instructions if available
      • Optimised OOP decoding and null handling
  35. Top of Stack

    • It's a little more complicated than that …
    • HotSpot caches top-of-stack in a register
      • Faster access
    • Different register based on type
      • rax – long/int/short/char/byte/boolean
      • xmm0 – double/float
    • Different implementations needed for pop
  36. Popping off

        pop  87 pop
          0x00000001068f5440: push   %rax
          0x00000001068f5441: jmpq   0x00000001068f5470
          0x00000001068f5446: sub    $0x8,%rsp
          0x00000001068f544a: vmovss %xmm0,(%rsp)
          0x00000001068f544f: jmpq   0x00000001068f5470
          0x00000001068f5454: sub    $0x10,%rsp
          0x00000001068f5458: vmovsd %xmm0,(%rsp)
          0x00000001068f545d: jmpq   0x00000001068f5470
          0x00000001068f5462: sub    $0x10,%rsp
          0x00000001068f5466: mov    %rax,(%rsp)
          0x00000001068f546a: jmpq   0x00000001068f5470
          0x00000001068f546f: push   %rax
          0x00000001068f5470: add    $0x8,%rsp

    • Entry points for different types, in listing order: object (…5440), float (…5446), double (…5454), long (…5462), int (…546f)
  37. TemplateTable

    • The type of value on the top affects entry point

    Entry point by top of stack state:

        Byte code    Byte   Bool   Char   Short  Int    Long   Float  Double Object Void
        arraylength  X      X      X      X      X      X      X      X      fe9a0  fe9a0
        pop          f546f  f546f  f546f  f546f  f546f  f546f  f5446  f5454  f5440  f5440
        iadd         f5920  X      f5920  f5920  f5920  X      X      X      X      f5920
        ladd         X      X      X      X      X      f5980  X      X      X      f5908
  38. Wide and safepoint

    • Wide extends certain instructions
      • load i -> load ii, fstore i -> fstore ii
      • iinc i -> iinc ii
    • Different table when interpreting 'wide mode'
      • _template_table, _template_table_wide
    • Can be used to implement safepoint
      • Update entry points to use safepoint handler
  39. Fast bytecodes

    • Some bytecodes are re-written on the fly
      • getfield -> fast_agetfield, fast_igetfield etc.
      • putfield -> fast_aputfield, fast_iputfield etc.
      • iload -> fast_iload
      • aload_0 -> fast_aload_0
    • aload_0 loads this in instance methods (local variable 0 stores this)
  40. Calling Native Methods

    • Code looks up method through vtable
      • klazz.vtable[id].method -> native code
    • An nmethod is in one of four states
      • in use
      • not entrant
      • zombie
      • unloaded

    [Diagram: klass → vtable → Method → nmethod, with a (sub) klass that has its own vtable and Methods; a Method is either interpreted (level 0) or points to an nmethod at level 1, 3 or 4]
  41. Speeding up

    • 90% of call sites have a single typed target
    • Don't need to go through lookup each time

        if (klass == <expected klass>) {           // entry point
          nmethod();                               // verified entry point
        } else {
          klass.vtable[id].method.nmethod();       // fallback
        }

    • Monomorphic dispatch optimisation; also applies to bimorphic dispatch
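The check-then-call pattern on this slide can be modelled in plain Java. The sketch below is a loose analogy under stated assumptions: a Map stands in for klass.vtable[id].method, the per-call-site cache holds a single expected class, and all names are invented for illustration. HotSpot does this in generated machine code and handles cache misses differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class InlineCacheSketch {
    // Hypothetical stand-in for klass.vtable[id].method.
    static final Map<Class<?>, Function<Object, String>> VTABLE = new HashMap<>();
    static {
        VTABLE.put(String.class, o -> "String:" + o);
        VTABLE.put(Integer.class, o -> "Integer:" + o);
    }

    private Class<?> expectedKlass;               // type seen first at this call site
    private Function<Object, String> cachedTarget;
    int slowLookups;                              // how often the table lookup was needed

    String call(Object receiver) {
        Class<?> klass = receiver.getClass();
        if (klass == expectedKlass) {
            return cachedTarget.apply(receiver);  // fast path: no lookup
        }
        slowLookups++;
        Function<Object, String> target = VTABLE.get(klass); // fallback lookup
        if (expectedKlass == null) {              // first call: remember the type
            expectedKlass = klass;
            cachedTarget = target;
        }
        return target.apply(receiver);
    }

    public static void main(String[] args) {
        InlineCacheSketch site = new InlineCacheSketch();
        site.call("a");
        site.call("b");
        site.call(42);
        System.out.println("slow lookups: " + site.slowLookups);
    }
}
```

A bimorphic version would keep two expected classes and two targets before falling back to the lookup.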
  42. Verified Entry Point

    • Code has an entry point and verified entry point
    • Entry point is where code starts

        Code:
        [Entry Point]
          # {method} {0x000000011a54b000} 'hashCode' in 'java/lang/String'
          #           [sp+0x40]  (sp of caller)
          0x00000001067dac80: mov    0x8(%rsi),%r10d
          0x00000001067dac84: shl    $0x3,%r10
          0x00000001067dac88: cmp    %rax,%r10
          0x00000001067dac8b: jne    0x000000010671fb60  ;   {runtime_call}
          0x00000001067dac91: data32 data32 nopw 0x0(%rax,%rax,1)
          0x00000001067dac9c: data32 data32 xchg %ax,%ax
        [Verified Entry Point]

    • rsi is the String instance; rsi+8 is klass
    • shl 3 == *8: expanding compressed klass oop
    • rax is the expected type (String)
    • Fall back (to interpreter) if not valid
  43. Verified Entry Point

    • Code has an entry point and verified entry point
    • Verified Entry point is where type holds

        Code:
        [Entry Point]
          …
        [Verified Entry Point]
          0x00000001067daca0: mov    %eax,-0x14000(%rsp)
          0x00000001067daca7: push   %rbp
          0x00000001067daca8: sub    $0x30,%rsp
          0x00000001067dacac: movabs $0x11a70ccb0,%rax  ;   {metadata(method data for {method}
                                                        ;   {0x000000011a54b000} 'hashCode' '()I' in 'java/lang/String')}
          0x00000001067dacb6: mov    0xdc(%rax),%edi
          0x00000001067dacbc: add    $0x8,%edi
          0x00000001067dacbf: mov    %edi,0xdc(%rax)

    • mov %eax,-0x14000(%rsp): stack banging / StackOverflowError
    • movabs: profiling data for String's hashCode
    • add $0x8: adding 1 to the number of times called
      • Bottom 3 bits are used for flags
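The "add 8 to count 1" trick follows from keeping flags in the low three bits. The arithmetic can be sketched as below; the class and method names are hypothetical, and the starting flag value is arbitrary.

```java
class InvocationCounterSketch {
    static final int FLAG_BITS = 3;               // bottom 3 bits hold flags
    static final int INCREMENT = 1 << FLAG_BITS;  // 8, as in add $0x8,%edi

    // One more invocation: the flag bits are left untouched.
    static int increment(int raw) {
        return raw + INCREMENT;
    }

    // Number of invocations recorded in the raw value.
    static int count(int raw) {
        return raw >>> FLAG_BITS;
    }

    // Flag bits stored alongside the count.
    static int flags(int raw) {
        return raw & (INCREMENT - 1);
    }

    public static void main(String[] args) {
        int raw = 0b101; // some flags set, count of zero
        for (int i = 0; i < 5; i++) {
            raw = increment(raw);
        }
        System.out.println(count(raw) + " calls, flags " + Integer.toBinaryString(flags(raw)));
    }
}
```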
  44. Safepoint and return

    • Stack is reset to its original position
    • Safepoint test of protected page
    • Function returns

          …
          0x00000001067f1997: callq  0x0000000110b27d60  ;   {optimized virtual_call}
          0x00000001067f199c: movslq %eax,%rax
          0x00000001067f199f: add    $0x30,%rsp
          0x00000001067f19a3: pop    %rbp
          0x00000001067f19a4: test   %eax,-0x20738aa(%rip)  ;   {poll_return}
          0x00000001067f19aa: retq

    • Safepoint poll: when the JVM wants to do e.g. GC, it marks this page as non-readable, which causes a fault that is handled by the JVM
    • Static functions can be called directly
  45. Intrinsics

    • Thread.currentThread()
    • System.arraycopy()
    • Object.clone()
    • System.nanoTime(), currentTimeMillis()
    • String.indexOf()
    • Math.*

    Slide annotations:

    • r15 is used for the thread instance
    • Multiple implementations based on generic type
    • Call out to OS provided native libraries
  46. Thread.currentThread()

    • No method invocation needed

        import java.net.URI;

        public class Test {
          private URI uri;
          public static void main(String[] args) {
            long t = 0;
            for(int i = 0; i < 20_000; i++) {
              t += getId();
            }
            System.out.println("Total: " + t);
          }
          public static long getId() {
            return Thread.currentThread().getId();
          }
        }

        [Entry Point]
        [Verified Entry Point]
          # {method} {0x0000000126ea2468} 'getId' '()J'
          0x00010e24de40: sub    $0x18,%rsp
          0x00010e24de47: mov    %rbp,0x10(%rsp)   ; - Test::getId@-1 (line 10)
          0x00010e24de4c: mov    0x1b8(%r15),%r10  ;*invokestatic currentThread
                                                   ; - Test::getId@0 (line 10)
          0x00010e24de53: mov    0x28(%r10),%rax   ;*getfield tid
                                                   ; - java.lang.Thread::getId@1 (line 1702)
                                                   ; - Test::getId@3 (line 10)
          0x00010e24de57: add    $0x10,%rsp
          0x00010e24de5b: pop    %rbp
          0x00010e24de5c: test   %eax,-0x2051e62(%rip)
          0x00010e24de62: retq

    • Current thread is stored in r15
    • Field at position 0x28 is tid (0x28 == 40)

  47. Thread.currentThread()

    • Field at position 0x28 (40) is tid

        java.lang.Thread object internals:
         OFFSET  SIZE    TYPE DESCRIPTION                             VALUE
              0     4         (object header)                         01 00 00 00
              4     4         (object header)                         00 00 00 00
              8     4         (object header)                         d6 0c 00 f8
             12     4     int Thread.priority                         5
             16     8    long Thread.eetop                            0
             24     8    long Thread.stackSize                        0
             32     8    long Thread.nativeParkEventPointer           0
             40     8    long Thread.tid                              13
             48     4     int Thread.threadStatus                     0
             52     1 boolean Thread.single_step                      false
             53     1 boolean Thread.daemon                           false
             54     1 boolean Thread.stillborn                        false
             55     1         (alignment/padding gap)                 N/A
             56     4  String Thread.name                             (object)
             ..     .     ... ...........
            244     4     int Thread.threadLocalRandomSecondarySeed   0
            248   128         (loss due to the next object alignment)
        Instance size: 376 bytes
        Space losses: 129 bytes internal + 128 bytes external = 257 bytes total

  48. Thread.currentThread()

    • The 128 bytes of padding after threadLocalRandomSecondarySeed come from @Contended fields in java.lang.Thread:

        // The following three initially uninitialized fields are exclusively
        // managed by class java.util.concurrent.ThreadLocalRandom. These
        // fields are used to build the high-performance PRNGs in the
        // concurrent code, and we can not risk accidental false sharing.
        // Hence, the fields are isolated with @Contended.

        /** The current seed for a ThreadLocalRandom */
        @jdk.internal.vm.annotation.Contended("tlr")
        long threadLocalRandomSeed;

        /** Probe hash value; nonzero if threadLocalRandomSeed initialized */
        @jdk.internal.vm.annotation.Contended("tlr")
        int threadLocalRandomProbe;

        /** Secondary seed isolated from public ThreadLocalRandom sequence */
        @jdk.internal.vm.annotation.Contended("tlr")
        int threadLocalRandomSecondarySeed;

    • The annotation was sun.misc.Contended
  49. Optimisations

    • HotSpot can optimise code by
      • Method inlining
      • Dead code/path elimination
      • Heuristics for optimising call sites
      • Constant folding
    • C2 performs additional optimisations (escape analysis etc)
  50. Summary

    • Objects on heap have a mark word and a klass pointer
    • Compressed OOPS result in smaller memory for <32 GB heaps
    • Objects may have internal or external padding waste due to alignment
    • Execution begins with template interpreter, level 0
    • JIT compilation triggered with levels 1-4; native methods are retired
    • Monomorphic and bimorphic methods have optimised call sites
    • Intrinsic methods don't pay method call costs
  51. HotSpot Under the Hood

    Alex Blewitt @alblue
    https://speakerdeck.com/alblue/
    https://alblue.bandlem.com/2016/09/javaone-hotspot.html