
JavaOne 2016: HotSpot Under the Hood

alblue
September 19, 2016

Have you ever wondered how the JVM works under the covers? How the JVM is able to JIT-optimize the bytecode classes and what the generated output looks like? This session shows how a compiled Java class is loaded in memory, when the JIT optimizations occur, and what the generated assembly looks like for hot code in the JVM. The presentation also looks at current object layouts, how the memory settings affect how objects are stored, and what effects this can have for high-performance Java code. Presented at JavaOne 2016.

More information about the presentation, including how to build hsdis, can be found at https://alblue.bandlem.com/2016/09/javaone-hotspot.html

Transcript

  1. Copyright (c) 2016, Alex Blewitt, Bandlem Ltd
    JavaOne 2016
    HotSpot
    Under the Hood
    Alex Blewitt
    @alblue

  3. Overview
    Diagram: Source → javac → Bytecode → JVM → Interpreter, or JIT → Native code

  4. Overview
    Diagram: Source → javac → Bytecode → JVM, with compilation levels 0 to 4
    int thing[] = new int[10];

  5. int[] thing
    • Arrays are variable† sized objects on the heap
    • Layout: mark (8 bytes) | klass (4 or 8 bytes) | length (4 bytes) | elements 1 to 10 (10 × 4 = 40 bytes) | padding (?)
    • Objects are multiples of 8‡: 16, 24, 32, 40, 48, 56, 64 …
    • Types may also have padding for data alignment
    ‡ when object alignment is 8
    † all other objects are fixed size
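The padding rule is plain arithmetic: header plus elements, rounded up to the object alignment. This sketch assumes a 16-byte array header (8-byte mark, 4-byte compressed klass, 4-byte length), which is typical of a 64-bit JVM with compressed class pointers; it is an estimate of the arithmetic only, not HotSpot's own sizing code, and the class name ArraySize is illustrative.

```java
/** Rough array-size estimate: a sketch of the alignment arithmetic, not HotSpot's code. */
final class ArraySize {
    /** Rounds size up to the next multiple of alignment (alignment must be a power of two). */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & -alignment;
    }

    /**
     * @param headerBytes     bytes before the first element (assumed: mark + klass + length)
     * @param length          number of elements
     * @param elementBytes    size of one element (4 for int)
     * @param objectAlignment object alignment in bytes (8 unless changed with -XX:ObjectAlignmentInBytes)
     */
    static long arraySize(long headerBytes, long length, long elementBytes, long objectAlignment) {
        return alignUp(headerBytes + length * elementBytes, objectAlignment);
    }

    public static void main(String[] args) {
        // int[10]: 16 + 10 * 4 = 56, already a multiple of 8, so no padding
        System.out.println(arraySize(16, 10, 4, 8));
        // int[3]: 16 + 3 * 4 = 28, padded up to 32
        System.out.println(arraySize(16, 3, 4, 8));
        // int[5] with 16-byte alignment: 16 + 5 * 4 = 36, padded up to 48
        System.out.println(arraySize(16, 5, 4, 16));
    }
}
```

Passing the header size in as a parameter keeps the uncompressed-klass case (a larger header) out of the hard-coded logic.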

  6. Java Object Layout
    • jol is an OpenJDK tool that can show
    • internals – how the object is laid out
    • footprint – how much memory it takes
    $ java -jar jol-cli.jar internals java.lang.String
    java.lang.String object internals:
    OFFSET SIZE TYPE DESCRIPTION VALUE
    0 4 (object header) 01 00 00 00
    4 4 (object header) 00 00 00 00
    8 4 (object header) c2 02 00 f8
    12 4 char[] String.value []
    16 4 int String.hash 0
    20 4 (loss due to the next object alignment)
    Written by JVM superstar
    Aleksey Shipilëv @shipilev

  7. Java Object Layout
    $ java -jar jol-cli.jar footprint java.lang.String
    java.lang.String@6842775dd footprint:
    COUNT AVG SUM DESCRIPTION
    1 16 16 [C
    1 24 24 java.lang.String
    2 40 (total)

  9. Java Object Layout
    • Uses the default constructor to create an instance
    • For classes without a default constructor:
    • Create a wrapper class
    • In the wrapper's default constructor, instantiate and store the type
    • Run jol footprint on the wrapper class
    import java.net.URI;
    public class Wrapper {
      private URI uri;
      public Wrapper() throws Exception {
        uri = new URI("https://go.java");
      }
    }
    $ java -cp jol-cli.jar:. org.openjdk.jol.Main footprint Wrapper
    Wrapper@574caa3fd footprint:
    COUNT AVG SUM DESCRIPTION
    1 16 16 Wrapper
    6 33 200 [C
    6 24 144 java.lang.String
    1 80 80 java.net.URI
    14 440 (total)

  11. Klass field
    • The klass field is a pointer to the object's type
    • Think getClass() in Java …
    • Present for every object/array instance
    • Can be 4 or 8 bytes wide
    • 32-bit JVM – 4 bytes
    • 64-bit JVM – 4 bytes or 8 bytes
    Klass field can be a compressed OOP

  12. Compressed OOPS
    • Compressed Ordinary Object Pointers
    • Store a 64-bit object reference in 32 bits
    • Heap < 4G: zero-extend the 32-bit value
    • Heap < ~30G: shift left by 3 (× 8‡) and zero-extend
    • Heap < 32G: shift left by 3 (× 8‡), then add the heap base
    • Heap < 64G: possible with -XX:ObjectAlignmentInBytes=16 (shift by 4)
    • Example: 0x2F shifted left by 3 is 0x178 (0010 1111 → 0001 0111 1000)
    -XX:+/-UseCompressedOops
    -XX:+/-UseCompressedClassPointers
    -XX:ObjectAlignmentInBytes=8
    ‡ when object alignment is 8
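The three decoding modes differ only in the shift and the base. This is a minimal sketch of that arithmetic, assuming an 8-byte object alignment (shift 3) for the scaled modes; the class name and the heap-base value in main are illustrative, not values HotSpot actually picks.

```java
/** Compressed oop arithmetic: a sketch of the encoding described above, not HotSpot code. */
final class CompressedOops {
    /** 64-bit address -> 32-bit compressed oop (subtract the base, then shift right). */
    static int encode(long address, long heapBase, int shift) {
        return (int) ((address - heapBase) >>> shift);
    }

    /** 32-bit compressed oop -> 64-bit address (zero-extend, shift left, add the base). */
    static long decode(int compressedOop, long heapBase, int shift) {
        return heapBase + ((compressedOop & 0xFFFF_FFFFL) << shift);
    }

    public static void main(String[] args) {
        // Heap < 4G: zero-extend only
        System.out.printf("%x%n", decode(0x2F, 0, 0));              // 2f
        // Heap < ~30G: shift by 3 (object alignment 8), no base
        System.out.printf("%x%n", decode(0x2F, 0, 3));              // 178
        // Heap < 32G: shift by 3 plus a heap base (this base value is only illustrative)
        System.out.printf("%x%n", decode(0x2F, 0x8_0000_0000L, 3)); // 800000178
        // With shift 3, 2^32 distinct values * 8 bytes covers exactly 32G
        System.out.println((1L << 32) * 8 == 32L << 30);            // true
    }
}
```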

  13. Compressed OOPS
    • Handled efficiently by generated code
    • In many cases, don't need to expand
    • Uses addressing modes to pack/unpack
    r11 = *(r10 * 8‡ + r12 + 12)
    mov 0xc(%r12,%r10,8),%r11d
    r10 is the compressed OOP, r12 is the heap base and 0xc (12) is the field offset;
    together they give the field's address in memory
    ‡ when object alignment is 8

  14. Array length
    • Getting the length of an array
    • Layout: mark (8 bytes) | klass (4 bytes) | length (4 bytes, at address + 12 = 0xc) | elements 1 to 10
    address = coop << 3‡
    length = *( address + 12 )
    length = *( r10 * 8‡ + 12 )
    Base not used for small (<~30G) heaps or uncompressed oops
    ‡ when object alignment is 8

  15. Array length
    • Getting the length of an array
    • Layout: mark (8 bytes) | klass (4 bytes) | length (4 bytes, at address + 12 = 0xc) | elements 1 to 10
    address = (coop << 3‡) + base
    length = *( address + 12 )
    length = *( r10 * 8‡ + r12 + 12 )
    Base used for large (>~30G) heaps with compressed oops
    ‡ when object alignment is 8
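To make the length lookup concrete, this toy model uses a ByteBuffer as a pretend heap, writes a length field at offset 12 of a pretend int[], and reads it back through a compressed reference. The offsets (8-byte mark, 4-byte compressed klass), the base of 256 and the object placement are assumptions for illustration; real HotSpot code works on raw memory, not a ByteBuffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Toy model: ByteBuffer indices stand in for heap addresses (sketch only). */
final class ArrayLengthDemo {
    static final int LENGTH_OFFSET = 12; // assumed: mark (8 bytes) + compressed klass (4 bytes)
    static final int SHIFT = 3;          // log2 of an 8-byte object alignment

    /** address = (coop << 3) + base, then length = *(address + 12). */
    static int arrayLength(ByteBuffer heap, int coop, long base) {
        long address = ((coop & 0xFFFF_FFFFL) << SHIFT) + base;
        return heap.getInt((int) (address + LENGTH_OFFSET));
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024).order(ByteOrder.LITTLE_ENDIAN);
        long base = 256;                                        // pretend heap base
        long arrayAddress = base + 64;                          // pretend int[] object, 8-byte aligned
        heap.putInt((int) (arrayAddress + LENGTH_OFFSET), 10);  // write the length field
        int coop = (int) ((arrayAddress - base) >>> SHIFT);     // compressed reference: 8
        System.out.println(arrayLength(heap, coop, base));      // 10
    }
}
```

Passing base = 0 gives the small-heap case from the previous slide, where the base term disappears.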

  16. Overview
    Diagram: Source → javac → Bytecode → JVM, with compilation levels 0 to 4
    int thing[] = new int[10];
    static int sum(int[] thing) {
      int total = 0;
      for (int t : thing) {
        total += t;
      }
      return total;
    }

  17. Bytecode
    • javac translates Java to bytecode
    • Stack-based, byte-oriented code
    • Local vars: istore_1
    • Object loads: aload_2
    • Array length: arraylength
    static int sum(int[] thing) {
      int total = 0;
      for (int t : thing) {
        total += t;
      }
      return total;
    }

  18. Bytecode
    0: iconst_0
    1: istore_1
    2: aload_0
    3: astore_2
    4: aload_2
    5: arraylength
    6: istore_3
    7: iconst_0
    8: istore 4
    10: iload 4
    12: iload_3
    13: if_icmpge 33
    16: aload_2
    17: iload 4
    19: iaload
    20: istore 5
    22: iload_1
    23: iload 5
    25: iadd
    26: istore_1
    27: iinc 4,1
    30: goto 10
    33: iload_1
    34: ireturn

  19. Compilation levels
    • HotSpot uses -XX:+TieredCompilation
    • 0 – interpreter
    • 1 – pure C1
    • 2 – C1 with some profiling
    • 3 – C1 with full profiling
    • 4 – C2 (full optimisation)
    • Code starts off interpreted (level 0)
    • Level 4 is the '-server' JIT level; levels 1 to 3 are the '-client' JIT levels

  20. (Re)compilation
    • Methods get recompiled frequently
    • Use -XX:+PrintCompilation to show them
    • % = on-stack replacement, n = native, ! = has exception handlers
    70 1 n 0 java.lang.System::arraycopy (native) (static)
    71 2 3 java.lang.Object::<init> (1 bytes)
    73 3 3 java.lang.String::hashCode (55 bytes)
    75 5 3 java.lang.String::charAt (29 bytes)
    76 6 3 java.lang.String::length (6 bytes)
    76 7 3 java.lang.String::indexOf (70 bytes)
    76 4 3 java.lang.Math::min (11 bytes)
    76 9 1 java.lang.Object::<init> (1 bytes)
    76 2 3 java.lang.Object::<init> (1 bytes) made not entrant
    Columns: time since JVM start (ms), compile ID, flags, compile level, method
    Or use JITWatch by Chris Newland @chriswhcodes
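A small program is enough to watch tiered compilation happen. This sketch (the class name HotLoop and the call count are arbitrary choices) calls the sum method from the Overview slide in a loop; running it with java -XX:+PrintCompilation HotLoop should list HotLoop::sum among the compiled methods, though the levels, timings and IDs shown depend on the JVM version and flags.

```java
/** Calls sum often enough for the JIT to consider compiling it (sketch). */
final class HotLoop {
    static int sum(int[] thing) {
        int total = 0;
        for (int t : thing) {
            total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] thing = new int[10];
        for (int i = 0; i < thing.length; i++) {
            thing[i] = i + 1; // 1..10
        }
        long grandTotal = 0;
        // 100,000 calls is an arbitrary "hot enough" count; real thresholds vary by JVM and flags
        for (int i = 0; i < 100_000; i++) {
            grandTotal += sum(thing);
        }
        System.out.println(grandTotal); // 5500000
    }
}
```

Printing the result keeps the loop's work observable, so the call cannot simply be discarded as dead code.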

  21. Interpreter
    • Writing an interpreter sounds simple …
    switch(bytecode) {
      case nop: break;
      case aconst_null: push(null); break;
      case iconst_m1: push(-1); break;
      case iconst_0: push(0); break;
      case iconst_1: push(1); break;
      …
    }
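The switch-based approach can be made runnable for the sum bytecode listed earlier. This toy interpreter supports only the 17 opcodes that listing uses (values from the JVM specification), keeps locals and the operand stack as boxed Objects and does no verification; it illustrates a switch dispatch loop, not how HotSpot's interpreter is built.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** A toy switch-dispatch interpreter for the sum bytecode (sketch only, no verification). */
final class SwitchInterpreter {
    // Opcode values from the JVM specification (only those the sum listing uses)
    static final int ICONST_0 = 0x03, ILOAD = 0x15, ILOAD_1 = 0x1b, ILOAD_3 = 0x1d,
            ALOAD_0 = 0x2a, ALOAD_2 = 0x2c, IALOAD = 0x2e, ISTORE = 0x36, ISTORE_1 = 0x3c,
            ISTORE_3 = 0x3e, ASTORE_2 = 0x4d, IADD = 0x60, IINC = 0x84, IF_ICMPGE = 0xa2,
            GOTO = 0xa7, IRETURN = 0xac, ARRAYLENGTH = 0xbe;

    // The 35 bytes of the sum listing; branch operands are signed 16-bit offsets from the branch
    static final int[] SUM = {
        ICONST_0, ISTORE_1, ALOAD_0, ASTORE_2, ALOAD_2, ARRAYLENGTH, ISTORE_3,   // 0..6
        ICONST_0, ISTORE, 4, ILOAD, 4, ILOAD_3, IF_ICMPGE, 0x00, 0x14,           // 7..15, +20 -> 33
        ALOAD_2, ILOAD, 4, IALOAD, ISTORE, 5, ILOAD_1, ILOAD, 5, IADD, ISTORE_1,  // 16..26
        IINC, 4, 1, GOTO, 0xff, 0xec, ILOAD_1, IRETURN                           // 27..34, -20 -> 10
    };

    static int run(int[] code, Object arg0) {
        Object[] locals = new Object[6];   // sum uses locals 0..5
        locals[0] = arg0;                  // static method, so local 0 is the int[] argument
        Deque<Object> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int opcode = code[pc];
            switch (opcode) {
                case ICONST_0: stack.push(0); pc += 1; break;
                case ILOAD: stack.push(locals[code[pc + 1]]); pc += 2; break;
                case ILOAD_1: stack.push(locals[1]); pc += 1; break;
                case ILOAD_3: stack.push(locals[3]); pc += 1; break;
                case ALOAD_0: stack.push(locals[0]); pc += 1; break;
                case ALOAD_2: stack.push(locals[2]); pc += 1; break;
                case ISTORE: locals[code[pc + 1]] = stack.pop(); pc += 2; break;
                case ISTORE_1: locals[1] = stack.pop(); pc += 1; break;
                case ISTORE_3: locals[3] = stack.pop(); pc += 1; break;
                case ASTORE_2: locals[2] = stack.pop(); pc += 1; break;
                case ARRAYLENGTH: stack.push(((int[]) stack.pop()).length); pc += 1; break;
                case IALOAD: {
                    int index = (Integer) stack.pop();
                    int[] array = (int[]) stack.pop();
                    stack.push(array[index]);
                    pc += 1;
                    break;
                }
                case IADD: stack.push((Integer) stack.pop() + (Integer) stack.pop()); pc += 1; break;
                case IINC: {
                    int slot = code[pc + 1];
                    locals[slot] = (Integer) locals[slot] + (byte) code[pc + 2]; // signed increment
                    pc += 3;
                    break;
                }
                case IF_ICMPGE: {
                    int value2 = (Integer) stack.pop();
                    int value1 = (Integer) stack.pop();
                    pc += value1 >= value2 ? branchOffset(code, pc) : 3;
                    break;
                }
                case GOTO: pc += branchOffset(code, pc); break;
                case IRETURN: return (Integer) stack.pop();
                default: throw new IllegalStateException("Unsupported opcode " + opcode + " at " + pc);
            }
        }
    }

    /** Reads the signed 16-bit branch offset that follows the opcode at pc. */
    static int branchOffset(int[] code, int pc) {
        return (short) ((code[pc + 1] << 8) | code[pc + 2]);
    }

    public static void main(String[] args) {
        System.out.println(run(SUM, new int[] {1, 2, 3, 4})); // 10
    }
}
```

Every bytecode pays for the switch on each iteration, which is the overhead the template interpreter on the next slide avoids.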

  22. Template Interpreter
    • HotSpot uses‡ a template interpreter
    Runnable[] templateTable = new Runnable[] {
      () -> {},          // nop
      () -> push(null),  // aconst_null
      () -> push(-1),    // iconst_m1
      () -> push(0),     // iconst_0
      () -> push(1),     // iconst_1
      …
    };
    templateTable[code[bytecodeIndex++]].run(); // code: the method's bytecode array (illustrative name)
    ‡ this is a Java approximation only
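Following the same approximation, here is a runnable version of table dispatch: each handler is stored at the index of its opcode, and the loop looks the handler up by the opcode at the current position. The opcode numbers come from the JVM specification; the class, the Runnable-per-opcode design and the ireturn flag are illustrative assumptions, since HotSpot's real templates are generated machine code, not Java lambdas.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy template-table dispatch: handlers indexed by opcode (sketch only). */
final class TemplateTableDemo {
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final Runnable[] templateTable = new Runnable[256]; // one slot per possible opcode
    private boolean returned;
    private int result;

    TemplateTableDemo() {
        templateTable[0x00] = () -> { };                                   // nop
        for (int k = -1; k <= 5; k++) {                                     // iconst_m1 (0x02) .. iconst_5 (0x08)
            final int value = k;
            templateTable[0x03 + k] = () -> stack.push(value);
        }
        templateTable[0x60] = () -> stack.push(stack.pop() + stack.pop()); // iadd
        templateTable[0x68] = () -> stack.push(stack.pop() * stack.pop()); // imul
        templateTable[0xac] = () -> {                                       // ireturn
            result = stack.pop();
            returned = true;
        };
    }

    int execute(int[] bytecodes) {
        stack.clear();
        returned = false;
        int bytecodeIndex = 0;
        while (!returned) {
            // Look up the handler by the opcode at the current index, then advance
            Runnable handler = templateTable[bytecodes[bytecodeIndex++]];
            if (handler == null) {
                throw new IllegalStateException("Unsupported opcode at " + (bytecodeIndex - 1));
            }
            handler.run();
        }
        return result;
    }

    public static void main(String[] args) {
        // iconst_2, iconst_3, iadd, iconst_4, imul, ireturn: (2 + 3) * 4
        int[] code = {0x05, 0x06, 0x60, 0x07, 0x68, 0xac};
        System.out.println(new TemplateTableDemo().execute(code)); // 20
    }
}
```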

  23. HotSpot Disassembler
    • HotSpot has an hsdis plugin to allow disassembly of code
    • In the jdk8u/jdk8u/hotspot repository, in src/share/tools/hsdis/
    • Four files to download (no need to clone the whole repository)
    • hsdis.c, hsdis.h, Makefile, README
    • Requires binutils 2.17+ from FSF.org
    • Cannot be redistributed ${COPYRIGHT}
    Look for a warning about missing 'makeinfo'
    Cygwin needs ${MINGW}-ar for AR
    Output (macOS): hsdis-amd64.dylib, hsdis-i386.dylib
    https://alblue.bandlem.com/2016/09/javaone-hotspot.html

  24. Template Interpreter
    • Assembly‡ from -XX:+PrintInterpreter
    arraylength 190
    0x00000001068fe9a0: pop %rax
    0x00000001068fe9a1: mov 0xc(%rax),%eax
    0x00000001068fe9a4: movzbl 0x1(%r13),%ebx
    0x00000001068fe9a9: inc %r13
    0x00000001068fe9ac: movabs $0x106293760,%r10
    0x00000001068fe9b6: jmpq *(%r10,%rbx,8)
    0x00000001068fe9ba: nopw 0x0(%rax,%rax,1)
    ‡ requires hsdis to be built/installed, plus -XX:+UnlockDiagnosticVMOptions

  25. Template Interpreter
    • Get address of array into 64-bit rax register

  26. Template Interpreter
    • Load *(address + 12) into 32-bit eax

  27. Template Interpreter
    • Load byte *(r13 + 1) into 32-bit ebx; r13++
    r13 is the bytecode index pointer

  28. Template Interpreter
    • Load byte *(r13 + 1) into 32-bit ebx; r13++
    Logically equivalent to:
    inc %r13 ; %r13++
    movzbl (%r13), %ebx
    but HotSpot's ordering is faster: in the naïve version the load has a data dependency on the
    result of the preceding inc of %r13, whereas movzbl 0x1(%r13) does not have to wait for it

  29. Template Interpreter
    • Load table address 0x10…60 into 64-bit r10
    0x106293760 is the start of the template table

  30. Template Interpreter
    • Jump to the address stored at *(r10 + rbx * 8)
    rbx is the next bytecode loaded earlier

  31. Template Interpreter
    • Multi-byte nop instruction (nopw) that fills the gap until the next alignment boundary

  34. Template Interpreter
    • Array length = *(address of object + 0xc)
    • Layout: mark (8 bytes) | klass (4 bytes) | length (4 bytes, at address + 12 = 0xc) | elements 1 to 10
    This is the key part of the arraylength bytecode

  35. Null Checks
    • Null checks are automatically handled
    • The assembly code is generated from:
    void TemplateTable::arraylength() {
      transition(atos, itos);
      __ null_check(rax, arrayOopDesc::length_offset_in_bytes());
      __ movl(rax, Address(rax, arrayOopDesc::length_offset_in_bytes()));
    }
    0x00000001068fe9a1: mov 0xc(%rax),%eax
    Because the offset (0xc) is small, null_check emits no explicit test here
    If rax is null, *(0 + 0xc) dereferences an address in the unmapped zero page, which causes a SIGSEGV
    The JVM's SIGSEGV handler translates this into a NullPointerException
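From the Java side the mechanism is invisible: the program below just sees a NullPointerException. Whether a given JVM raises it through the SIGSEGV path described above or through an explicit test cannot be observed from this code; that depends on the platform and on which tier is running the method. The class and method names are illustrative.

```java
/** Shows the Java-visible result of the implicit null check on arraylength. */
final class ImplicitNullCheck {
    static int lengthOf(int[] array) {
        return array.length; // compiles to the arraylength bytecode; javac emits no explicit null test
    }

    static String describe(int[] array) {
        try {
            return "length " + lengthOf(array);
        } catch (NullPointerException e) {
            return "NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new int[3])); // length 3
        System.out.println(describe(null));       // NullPointerException
    }
}
```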

  36. In-line assembler
    • HotSpot code uses __ as a prefix for 'assemble this instruction'
    • #define __ _masm->, where _masm is the MacroAssembler
    • Generates instructions for the current architecture
    • Allows code to vary depending on the runtime architecture
    • Takes advantage of vector instructions if available
    • Optimised OOP decoding and null handling

  37. Top of Stack
    • It's a little more complicated than that …
    • HotSpot caches top-of-stack in a register
    • Faster access
    • Different register based on type
    • rax – long/int/short/char/byte/boolean
    • xmm0 – double/float
    • Different implementations needed for pop

  38. Popping off
    pop 87
    0x00000001068f5440: push %rax
    0x00000001068f5441: jmpq 0x00000001068f5470
    0x00000001068f5446: sub $0x8,%rsp
    0x00000001068f544a: vmovss %xmm0,(%rsp)
    0x00000001068f544f: jmpq 0x00000001068f5470
    0x00000001068f5454: sub $0x10,%rsp
    0x00000001068f5458: vmovsd %xmm0,(%rsp)
    0x00000001068f545d: jmpq 0x00000001068f5470
    0x00000001068f5462: sub $0x10,%rsp
    0x00000001068f5466: mov %rax,(%rsp)
    0x00000001068f546a: jmpq 0x00000001068f5470
    0x00000001068f546f: push %rax
    0x00000001068f5470: add $0x8,%rsp
    Entry points for different types: object, float, double, long, int

  39. TemplateTable
    • The type of value on the top of the stack affects the entry point
    Entry point (address suffix) by bytecode and top-of-stack state; X = no entry point
    Bytecode     Byte   Bool   Char   Short  Int    Long   Float  Double Object Void
    arraylength  X      X      X      X      X      X      X      X      fe9a0  fe9a0
    pop          f546f  f546f  f546f  f546f  f546f  f546f  f5446  f5454  f5440  f5440
    iadd         f5920  X      f5920  f5920  f5920  X      X      X      X      f5920
    ladd         X      X      X      X      X      f5980  X      X      X      f5908
  40. Wide and safepoint
    • Wide extends certain instructions
    • load i -> load ii, fstore i -> fstore ii
    • iinc i -> iinc ii
    • Different table when interpreting 'wide mode'
    • _template_table, _template_table_wide
    • Can be used to implement safepoints
    • Update entry points to use safepoint handler

  41. Fast bytecodes
    • Some bytecodes are re-written on the fly
    • getfield -> fast_agetfield, fast_igetfield etc.
    • putfield -> fast_aputfield, fast_iputfield etc.
    • iload -> fast_iload
    • aload_0 -> fast_aload_0 (in instance methods, aload_0 loads this)
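The rewriting idea can be sketched in a few lines: the first execution of a generic bytecode does the expensive resolution, caches the result, and overwrites the opcode with a fast variant that skips resolution on later runs. The FAST_IGETFIELD value, the per-pc cache and the int[] standing in for an object are all hypothetical; HotSpot's fast bytecodes keep their resolved data in the constant pool cache instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy bytecode rewriting: the slow path runs once, then the opcode is replaced. */
final class Quickening {
    static final int GETFIELD = 0xb4;        // real JVM opcode
    static final int FAST_IGETFIELD = 0xcc;  // hypothetical internal opcode chosen for this sketch

    final int[] code;
    private final Map<Integer, Integer> cachedSlots = new HashMap<>(); // pc -> resolved field slot
    int resolutions;                                                    // times the slow path ran

    Quickening(int[] code) {
        this.code = code;
    }

    /** Executes the field load at pc against a toy object whose fields are an int[]. */
    int executeAt(int pc, int[] object, String fieldName) {
        switch (code[pc]) {
            case GETFIELD: {
                int slot = resolve(fieldName);  // slow: symbolic lookup by name
                cachedSlots.put(pc, slot);
                code[pc] = FAST_IGETFIELD;      // rewrite so later runs skip resolution
                return object[slot];
            }
            case FAST_IGETFIELD:
                return object[cachedSlots.get(pc)]; // fast: use the cached slot
            default:
                throw new IllegalStateException("Unsupported opcode " + code[pc]);
        }
    }

    private int resolve(String fieldName) {
        resolutions++;
        switch (fieldName) {
            case "x": return 0;
            case "y": return 1;
            default: throw new IllegalArgumentException("No field " + fieldName);
        }
    }

    public static void main(String[] args) {
        Quickening q = new Quickening(new int[] {GETFIELD});
        int[] point = {3, 4}; // toy object: x = 3, y = 4
        int total = 0;
        for (int i = 0; i < 3; i++) {
            total += q.executeAt(0, point, "y");
        }
        System.out.println(total + " " + q.resolutions + " " + (q.code[0] == FAST_IGETFIELD)); // 12 1 true
    }
}
```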

  42. Calling Native Methods
    • Code looks up method through vtable
    • klass.vtable[id].method -> native code
    Diagram: klass → vtable → Method → nmethod (compiled code), or interpreted (level 0);
    a (sub)klass has its own vtable whose entries point to Methods
    • nmethod states: in use, not entrant, zombie, unloaded

  43. Speeding up
    • 90% of call sites have a single receiver type
    • Don't need to go through lookup each time
    klass → Method → nmethod
    if (klass == ) {
      nmethod();
    } else {
      klass.vtable[id].method.nmethod();
    }
    The klass check is at the entry point, the direct call runs from the verified entry point,
    and the else branch is the fallback
    Monomorphic dispatch optimisation; also applies to bimorphic dispatch
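The call-site check can be sketched as a monomorphic inline cache: remember the receiver class and the method found for it, take the fast path while the class matches, and fall back to a full lookup otherwise. METHOD_TABLE is a hypothetical stand-in for the vtable, and on a miss this sketch simply re-caches the new type; HotSpot's real call sites follow a different state machine (for example moving to vtable dispatch once they become megamorphic).

```java
import java.util.Map;
import java.util.function.Function;

/** Toy monomorphic inline cache for a single call site (sketch only). */
final class InlineCacheDemo {
    // Hypothetical stand-in for a vtable lookup: receiver class -> method body
    private static final Map<Class<?>, Function<Object, String>> METHOD_TABLE =
            Map.<Class<?>, Function<Object, String>>of(
                    String.class, o -> "String " + o,
                    Integer.class, o -> "Integer " + o);

    private Class<?> cachedClass;                  // receiver type seen last at this call site
    private Function<Object, String> cachedTarget; // method found for that type
    int lookups;                                   // number of slow-path lookups

    String invoke(Object receiver) {
        if (receiver.getClass() == cachedClass) {  // type check holds: direct call, no lookup
            return cachedTarget.apply(receiver);
        }
        lookups++;                                 // fallback: full lookup, then re-cache
        Function<Object, String> target = METHOD_TABLE.get(receiver.getClass());
        if (target == null) {
            throw new IllegalArgumentException("No method for " + receiver.getClass());
        }
        cachedClass = receiver.getClass();
        cachedTarget = target;
        return target.apply(receiver);
    }

    public static void main(String[] args) {
        InlineCacheDemo site = new InlineCacheDemo();
        for (String s : new String[] {"a", "b", "c"}) {
            site.invoke(s);                  // same receiver type each time
        }
        System.out.println(site.lookups);    // 1
        System.out.println(site.invoke(42)); // Integer 42
        System.out.println(site.lookups);    // 2
    }
}
```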

  44. Verified Entry Point
    • Code has an entry point and verified entry point
    • Entry point is where code starts
    Code:
    [Entry Point]
    # {method} {0x000000011a54b000} 'hashCode' in 'java/lang/String'
    # [sp+0x40] (sp of caller)
    0x00000001067dac80: mov 0x8(%rsi),%r10d
    0x00000001067dac84: shl $0x3,%r10
    0x00000001067dac88: cmp %rax,%r10
    0x00000001067dac8b: jne 0x000000010671fb60 ; {runtime_call}
    0x00000001067dac91: data32 data32 nopw 0x0(%rax,%rax,1)
    0x00000001067dac9c: data32 data32 xchg %ax,%ax
    [Verified Entry Point]
    rsi is the String instance; rsi+8 is its klass field
    shl 3 == *8: expanding the compressed klass oop
    rax is the expected type (String)
    Fall back (to interpreter) if not valid

  45. Verified Entry Point
    • Code has an entry point and verified entry point
    • The verified entry point is where execution continues once the type check holds
    Code:
    [Entry Point]

    [Verified Entry Point]
    0x00000001067daca0: mov %eax,-0x14000(%rsp)
    0x00000001067daca7: push %rbp
    0x00000001067daca8: sub $0x30,%rsp
    0x00000001067dacac: movabs $0x11a70ccb0,%rax
    ; {metadata(method data for {method}
    ; {0x000000011a54b000} 'hashCode' '()I' in 'java/lang/String')}
    0x00000001067dacb6: mov 0xdc(%rax),%edi
    0x00000001067dacbc: add $0x8,%edi
    0x00000001067dacbf: mov %edi,0xdc(%rax)
    mov %eax,-0x14000(%rsp): stack banging, which detects a StackOverflowError up front
    movabs: loads the profiling data for String's hashCode
    add $0x8: adds 1 to the number of times called, since the bottom 3 bits are used for flags
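The add $0x8 is easier to read with the packing spelled out: the count lives above the 3 flag bits, so adding 8 (1 << 3) adds 1 to the count and leaves the flags alone. This layout is a simplified assumption modelled on that description; HotSpot's real counter also has overflow handling and flag semantics not shown here.

```java
/** An invocation counter sharing one int with 3 low flag bits (sketch of the add $0x8 idiom). */
final class PackedCounter {
    static final int COUNT_SHIFT = 3;                    // bottom 3 bits hold flags
    static final int FLAG_MASK = (1 << COUNT_SHIFT) - 1; // 0b111

    /** Adds 1 to the count without touching the flags: the same as add $0x8. */
    static int increment(int word) {
        return word + (1 << COUNT_SHIFT);
    }

    static int count(int word) {
        return word >>> COUNT_SHIFT;
    }

    static int flags(int word) {
        return word & FLAG_MASK;
    }

    public static void main(String[] args) {
        int word = 0b011; // some flags set, count 0
        for (int i = 0; i < 5; i++) {
            word = increment(word);
        }
        System.out.println(count(word) + " " + flags(word)); // 5 3
    }
}
```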

  46. Safepoint and return
    • Stack is reset to its original position
    • Safepoint test of protected page
    • Function returns

    0x00000001067f1997: callq 0x0000000110b27d60
    ; {optimized virtual_call}
    0x00000001067f199c: movslq %eax,%rax
    0x00000001067f199f: add $0x30,%rsp
    0x00000001067f19a3: pop %rbp
    0x00000001067f19a4: test %eax,-0x20738aa(%rip)
    ; {poll_return}
    0x00000001067f19aa: retq
    Safepoint poll: when the JVM wants to do e.g. GC, it marks this page as non-readable,
    which causes a fault that is handled by the JVM
    Static functions can be called directly

  47. Intrinsics
    • Thread.currentThread() (r15 is used for the thread instance)
    • System.arraycopy() (multiple implementations based on generic type)
    • Object.clone()
    • System.nanoTime(), currentTimeMillis() (call out to OS-provided native libraries)
    • String.indexOf()
    • Math.*

  50.
    [Entry Point]
    [Verified Entry Point]
    # {method} {0x0000000126ea2468} 'getId' '()J'
    0x00010e24de40: sub $0x18,%rsp
    0x00010e24de47: mov %rbp,0x10(%rsp)
    ; - Test::getId@-1 (line 10)
    0x00010e24de4c: mov 0x1b8(%r15),%r10
    ;*invokestatic currentThread
    ; - Test::getId@0 (line 10)
    0x00010e24de53: mov 0x28(%r10),%rax
    ;*getfield tid
    ; - java.lang.Thread::getId@1 (line 1702)
    ; - Test::getId@3 (line 10)
    0x00010e24de57: add $0x10,%rsp
    0x00010e24de5b: pop %rbp
    0x00010e24de5c: test %eax,-0x2051e62(%rip)
    0x00010e24de62: retq
    Thread.currentThread()
    • No method invocation needed
    import java.net.URI;
    public class Test {
      private URI uri;
      public static void main(String[] args) {
        long t = 0;
        for (int i = 0; i < 20_000; i++) {
          t += getId();
        }
        System.out.println("Total: " + t);
      }
      public static long getId() {
        return Thread.currentThread().getId();
      }
    }
    Current thread is stored in r15
    Field at position 0x28 (40) is tid
    java.lang.Thread object internals:
    OFFSET SIZE TYPE DESCRIPTION VALUE
    0 4 (object header) 01 00 00 00
    4 4 (object header) 00 00 00 00
    8 4 (object header) d6 0c 00 f8
    12 4 int Thread.priority 5
    16 8 long Thread.eetop 0
    24 8 long Thread.stackSize 0
    32 8 long Thread.nativeParkEventPointer 0
    40 8 long Thread.tid 13
    48 4 int Thread.threadStatus 0
    52 1 boolean Thread.single_step false
    53 1 boolean Thread.daemon false
    54 1 boolean Thread.stillborn false
    55 1 (alignment/padding gap) N/A
    56 4 String Thread.name (object)
    .. . ... ...........
    244 4 int Thread.threadLocalRandomSecondarySeed 0
    248 128 (loss due to the next object alignment)
    Instance size: 376 bytes
    Space losses: 129 bytes internal + 128 bytes external = 257 bytes total
    0x28 == 40
    // The following three initially uninitialized fields are exclusively
    // managed by class java.util.concurrent.ThreadLocalRandom. These
    // fields are used to build the high-performance PRNGs in the
    // concurrent code, and we can not risk accidental false sharing.
    // Hence, the fields are isolated with @Contended.
    /** The current seed for a ThreadLocalRandom */
    @jdk.internal.vm.annotation.Contended("tlr")
    long threadLocalRandomSeed;
    /** Probe hash value; nonzero if threadLocalRandomSeed initialized */
    @jdk.internal.vm.annotation.Contended("tlr")
    int threadLocalRandomProbe;
    /** Secondary seed isolated from public ThreadLocalRandom sequence */
    @jdk.internal.vm.annotation.Contended("tlr")
    int threadLocalRandomSecondarySeed;
    Was sun.misc.Contended

  51. Optimisations
    • HotSpot can optimise code by
    • Method inlining
    • Dead code/path elimination
    • Heuristics for optimising call sites
    • Constant folding
    • C2 performs additional optimisations (escape analysis, etc.)

  52. Summary
    • Objects on heap have a mark word and a klass pointer
    • Compressed OOPS use less memory for heaps < 32G
    • Objects may have internal or external padding waste due to alignment
    • Execution begins with template interpreter, level 0
    • JIT compilation triggered with levels 1-4; superseded compiled (native) methods are retired
    • Monomorphic and bimorphic methods have optimised call sites
    • Intrinsic methods don't pay method call costs

  53. HotSpot
    Under the Hood
    Alex Blewitt
    @alblue
    https://speakerdeck.com/alblue/
    https://alblue.bandlem.com/2016/09/javaone-hotspot.html
