[SnowOne 2023] Xie Junfeng: Overview of OpenJDK RISC-V port

jugnsk
March 17, 2023

We introduce our work on porting OpenJDK to RISC-V, especially our work to support instruction set extensions.

We also introduce the application of Pointer Masking in ZGC.

Transcript

  1. Overview of OpenJDK RISC-V Port
    Xie Junfeng
    Huawei Software Developer
    [email protected]

  2. Huawei Proprietary - Restricted Distribution
    Content
    1. Introduction
    2. No Flag Register
    3. Zifencei: FENCE.I
    4. Vector Extension
    5. Bitmanipulation Extension
    6. JIT Extension: Pointer Masking
    7. Summary

  4. Introduction
    • About Me
    > Xie Junfeng
    > Fudan University
    > Huawei Software Developer
    > Mainly works on the RV32 Zero port and TBI (Pointer Masking)
    > [email protected]

  5. Introduction
    • Our Team
    > owner of the OpenJDK RISC-V port
    > https://github.com/openjdk/riscv-port
    • Key Members
    > [email protected]
    > [email protected] (resigned)
    > [email protected]

  6. Introduction
    • RISC-V
    > an open and free instruction set architecture
    > modular, collaborative
    • Huawei, OpenJDK and RISC-V
    > a lot of Java services
    > adapt to Huawei-developed hardware
    > RISC-V is free and open
    > further exploration based on RISC-V

  7. Introduction
    • Our Work on OpenJDK mainline
    > Instruction Sets: RV64IMAFDCV, Zba, Zbb, Zicsr, Zifencei
    > Zero, Template Interpreter, C1, C2
    > All GC algorithms (including ZGC and Shenandoah)
    > RV32 Zero Interpreter
    > JEP 425: Virtual Thread (JDK 20)
    > JEP 424: Foreign Function & Memory API (JDK 21)
    > Generational ZGC (ZGC repo)
    • Our Plan
    > backport to JDK 8/11/17 (in progress)
    > JEP 426: Vector API

  8. Introduction
    • Enable Extensions Manually
        product(bool, UseRVC, false, "Use RVC instructions") \
        product(bool, UseRVA22U64, false, EXPERIMENTAL, "Use RVA22U64 profile") \
        product(bool, UseRVV, false, EXPERIMENTAL, "Use RVV instructions") \
        product(bool, UseZba, false, EXPERIMENTAL, "Use Zba instructions") \
        product(bool, UseZbb, false, EXPERIMENTAL, "Use Zbb instructions") \
        product(bool, UseZbs, false, EXPERIMENTAL, "Use Zbs instructions") \
        product(bool, UseZfhmin, false, EXPERIMENTAL, "Use Zfhmin instructions") \
        product(bool, UseZic64b, false, EXPERIMENTAL, "Use Zic64b instructions") \
        product(bool, UseZicbom, false, EXPERIMENTAL, "Use Zicbom instructions") \
        product(bool, UseZicbop, false, EXPERIMENTAL, "Use Zicbop instructions") \
        product(bool, UseZicboz, false, EXPERIMENTAL, "Use Zicboz instructions") \
    • Check Extensions at Runtime
        if (UseRVV) {
          if (!(_features & CPU_V)) {
            warning("RVV is not supported on this CPU");
            FLAG_SET_DEFAULT(UseRVV, false);
          } else { ... }
        }
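
The runtime check above can be pictured with a small Java model. This is only a sketch of the idea (requested flag AND detected CPU feature), not HotSpot code; the `ExtensionFlags` class, the `Cpu` enum and the `resolve` helper are hypothetical names introduced for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

final class ExtensionFlags {
    // Hypothetical stand-in for the CPU feature bits the VM detects at startup.
    enum Cpu { V, ZBA, ZBB }

    // Returns the effective value of a flag: a requested extension stays on
    // only if the CPU reports it, otherwise it is switched off with a warning.
    static boolean resolve(String name, boolean requested, Cpu needed, Set<Cpu> detected) {
        if (requested && !detected.contains(needed)) {
            System.err.println("warning: " + name + " is not supported on this CPU");
            return false;
        }
        return requested;
    }

    public static void main(String[] args) {
        Set<Cpu> detected = EnumSet.of(Cpu.ZBA, Cpu.ZBB); // assumed CPU without the V extension
        System.out.println("UseRVV = " + resolve("RVV", true, Cpu.V, detected));
        System.out.println("UseZba = " + resolve("Zba", true, Cpu.ZBA, detected));
    }
}
```

Since most of the flags listed on the slide are declared EXPERIMENTAL, turning one on from the command line would normally also require `-XX:+UnlockExperimentalVMOptions`, for example `-XX:+UnlockExperimentalVMOptions -XX:+UseRVV`.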

  9. Introduction
    [Chart: OpenJDK RISCV64 Server JDK vs Zero JDK on SPECjvm2008; Server JDK is 39x faster (geometric mean) than Zero JDK [1]]
    [Chart: OpenJDK on SPECjbb2015 [2]; bars labelled 413 (max-JOPS) and 53 (critical-JOPS)]
    [1] Data source: https://twitter.com/shipilev/status/1479179438625595399, on a HiFive Unmatched board
    [2] HiFive Unleashed board, data provided by PLCT

  11. No Flag Register
    • Compare & Jump in one instruction
    > so there is no flag register
    > in RISC-V: beq(op1, op2, label)
    > in other arch: cmp(op1, op2) & beq(label)
    • C1: save oprs at cmp & consume them at branch
    • C2: use t1 as the flag register
    Our solution on C1
        void LIR_List::set_cmp_oprs(LIR_Op* op) {
          switch (op->code()) {
            case lir_cmp:
              // do nothing but save the two operands
              _cmp_opr1 = op->as_Op2()->in_opr1();
              _cmp_opr2 = op->as_Op2()->in_opr2();
              break;
            // consume the two operands
            case lir_branch: // fall through
            case lir_cond_float_branch:
              if (op->as_OpBranch()->cond() != lir_cond_always) {
                op->as_Op2()->set_in_opr1(_cmp_opr1);
                op->as_Op2()->set_in_opr2(_cmp_opr2);
              }
              break;
            case lir_cmove:
              op->as_Op4()->set_in_opr3(_cmp_opr1);
              op->as_Op4()->set_in_opr4(_cmp_opr2);
              break;
            case ...:
              ...;
          }
        }
    Our solution on C2
        // On riscv, the physical flag register is missing, so we
        // use t1 instead, to bridge the RegFlag semantics in
        // share/opto.
        reg_def RFLAGS (SOC, SOC, Op_RegFlags, 6, x6->as_VMReg());
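
The C1 idea (a cmp only remembers its operands, the following branch consumes them) can be sketched as a tiny pass over a toy instruction list. This is an illustrative model, not C1 code: the `Op` record, the string-based op kinds and the `fuse` method are assumptions made for the example, and unlike C1 the sketch simply drops the cmp instead of keeping it as a no-op.

```java
import java.util.ArrayList;
import java.util.List;

final class CmpBranchFusion {
    // Toy LIR op: kind is "cmp", "branch" or anything else; unused fields are null.
    record Op(String kind, String opr1, String opr2, String label) {}

    // Single pass over the list: cmp saves its operands, branch consumes them.
    static List<Op> fuse(List<Op> lir) {
        List<Op> out = new ArrayList<>();
        String savedOpr1 = null;
        String savedOpr2 = null;
        for (Op op : lir) {
            switch (op.kind()) {
                case "cmp" -> {            // emit nothing, only remember the operands
                    savedOpr1 = op.opr1();
                    savedOpr2 = op.opr2();
                }
                case "branch" ->           // consume the operands saved by the last cmp
                    out.add(new Op("beq", savedOpr1, savedOpr2, op.label()));
                default -> out.add(op);    // every other op passes through unchanged
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Op> lir = List.of(
            new Op("cmp", "a0", "a1", null),
            new Op("branch", null, null, "L_equal"));
        System.out.println(fuse(lir));
    }
}
```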

  13. Zifencei: FENCE.I
    • RISC-V Weak Memory Ordering (RVWMO)
    • Hart (short for hardware thread)
    > not software-managed thread contexts
    > similar to the "logical core" on other ISAs
    > a Java thread is not bound to a specific hart
    • FENCE.I
    > makes modified instructions visible to instruction fetch
    > only on the current hart!
    (thread A on hart 1)
    modify code
    fence.i
    (thread A rescheduled to hart 2)
    execute code (old code)

  14. Zifencei: FENCE.I
    • Syscall: flush_icache
    > do not use fence.i at user level
    > the syscall issues fence.i on all harts via inter-processor interrupts
    > so no fence.i is needed before the relocatable code
        // Hart 1 (read hart)
        void MacroAssembler::emit_static_call_stub() {
          ...
          // ifence();
          // mov_metadata(xmethod, (Metadata*)NULL);
          // ...
        }
        // Hart 2 (write hart)
        void NativeMovConstReg::set_data(intptr_t x) {
          ...
          // Store x into the instruction stream.
          MacroAssembler::pd_patch_instruction_size(instruction_address(), (address)x);
          // ICache::invalidate_range(instruction_address(), movptr_instruction_size);
          // ...
        }
    modify code
    (flush_icache syscall)
    fence
    fence.i
    ipi: all harts invoke fence.i
    continue

  16. Vector Extension
    • Superword Level Parallelism (SLP)
    > hotspot supports SLP for auto-vectorization
    • Single Instruction Multiple Data (SIMD)
    > x86, ARM, MIPS...
    > incremental design
    > must be coded for data width
    • RISC-V Vector
    > generates fewer instructions than SIMD
    > hides implementation details
    > uses vsetvl/vsetivli/vsetvli instructions to set vector width for vector instructions
    AArch64
        instruct vaddB(vReg dst, vReg src1, vReg src2) %{
          ...
          uint length_in_bytes = Matcher::vector_length_in_bytes(this);
          if (VM_Version::use_neon_for_vector(length_in_bytes)) {
            __ addv($dst$$FloatRegister, get_arrangement(this),
                    $src1$$FloatRegister, $src2$$FloatRegister);
          } else {
            assert(UseSVE > 0, "must be sve");
            __ sve_add($dst$$FloatRegister, __ B,
                       $src1$$FloatRegister, $src2$$FloatRegister);
          }
          ...
        %}
    RISC-V
        instruct vaddB(vReg dst, vReg src1, vReg src2) %{
          ...
          __ rvv_vsetvli(T_BYTE,
                         Matcher::vector_length_in_bytes(this));
          __ vadd_vv(as_VectorRegister($dst$$reg),
                     as_VectorRegister($src1$$reg),
                     as_VectorRegister($src2$$reg));
          ...
        %}
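
To illustrate why RVV code does not have to be written for one fixed data width, here is a small Java simulation of a strip-mined loop. It is a sketch under simplifying assumptions: `vsetvli` is modelled as `min(requested, vlmax)` (the specification allows more freedom for requests between VLMAX and 2*VLMAX), and `vlmax` is a hypothetical hardware limit passed in as a parameter.

```java
final class StripMine {
    // Simplified model of vsetvli: the hardware grants at most vlmax elements.
    static int vsetvli(int requested, int vlmax) {
        return Math.min(requested, vlmax);
    }

    // Adds two byte arrays chunk by chunk. The same loop works for any vlmax,
    // which is the sense in which the vector width is hidden from the code.
    static byte[] add(byte[] a, byte[] b, int vlmax) {
        byte[] dst = new byte[a.length];
        int i = 0;
        while (i < a.length) {
            int vl = vsetvli(a.length - i, vlmax);   // elements handled this round
            for (int j = 0; j < vl; j++) {           // stands in for one vadd.vv
                dst[i + j] = (byte) (a[i + j] + b[i + j]);
            }
            i += vl;
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5};
        byte[] b = {10, 20, 30, 40, 50};
        System.out.println(java.util.Arrays.toString(add(a, b, 2)));
        System.out.println(java.util.Arrays.toString(add(a, b, 16)));
    }
}
```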

  17. Vector Extension
    • Todo
    > eliminate the redundant "vsetvli" instruction emitted with each vector operation
    - peephole optimization
    - add a vector length node and analyze the control flow and data flow
    • Performance
    > no hardware, just qemu
    > so no performance data yet
    instruct vaddB(vReg dst, vReg src1, vReg src2) %{
    ...
    __ rvv_vsetvli(T_BYTE,
    Matcher::vector_length_in_bytes(this));
    __ vadd_vv(as_VectorRegister($dst$$reg),
    as_VectorRegister($src1$$reg),
    as_VectorRegister($src2$$reg));
    ...
    %}
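
The peephole option from the Todo list could look roughly like the following Java sketch. It is not the planned HotSpot implementation: instructions are plain strings, the pass only handles straight-line code, and it assumes (conservatively, and as a simplification) that only an identical `vsetvli` with no intervening scalar instruction is redundant. The `VsetvliPeephole` class and `run` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class VsetvliPeephole {
    // Removes a vsetvli that repeats the configuration already in effect.
    static List<String> run(List<String> code) {
        List<String> out = new ArrayList<>();
        String current = null;                 // last vsetvli known to be in effect
        for (String insn : code) {
            if (insn.startsWith("vsetvli")) {
                if (insn.equals(current)) {
                    continue;                  // same configuration again: drop it
                }
                current = insn;
            } else if (!insn.startsWith("v")) {
                // Toy assumption: any non-vector instruction may change the
                // registers vsetvli depends on, so forget what we know.
                current = null;
            }
            out.add(insn);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> code = List.of(
            "vsetvli t0, a0, e8, m1", "vadd.vv v1, v2, v3",
            "vsetvli t0, a0, e8, m1", "vadd.vv v4, v1, v3");
        run(code).forEach(System.out::println);
    }
}
```

A real implementation would also have to reason across basic blocks, which is what the second option on the slide (a vector length node plus control-flow and data-flow analysis) is about.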

  19. Bitmanipulation Extension
    • Zba, Zbb, Part of Zbs
    • Reduce code size
        public static long reverseBytes(long i) {
            i = (i & 0x00ff00ff00ff00ffL) << 8 | (i >>> 8) & 0x00ff00ff00ff00ffL;
            return (i << 48) | ((i & 0xffff0000L) << 16) |
                   ((i >>> 16) & 0xffff0000L) | (i >>> 48);
        }
    Without Zbb:
        lui t2,0xff0
        addiw t2,t2,255
        slli t2,t2,0x10
        addi t2,t2,255 # 0xff00ff
        slli t2,t2,0x10
        addi t2,t2,255
        and t3,a1,t2
        srli t4,a1,0x8
        and t2,t4,t2
        slli t3,t3,0x8
        or t2,t3,t2
        lui t3,0x10
        addiw t3,t3,-1
        slli t3,t3,0x10
        and t4,t2,t3
        srli t6,t2,0x10
        slli t4,t4,0x10
        slli t5,t2,0x30
        and t3,t6,t3
        or t4,t5,t4
        srli t2,t2,0x30
        or t3,t4,t3
        or a0,t3,t2
    With Zbb:
        revb8
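
As a quick sanity check of what the long instruction sequence computes, the shift-and-mask formula from the slide can be compared against `Long.reverseBytes`, the operation that a single byte-reverse instruction (`rev8` in the ratified Zbb specification) implements. The class name `ReverseBytesCheck` is made up for this sketch.

```java
final class ReverseBytesCheck {
    // The shift-and-mask formula shown on the slide, copied unchanged.
    static long reverseBytesManual(long i) {
        i = (i & 0x00ff00ff00ff00ffL) << 8 | (i >>> 8) & 0x00ff00ff00ff00ffL;
        return (i << 48) | ((i & 0xffff0000L) << 16) |
               ((i >>> 16) & 0xffff0000L) | (i >>> 48);
    }

    public static void main(String[] args) {
        long[] samples = {0x0102030405060708L, 0L, -1L, 0x00000000deadbeefL};
        for (long s : samples) {
            // Both columns should print the same value for every sample.
            System.out.printf("%016x -> %016x / %016x%n",
                s, reverseBytesManual(s), Long.reverseBytes(s));
        }
    }
}
```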

  20. Bitmanipulation Extension
    • Zba, Zbb, Part of Zbs
    • Reduce code size
        public static void main(String[] args) {
            int mylist3[] = {1,2,3,4,5,6,7,8};
            int mylist4[] = {8,7,6,5,4,3,2,1};
            int base1 = 2;
            int base2 = 3;
            for (int i = 0; i < 1000000; i++) {
                mylist3[base1] = i;
                mylist4[base2] = i;
            }
        }
    Without Zba:
        addw t4,s2,zero
        addw t2,s4,zero
        slli t4,t4,0x2
        slli t2,t2,0x2
        add t4,t4,a2 ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                     ; - Shadd::[email protected] (line 8)
        add t2,t2,s3 ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                     ; - Shadd::[email protected] (line 9)
        sw s1,16(t4) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                     ; - Shadd::[email protected] (line 8)
        sw s1,16(t2) ;*iload {reexecute=0 rethrow=0 return_oop=0}
                     ; - Shadd::[email protected] (line 7)
    With Zba (sh2add):
        sh2add a0,s2,t2 ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                        ; - Shadd::[email protected] (line 8)
        sh2add t6,s4,s3 ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                        ; - Shadd::[email protected] (line 9)
        sw s1,16(a0) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                     ; - Shadd::[email protected] (line 8)
        sw s1,16(t6) ;*iload {reexecute=0 rethrow=0 return_oop=0}
                     ; - Shadd::[email protected] (line 7)
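
The saving comes from `sh2add` folding a shift and an add into one instruction. In the Zba specification, `sh2add rd, rs1, rs2` computes `rs2 + (rs1 << 2)`, which is exactly the address arithmetic for an `int` array element. The sketch below models both forms in Java; the `Sh2add` class and the sample values are illustrative only.

```java
final class Sh2add {
    // Zba semantics: rd = rs2 + (rs1 << 2).
    static long sh2add(long rs1, long rs2) {
        return rs2 + (rs1 << 2);
    }

    // The two-instruction form without Zba: slli followed by add.
    static long slliThenAdd(long index, long base) {
        long scaled = index << 2;   // slli t, index, 2
        return scaled + base;       // add  t, t, base
    }

    public static void main(String[] args) {
        long base = 0x1000;         // assumed array base address
        long index = 3;
        System.out.println(Long.toHexString(sh2add(index, base)));
        System.out.println(Long.toHexString(slliThenAdd(index, base)));
    }
}
```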

  21. Bitmanipulation Extension
    • Code Size Reduction on SPECjvm2008
    > bitmanip-v1.0.0-rc[1] for OpenJDK
    > geometric mean code size reduction: 2.9%
    > again, no hardware, just qemu
    [1] https://github.com/riscv/riscv-bitmanip/releases/tag/1.0.0
    benchmark    code size reduction (%)
    compress     2.81%
    crypto       7.49%
    derby        1.98%
    mpegaudio    2.54%
    scimark      6.18%
    serial       1.85%
    sunflow      2.87%
    xml          1.46%
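
The 2.9% figure can be reproduced from the eight per-benchmark values with a plain geometric mean. The short Java sketch below does that; the class and method names are made up for the example.

```java
final class CodeSizeGeomean {
    // Per-benchmark code size reductions (%) as listed on the slide.
    static final double[] REDUCTIONS = {2.81, 7.49, 1.98, 2.54, 6.18, 1.85, 2.87, 1.46};

    // Geometric mean computed through logarithms to avoid a large product.
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    public static void main(String[] args) {
        System.out.printf("geometric mean: %.2f%%%n", geometricMean(REDUCTIONS));
    }
}
```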

  23. JIT Extension: Pointer Masking
    • RISC-V Pointer Masking
    > causes the MMU to ignore the top N bits of the effective address
    > the proposal is not released yet
    • Tagged Pointer
    > folds a few bits of additional data into a pointer (GC state, data type, etc.)
    > saves memory
    > saves a load
    > adds overhead on dereference
    Tagged pointer of Objective-C
    Value tagging of Javascript V8
    Colored pointer of ZGC[1]
    [1] The max heap of ZGC can be 4TB, 8TB or 16TB. The colored pointer varies depending on the max heap. Here we only consider the 4TB case.

  24. JIT Extension: Pointer Masking
    • Implementations of Tagged Pointer
    tagged pointer
      software
        right shift      -> extra instruction
        AND a mask       -> extra instruction
        multi mapping    -> dTLB load misses (ZGC)
          linux: mmap()
          windows: CreateFileMapping()
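
The two purely software variants (AND a mask, right shift) differ only in where the tag bits live. A minimal Java sketch, assuming a 42-bit offset and a 4-bit tag; the class name, field widths and method names are illustrative and not taken from any JVM source:

```java
final class SoftwareTagging {
    static final int OFFSET_BITS = 42;                       // assumed offset width
    static final long OFFSET_MASK = (1L << OFFSET_BITS) - 1;
    static final int TAG_BITS = 4;                           // assumed tag width

    // Variant A: tag stored above the offset, removed with one AND.
    static long tagHigh(long offset, int tag) {
        return ((long) tag << OFFSET_BITS) | (offset & OFFSET_MASK);
    }
    static long untagWithMask(long pointer) {
        return pointer & OFFSET_MASK;
    }

    // Variant B: tag stored in the low bits, removed with one right shift.
    static long tagLow(long offset, int tag) {
        return (offset << TAG_BITS) | (tag & 0xF);
    }
    static long untagWithShift(long pointer) {
        return pointer >>> TAG_BITS;
    }

    public static void main(String[] args) {
        long offset = 0x1234_5678L;
        System.out.println(Long.toHexString(untagWithMask(tagHigh(offset, 0b0100))));
        System.out.println(Long.toHexString(untagWithShift(tagLow(offset, 0b0100))));
    }
}
```

Either way, every dereference pays for one extra instruction, which is the cost listed for these two variants.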

  25. JIT Extension: Pointer Masking
    • OpenJDK17 ZGC Colored Pointer
    > 4 color bits (bits 42 to 45)
    > only three valid patterns: 100, 010, 001 [2]
    > multi mapping (the heap is mapped to 3 locations)
    > causes dTLB load misses
    [1] http://cr.openjdk.java.net/~pliden/slides/ZGC-PLMeetup-2019.pdf
    [2] Only three of the four bits are used. The 'finalizable' bit is not considered in Multi Mapping.
    [Figure from [1]]

  26. JIT Extension: Pointer Masking
    • Implementations of Tagged Pointer
    tagged pointer
      software
        right shift      -> extra instruction
        AND a mask       -> extra instruction
        multi mapping    -> dTLB load misses (ZGC)
          linux: mmap()
          windows: CreateFileMapping()
      hardware           -> no cross-platform (what we want to do)
        sparc: virtual address mask
        armv8: top byte ignore
        risc-v: pointer masking

  27. JIT Extension: Pointer Masking
    • ArmV8 Top Byte Ignore (TBI)
    > ignore the most significant 8 bits of the virtual
    address
    > similar to Pointer Masking
    • Layout of Colored Pointer with TBI
    > move colored bits from 42~45 to 59~62
    > (the 63rd bit has been occupied by StackWatermarkState)
    // Multi-Mapping
    +--------------------+----+-----------------------------------------------+
    |00000000 00000000 00|1111|11 11111111 11111111 11111111 11111111 11111111|
    +--------------------+----+-----------------------------------------------+
    |                    |    |
    |                    |    * 41-0 Object Offset (42-bits, 4TB address space)
    |                    |
    |                    * 45-42 Metadata Bits (4-bits)  0001 = Marked0
    |                                                    0010 = Marked1
    |                                                    0100 = Remapped
    |                                                    1000 = Finalizable
    |
    * 63-46 Fixed (18-bits, always zero)
    // TBI
    ++----+-------------------+-----------------------------------------------+
    0|1111|000 00000000 000000|11 11111111 11111111 11111111 11111111 11111111|
    ++----+-------------------+-----------------------------------------------+
     |    |                   |
     |    |                   * 41-0 Object Offset (42-bits, 4TB address space)
     |    |
     |    * 58-42 Fixed (17-bits, always zero)
     |
     * 62-59 Metadata Bits (4-bits)  0001 = Marked0
                                     0010 = Marked1
                                     0100 = Remapped
                                     1000 = Finalizable
    Bits 63-56 (the top byte) are ignored by the hardware under TBI.
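
The effect of moving the metadata bits can be checked with a small Java model of the two layouts. This is a sketch, not ZGC code: the constants follow the bit positions given on this slide (45-42 for multi-mapping, 62-59 for TBI, 42-bit offset), while the class name, method names and the `tbiEffectiveAddress` helper, which models the hardware ignoring bits 63-56, are assumptions made for illustration.

```java
final class ColoredPointerLayout {
    static final long OFFSET_MASK = (1L << 42) - 1;   // bits 41-0
    static final int MULTI_MAP_SHIFT = 42;            // metadata in bits 45-42
    static final int TBI_SHIFT = 59;                  // metadata in bits 62-59
    static final int MARKED0 = 0b0001;
    static final int REMAPPED = 0b0100;

    // Builds a colored pointer with the metadata placed at the given shift.
    static long color(long offset, int metadata, int shift) {
        return ((long) (metadata & 0xF) << shift) | (offset & OFFSET_MASK);
    }

    // Extracts the 4 metadata bits again.
    static int metadata(long pointer, int shift) {
        return (int) ((pointer >>> shift) & 0xF);
    }

    // Models what a TBI-capable MMU does on access: bits 63-56 are ignored.
    static long tbiEffectiveAddress(long pointer) {
        return pointer & 0x00FF_FFFF_FFFF_FFFFL;
    }

    public static void main(String[] args) {
        long offset = 0x1234L;
        // Multi-mapping: each color yields a different virtual address,
        // so the heap has to be mapped once per color.
        System.out.println(Long.toHexString(color(offset, MARKED0, MULTI_MAP_SHIFT)));
        System.out.println(Long.toHexString(color(offset, REMAPPED, MULTI_MAP_SHIFT)));
        // TBI: every color resolves to the same effective address.
        System.out.println(Long.toHexString(tbiEffectiveAddress(color(offset, MARKED0, TBI_SHIFT))));
        System.out.println(Long.toHexString(tbiEffectiveAddress(color(offset, REMAPPED, TBI_SHIFT))));
    }
}
```

With a single effective address per object, one mapping is enough, which is consistent with the motivation of reducing dTLB load misses.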

  28. JIT Extension: Pointer Masking
    • Performance
    > BiShengJDK17, Kunpeng platform[1]
    > SPECjbb2015 ↗7~8%
    > dTLB-load-misses/(load+store) ↘23.66%
    • Open source in BiSheng JDK17[2]
    > -XX:+UseTBI[3]
    [1] Experiment results may vary on different hardware. Our environment: BiShengJDK17, Kunpeng 920, 128 CPUs, 500GB memory, SPECjbb2015 multi-JVM mode
    [2] https://gitee.com/openeuler/bishengjdk-17/pulls/27
    [3] We later implemented hot patching of Generational ZGC in UseTBI, which further improved performance. The data above does not include the improvement from hot patching.
    [Chart: SPECjbb2015 - Score (max jOPS, critical jOPS), Multi Mapping vs TBI]
    [Chart: SPECjbb2015 - dTLB-load-miss/(load+store), TBI vs Multi Mapping]

  29. JIT Extension: Pointer Masking
    • Generational ZGC
    > planned to be released in JDK 21 to adapt to applications with high allocation rates
    > pointer layout
        // A zpointer is a combination of the address bits (heap base bit + offset)
        // and two low-order metadata bytes, with the following layout:
        // |48 bits VA|RRRRMMmmFFrr0000
        //             ****             : Used by load barrier
        //             **********       : Used by mark barrier
        //             ************     : Used by store barrier
        //                         **** : Reserved bits
        // The table below describes what each color does.
        // +-------------+-------------------+--------------------------+
        // | Bit pattern | Description       | Included colors          |
        // +-------------+-------------------+--------------------------+
        // |     rr      | Remembered bits   | Remembered[0, 1]         |
        // +-------------+-------------------+--------------------------+
        // |     FF      | Finalizable bits  | Finalizable[0, 1]        |
        // +-------------+-------------------+--------------------------+
        // |     mm      | Marked young bits | MarkedYoung[0, 1]        |
        // +-------------+-------------------+--------------------------+
        // |     MM      | Marked old bits   | MarkedOld[0, 1]          |
        // +-------------+-------------------+--------------------------+
        // |    RRRR     | Remapped bits     | Remapped[00, 01, 10, 11] |
        // +-------------+-------------------+--------------------------+
        ZLoadBarrierStubC2* const stub =
            ZLoadBarrierStubC2::create(node, ref_addr, ref);
        Label good;
        __ relocate(barrier_Relocation::spec(),
                    ZBarrierRelocationFormatLoadGoodBeforeTbz);
        __ tbz(ref, barrier_Relocation::unpatched, good);
        __ b(*stub->entry());
        __ bind(good);
        __ lsr(ref, ref, ZPointerLoadShift); // uncolor
        __ bind(*stub->continuation());
    ~3% in SPECjbb2015
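
The zpointer layout above can be decoded with plain shifts and masks. The Java sketch below follows the 16-bit metadata layout `RRRRMMmmFFrr0000` shown in the comment and assumes that uncoloring is a right shift by 16, matching the `lsr` in the barrier; the class, the constant `LOAD_SHIFT` and the accessor names are hypothetical, and the real shift amount in the JVM may differ per platform.

```java
final class ZPointerSketch {
    static final int LOAD_SHIFT = 16;   // assumption: two low-order metadata bytes

    // Combines an address with 16 bits of metadata (RRRRMMmmFFrr0000).
    static long color(long address, int metadata16) {
        return (address << LOAD_SHIFT) | (metadata16 & 0xFFFFL);
    }

    // Uncoloring drops the metadata bytes, like the lsr in the load barrier.
    static long uncolor(long zpointer) {
        return zpointer >>> LOAD_SHIFT;
    }

    // Field accessors, from the most significant metadata bits downwards.
    static int remapped(long z)    { return (int) ((z >>> 12) & 0xF); } // RRRR
    static int markedOld(long z)   { return (int) ((z >>> 10) & 0x3); } // MM
    static int markedYoung(long z) { return (int) ((z >>> 8) & 0x3); }  // mm
    static int finalizable(long z) { return (int) ((z >>> 6) & 0x3); }  // FF
    static int remembered(long z)  { return (int) ((z >>> 4) & 0x3); }  // rr

    public static void main(String[] args) {
        int metadata = (0b0100 << 12) | (0b01 << 10) | (0b10 << 8) | (0b01 << 4);
        long z = color(0x7f00_1234_5678L, metadata);
        System.out.println(Long.toHexString(uncolor(z)));
        System.out.println(remapped(z) + " " + markedOld(z) + " " + markedYoung(z)
            + " " + finalizable(z) + " " + remembered(z));
    }
}
```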

  30. JIT Extension: Pointer Masking
    • Implementations of Tagged Pointer
    tagged pointer
      software
        right shift      -> extra instruction (Generational ZGC)
        AND a mask       -> extra instruction (Generational ZGC)
        multi mapping    -> dTLB load misses (ZGC)
          linux: mmap()
          windows: CreateFileMapping()
      hardware           -> no cross-platform (what we want to do)
        sparc: virtual address mask
        armv8: top byte ignore
        risc-v: pointer masking

  32. Summary
    • Introduced the work we have done to port OpenJDK to RISC-V
    > Focused on RISC-V Pointer Masking and introduced an experiment based on TBI
    • Huawei BishengJDK
    > https://gitee.com/openeuler/bishengjdk-8
    > https://gitee.com/openeuler/bishengjdk-11
    > https://gitee.com/openeuler/bishengjdk-17
    • OpenJDK RISC-V Binary
    > https://builds.shipilev.net/openjdk-jdk/

  33. Copyright©2018 Huawei Technologies Co., Ltd.
    All Rights Reserved.
    The information in this document may contain predictive
    statements including, without limitation, statements regarding
    the future financial and operating results, future product
    portfolio, new technology, etc. There are a number of factors that
    could cause actual results and developments to differ materially
    from those expressed or implied in the predictive statements.
    Therefore, such information is provided for reference purpose
    only and constitutes neither an offer nor an acceptance. Huawei
    may change the information at any time without notice.
    Bring digital to every person, home and organization for a fully connected, intelligent world.
    Thank you.
