Register-based calling convention for Go functions

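As of Go 1.18, the register-based calling convention is implemented on the mainstream architectures (64-bit ARM and x86), an improvement that raises Go's performance by more than 10%. This talk covers Go's transition from a stack-based to a register-based calling convention and the differences between the two conventions.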

Cherie Hsieh

August 02, 2022

Transcript

  1. Register-based calling convention
    for Go functions
    Cherie Hsieh @ TSMC

  2. Outline
    1. Introduction to calling conventions
    2. Register-based vs. stack-based calling conventions
    3. Switch to a register-based calling convention
    4. Performance benchmark

  3. Introduction to calling conventions
    A calling convention is part of the Application Binary Interface (ABI). It
    defines how subroutines receive parameters from their caller and how
    they return a result.
    https://en.wikipedia.org/wiki/Calling_convention

  4. Introduction to calling conventions
    0x30 (code address)
    func main() {
        price := calcPrice(10, 1)
    }
    0x20 (code address)
    func calcPrice(price int, tax int) int {
        res := price + tax
        return res
    }
    1. main sends the parameters to calcPrice
    2. calcPrice returns the result to main

  5. Introduction to calling conventions
    CPU provider: provides the calling convention guide
    Operating system: implements the calling convention
    Compiler: extends the calling convention for specific languages

  6. Introduction to calling conventions
    RISC-V

  7. Register-based vs. stack-based calling convention

  8. Register-based calling convention
    func add(a int, b int) int {
        c := a + b
        return c
    }
    func main() {
        number1 := 2
        number2 := 3
        result := add(number1, number2)
    }
    # main (caller): pass the arguments in registers
    MOVD $2, R0
    MOVD $3, R1
    CALL "".add(SB)
    # add (callee): the result is returned in R0
    ADD R1, R0, R0
    RET (R30)
    R: register

  9. Stack-based calling convention
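    func add(a int, b int) int {
        c := a + b
        return c
    }
    func main() {
        number1 := 2
        number2 := 3
        result := add(number1, number2)
    }
    # main (caller): store the arguments on the stack
    MOVD $2, R0
    MOVD R0, 8(RSP)
    MOVD $3, R0
    MOVD R0, 16(RSP)
    CALL "".add(SB)
    # main (caller): load the result from the stack
    MOVD 24(RSP), R0
    # add (callee): load the arguments from the stack
    MOVD "".a(FP), R0
    MOVD "".b+8(FP), R1
    ADD R1, R0, R0
    # add (callee): store the result on the stack
    MOVD R0, "".~r2+16(FP)
    RET (R30)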
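
The listings on slides 8 and 9 are simplified. To reproduce something close to them, `add` probably has to be kept out of line, because the compiler would normally inline a function this small and emit no call at all. A minimal sketch, assuming the `//go:noinline` directive and a `fmt.Println` to keep the result live (neither appears on the slides):

```go
package main

import "fmt"

// add is the function from the slides. The noinline directive is an
// addition for this sketch: it forces the compiler to emit a real CALL,
// so the argument passing is visible in the generated assembly.
//
//go:noinline
func add(a int, b int) int {
	c := a + b
	return c
}

func main() {
	number1 := 2
	number2 := 3
	result := add(number1, number2)
	fmt.Println(result) // prints 5
}
```

Building with `GOARCH=arm64 go build -gcflags=-S` prints the generated assembly. The exact instructions depend on the Go version, so the output will not match the slides line for line.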

  10. Calling conventions of different languages
    Register-based calling conventions
    1. C / C++ (GNU or LLVM compiler)
    2. Rust (LLVM-based compiler)
    3. Java (JIT-compiled)
    Stack-based calling conventions
    1. Python
    2. Java (interpreter)

  11. Switch to a register-based calling convention

  12. Switch to a register-based calling convention
    Discussion started on August 12, 2020 (Go 1.15)
    Why Go used a stack-based calling convention before Go 1.17
    1. All platforms can use essentially the same conventions
    2. It simplifies the implementation of local variable allocation
    3. It simplifies stack tracing for garbage collection and stack growth
    Drawback
    It leaves a lot of performance on the table.

  13. Switch to a register-based calling convention
    Advantages of a register-based calling convention
    Accessing arguments in registers is still roughly 40% faster than
    accessing arguments on the stack (in memory).
    Drawbacks
    1. It adds compile time for register allocation.
    2. It increases the design complexity of the compiler.

  14. Switch to a register-based calling convention
    Supported Architectures
    - Go 1.17: 64-bit x86 (amd64)
    - Go 1.18: 64-bit ARM (arm64) and 64-bit PowerPC (ppc64, ppc64le)
    - Go 1.19: 64-bit RISC-V (riscv64)
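
The list above can be turned into a small lookup keyed by `GOARCH`. This is only a sketch: the table `regABISince` and the helper `firstRegABIRelease` are hypothetical names, and mapping "64-bit PowerPC" to both `ppc64` and `ppc64le` is an assumption based on the Go 1.18 release notes.

```go
package main

import (
	"fmt"
	"runtime"
)

// regABISince maps a GOARCH value to the first Go release that uses the
// register-based calling convention on it, following the slide's list.
// (Hypothetical table for illustration, not part of the Go runtime.)
var regABISince = map[string]string{
	"amd64":   "go1.17",
	"arm64":   "go1.18",
	"ppc64":   "go1.18",
	"ppc64le": "go1.18",
	"riscv64": "go1.19",
}

// firstRegABIRelease reports the release for goarch, and false when the
// architecture is not in the list.
func firstRegABIRelease(goarch string) (string, bool) {
	v, ok := regABISince[goarch]
	return v, ok
}

func main() {
	if v, ok := firstRegABIRelease(runtime.GOARCH); ok {
		fmt.Printf("%s: register-based since %s (running %s)\n",
			runtime.GOARCH, v, runtime.Version())
		return
	}
	fmt.Printf("%s: not in the list (running %s)\n",
		runtime.GOARCH, runtime.Version())
}
```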

  15. Performance benchmark

  16. Performance benchmark
    func fib(n int) int {
        if n > 1 {
            return fib(n - 1) + fib(n - 2)
        }
        return n
    }
    func main() {
        n := 50
        _ = fib(n)
    }
    # main
    MOVD $50, R0
    MOVD R0, 8(RSP)
    PCDATA $1, ZR
    CALL "".fib(SB)
    # fib
    # if n > 1
    MOVD "".n(FP), R0
    CMP $1, R0
    BLE fib_pc104
    # fib(n - 1)
    SUB $1, R0, R1
    MOVD R1, 8(RSP)
    PCDATA $1, ZR
    CALL "".fib(SB)
    MOVD 16(RSP), R0
    MOVD R0, ""..autotmp_4-8(SP)
    # fib(n - 2)
    MOVD "".n(FP), R1
    SUB $2, R1, R1
    MOVD R1, 8(RSP)
    CALL "".fib(SB)
    MOVD 16(RSP), R0
    MOVD ""..autotmp_4-8(SP), R1
    # fib(n - 1) + fib(n - 2)
    ADD R0, R1, R0
    MOVD R0, "".~r1+8(FP)
    MOVD -8(RSP), R29
    MOVD.P 48(RSP), R30
    RET (R30)
    Go 1.17 (arm64, stack-based calling convention)

  17. Performance benchmark
    func fib(n int) int {
        if n > 1 {
            return fib(n - 1) + fib(n - 2)
        }
        return n
    }
    func main() {
        n := 50
        _ = fib(n)
    }
    # main
    MOVD $50, R0
    PCDATA $1, ZR
    CALL "".fib(SB)
    # fib
    # if n > 1
    CMP $1, R0
    BLE fib_pc92
    # fib(n - 1)
    SUB $1, R0, R1
    MOVD R1, R0
    PCDATA $1, ZR
    CALL "".fib(SB)
    MOVD R0, ""..autotmp_4-8(SP)
    # fib(n - 2)
    MOVD "".n(FP), R1
    SUB $2, R1, R1
    MOVD R1, R0
    CALL "".fib(SB)
    MOVD ""..autotmp_4-8(SP), R1
    # fib(n - 1) + fib(n - 2)
    ADD R0, R1, R0
    MOVD -8(RSP), R29
    MOVD.P 32(RSP), R30
    RET (R30)
    Go 1.18 (arm64, register-based calling convention)
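
One way to measure the difference between the two listings is to time the same `fib` under each toolchain. The sketch below is an assumption about how such a measurement could be set up, not the speaker's benchmark harness: it uses `testing.Benchmark` from a plain `main` package, a package-level `sink` variable to keep the result live, and n = 30 instead of 50 so that a single iteration stays short.

```go
package main

import (
	"fmt"
	"testing"
)

// fib is the recursive function from the slides. It is call-heavy, so most
// of its running time is spent on argument passing and call overhead.
func fib(n int) int {
	if n > 1 {
		return fib(n-1) + fib(n-2)
	}
	return n
}

// sink keeps the result live so the compiler cannot discard the call.
var sink int

func main() {
	// testing.Init registers the testing flags; it is needed when
	// testing.Benchmark is called outside of "go test".
	testing.Init()

	// n is 30 here, not 50 as on the slides (assumption, to keep runs short).
	const n = 30
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = fib(n)
		}
	})
	fmt.Printf("fib(%d) = %d\n", n, sink)
	fmt.Println(res) // iteration count and ns/op
}
```

Running the same file with a Go 1.17 and a Go 1.18 toolchain on an arm64 machine and comparing the ns/op values gives a rough comparison. Results will vary with hardware and compiler version.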

  18. Performance benchmark

  19. Performance benchmark
    Benchmarks for a representative set of Go packages and programs show
    performance improvements of about 5%, and a typical reduction in binary size
    of about 2%.
    Source: Go 1.17 Release Notes

  20. Performance benchmark
    A variety of applications can benefit from the 64-bit Arm CPU performance
    improvements released in Go 1.18. Programs with an object-oriented design,
    recursion, or that have many function calls in their implementation will likely
    benefit more from the new register ABI calling convention.
    Source: Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton

  21. References
    1. Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton
    2. Proposal: Register-based Go calling convention
    3. Stack frame layout on x86-64

  22. Thank You for Your Time.
    Cherie Hsieh @ TSMC
