Slide 1

Slide 1 text

Register-based calling convention for Go functions
Cherie Hsieh @ TSMC

Slide 2

Slide 2 text

Outline
1. Introduction to calling conventions
2. Register-based vs. stack-based calling conventions
3. Switch to a register-based calling convention
4. Performance benchmark

Slide 3

Slide 3 text

Introduction to calling conventions
A calling convention is part of the Application Binary Interface (ABI). It defines how subroutines receive parameters from their caller and how they return a result.
https://en.wikipedia.org/wiki/Calling_convention

Slide 4

Slide 4 text

Introduction to calling conventions

0x30 (code address)
    func main() {
        price := calcPrice(10, 1)
    }

0x20
    func calcPrice(price int, tax int) int {
        res := price + tax
        return res
    }

1. Send parameters
2. Return the result

Slide 5

Slide 5 text

Introduction to calling conventions
- CPU provider: publishes the calling convention guide
- Operating system: implements the calling convention
- Compiler: extends the calling convention for specific languages

Slide 6

Slide 6 text

Introduction to calling conventions
RISC-V

Slide 7

Slide 7 text

Register-based vs. stack-based calling conventions

Slide 8

Slide 8 text

Register-based calling convention

    func add(a int, b int) int {
        c := a + b
        return c
    }

    func main() {
        number1 := 2
        number2 := 3
        result := add(number1, number2)
    }

main (caller):
    MOVD $2, R0
    MOVD $3, R1
    CALL "".add(SB)

add (callee):
    ADD R1, R0, R0
    RET (R30)

R: register

Slide 9

Slide 9 text

Stack-based calling convention

    func add(a int, b int) int {
        c := a + b
        return c
    }

    func main() {
        number1 := 2
        number2 := 3
        result := add(number1, number2)
    }

main (caller):
    MOVD $2, R0
    MOVD R0, 8(RSP)
    MOVD $3, R0
    MOVD R0, 16(RSP)
    CALL "".add(SB)
    MOVD 24(RSP), R0

add (callee):
    MOVD "".a(FP), R0
    MOVD "".b+8(FP), R1
    ADD R1, R0, R0
    MOVD R0, "".~r2+16(FP)
    RET (R30)
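The difference between the two instruction sequences on slides 8 and 9 can be sketched with a toy model that counts memory accesses. This is only an illustration: the `machine` type, its slot numbers, and the helper names are hypothetical and do not reflect how the Go runtime or an ARM64 CPU is implemented.

```go
package main

import "fmt"

// machine is a toy model (an assumption for illustration only): a few
// registers, a small stack held in "memory", and access counters.
type machine struct {
	regs      [4]int
	stack     [8]int
	memReads  int
	memWrites int
}

// store writes a value to a stack slot and counts one memory write.
func (m *machine) store(slot, v int) {
	m.stack[slot] = v
	m.memWrites++
}

// load reads a value from a stack slot and counts one memory read.
func (m *machine) load(slot int) int {
	m.memReads++
	return m.stack[slot]
}

// callAddStack mimics the stack-based sequence: the caller stores both
// arguments, the callee loads them and stores the result, and the
// caller loads the result back.
func callAddStack(m *machine, a, b int) int {
	m.store(1, a) // caller: first argument
	m.store(2, b) // caller: second argument
	m.regs[0] = m.load(1)
	m.regs[1] = m.load(2)
	m.regs[0] = m.regs[0] + m.regs[1]
	m.store(3, m.regs[0]) // callee: result slot
	m.regs[0] = m.load(3) // caller: read the result
	return m.regs[0]
}

// callAddRegs mimics the register-based sequence: arguments and the
// result stay in registers, so no memory access is counted.
func callAddRegs(m *machine, a, b int) int {
	m.regs[0] = a
	m.regs[1] = b
	m.regs[0] = m.regs[0] + m.regs[1]
	return m.regs[0]
}

func main() {
	var s, r machine
	sum := callAddStack(&s, 2, 3)
	fmt.Println(sum, s.memReads+s.memWrites) // prints "5 6"
	sum = callAddRegs(&r, 2, 3)
	fmt.Println(sum, r.memReads+r.memWrites) // prints "5 0"
}
```

The six counted accesses correspond to the six MOVD instructions that touch RSP or FP in the stack-based listing.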

Slide 10

Slide 10 text

Calling conventions of different languages

Register-based calling conventions
1. C / C++ (GNU or LLVM compiler)
2. Rust (LLVM-based compiler)
3. Java (JIT-compiled)

Stack-based calling conventions
1. Python
2. Java (interpreter)

Slide 11

Slide 11 text

Switch to a register-based calling convention

Slide 12

Slide 12 text

Switch to a register-based calling convention
Discussion started on August 12, 2020 (Go 1.15)

Why Go used a stack-based calling convention before Go 1.17
1. All platforms can use essentially the same conventions
2. It simplifies the implementation of local variable allocation
3. It simplifies stack tracing for garbage collection and stack growth

Drawback
It leaves a lot of performance on the table.

Slide 13

Slide 13 text

Switch to a register-based calling convention

Advantages of a register-based calling convention
Accessing arguments in registers is roughly 40% faster than accessing arguments on the stack (main memory).

Drawbacks
1. Register allocation adds compile time.
2. It increases the design complexity of the compiler.
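A rough way to get a feel for the cost of passing values through memory is a micro-benchmark like the sketch below. It is only an analogy: Go code cannot select the calling convention, so `addViaMemory` imitates stack passing by reading its inputs from, and writing its result to, a caller-provided array. The function names, the `frame` layout, and the iteration count are all assumptions made for this example, and the measured ratio will vary by CPU and Go version.

```go
package main

import (
	"fmt"
	"time"
)

// addDirect takes its arguments directly. With the register-based ABI
// they arrive in registers; noinline keeps the call from being removed.
//
//go:noinline
func addDirect(a, b int) int { return a + b }

// addViaMemory reads both inputs from memory and writes the result back
// to memory. This only imitates a stack-based convention.
//
//go:noinline
func addViaMemory(frame *[3]int) { frame[2] = frame[0] + frame[1] }

// sink keeps the compiler from discarding the results.
var sink int

func main() {
	const iters = 20000000 // hypothetical iteration count

	start := time.Now()
	for i := 0; i < iters; i++ {
		sink += addDirect(i, 1)
	}
	direct := time.Since(start)

	var frame [3]int
	start = time.Now()
	for i := 0; i < iters; i++ {
		frame[0], frame[1] = i, 1 // "caller" stores the arguments
		addViaMemory(&frame)
		sink += frame[2] // "caller" loads the result
	}
	viaMemory := time.Since(start)

	fmt.Printf("direct: %v, via memory: %v\n", direct, viaMemory)
}
```

Timings from a loop this simple are noisy, so they should be read as an indication of direction rather than as a reproduction of the 40% figure.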

Slide 14

Slide 14 text

Switch to a register-based calling convention

Supported architectures
- Go 1.17: 64-bit x86
- Go 1.18: 64-bit ARM and 64-bit PowerPC
- Go 1.19: 64-bit RISC-V (riscv64)
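The rollout above can be turned into a small lookup that reports where the running program stands. This is a sketch: the map and the `regABIRelease` helper are hypothetical names, and mapping "64-bit x86", "64-bit ARM", and "64-bit PowerPC" to the GOARCH values `amd64`, `arm64`, `ppc64`, and `ppc64le` is an assumption made for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// firstRegABIRelease maps a GOARCH value to the Go release that, per
// the slide, first used the register-based calling convention on it.
// The GOARCH spellings are an assumption of this sketch.
var firstRegABIRelease = map[string]string{
	"amd64":   "go1.17",
	"arm64":   "go1.18",
	"ppc64":   "go1.18",
	"ppc64le": "go1.18",
	"riscv64": "go1.19",
}

// regABIRelease returns the release for goarch and whether the
// architecture appears in the table at all.
func regABIRelease(goarch string) (string, bool) {
	rel, ok := firstRegABIRelease[goarch]
	return rel, ok
}

func main() {
	if rel, ok := regABIRelease(runtime.GOARCH); ok {
		fmt.Printf("%s: register-based calls since %s (running %s)\n",
			runtime.GOARCH, rel, runtime.Version())
	} else {
		fmt.Printf("%s: not listed on the slide\n", runtime.GOARCH)
	}
}
```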

Slide 15

Slide 15 text

Performance benchmark

Slide 16

Slide 16 text

Performance benchmark

    func fib(n int) int {
        if n > 1 {
            return fib(n - 1) + fib(n - 2)
        }
        return n
    }

    func main() {
        n := 50
        _ = fib(n)
    }

Go v1.17 (64-bit ARM, stack-based)

main:
    MOVD $50, R0
    MOVD R0, 8(RSP)
    PCDATA $1, ZR
    CALL "".fib(SB)

fib:
    # if n > 1
    MOVD "".n(FP), R0
    CMP $1, R0
    BLE fib_pc104
    # fib(n - 1)
    SUB $1, R0, R1
    MOVD R1, 8(RSP)
    PCDATA $1, ZR
    CALL "".fib(SB)
    MOVD 16(RSP), R0
    MOVD R0, ""..autotmp_4-8(SP)
    # fib(n - 2)
    MOVD "".n(FP), R1
    SUB $2, R1, R1
    MOVD R1, 8(RSP)
    CALL "".fib(SB)
    MOVD 16(RSP), R0
    MOVD ""..autotmp_4-8(SP), R1
    # fib(n - 1) + fib(n - 2)
    ADD R0, R1, R0
    MOVD R0, "".~r1+8(FP)
    MOVD -8(RSP), R29
    MOVD.P 48(RSP), R30
    RET (R30)

Slide 17

Slide 17 text

Performance benchmark

    func fib(n int) int {
        if n > 1 {
            return fib(n - 1) + fib(n - 2)
        }
        return n
    }

    func main() {
        n := 50
        _ = fib(n)
    }

Go v1.18 (64-bit ARM, register-based)

main:
    MOVD $50, R0
    PCDATA $1, ZR
    CALL "".fib(SB)

fib:
    # if n > 1
    CMP $1, R0
    BLE fib_pc92
    # fib(n - 1)
    SUB $1, R0, R1
    MOVD R1, R0
    PCDATA $1, ZR
    CALL "".fib(SB)
    MOVD R0, ""..autotmp_4-8(SP)
    # fib(n - 2)
    MOVD "".n(FP), R1
    SUB $2, R1, R1
    MOVD R1, R0
    CALL "".fib(SB)
    MOVD ""..autotmp_4-8(SP), R1
    # fib(n - 1) + fib(n - 2)
    ADD R0, R1, R0
    MOVD -8(RSP), R29
    MOVD.P 32(RSP), R30
    RET (R30)
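To compare the two toolchains on this function, the same source can be built with each Go version and timed. The program below is a minimal sketch of such a timing run; it uses n = 35 instead of the slide's 50 so that it finishes quickly, and that value, like the wall-clock timing approach itself, is an assumption of this example rather than the benchmark setup used in the talk.

```go
package main

import (
	"fmt"
	"time"
)

// fib is the recursive function from the slides. Every level makes two
// calls, so call overhead dominates the running time.
func fib(n int) int {
	if n > 1 {
		return fib(n-1) + fib(n-2)
	}
	return n
}

func main() {
	const n = 35 // smaller than the slide's 50 to keep the run short
	start := time.Now()
	result := fib(n)
	fmt.Printf("fib(%d) = %d in %v\n", n, result, time.Since(start))
}
```

Running the same binary several times and comparing medians gives a steadier picture than a single run.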

Slide 18

Slide 18 text

Performance benchmark

Slide 19

Slide 19 text

Performance benchmark
Benchmarks for a representative set of Go packages and programs show performance improvements of about 5%, and a typical reduction in binary size of about 2%.

Slide 20

Slide 20 text

Performance benchmark
A variety of applications can benefit from the 64-bit Arm CPU performance improvements released in Go 1.18. Programs with an object-oriented design, recursion, or that have many function calls in their implementation will likely benefit more from the new register ABI calling convention.
Source: Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton

Slide 21

Slide 21 text

References
1. Making your Go workloads up to 20% faster with Go 1.18 and AWS Graviton
2. Proposal: Register-based Go calling convention
3. Stack frame layout on x86-64

Slide 22

Slide 22 text

Thank You for Your Time. Cherie Hsieh @ TSMC