Why cgo is slow @ CapitalGo 2018

https://capitalgolang.com/program#filippo_valsorda

This talk uses cgo and its below-average performance as an excuse to look into Go internals and what makes Go different from C.

We learn about calling conventions and the code-generated cgo trampolines; about small goroutine stacks and how C doesn't know how to grow them; about the Go scheduler and how C doesn't yield to it; and about the garbage collector and how pointers in C memory can't be tracked.

Filippo Valsorda

June 22, 2018

Transcript

  1. Why cgo is slow
    Filippo Valsorda

  2. cgo is a FFI
    (Foreign Function Interface)

  3. I like FFIs.
    • From cgo back to Go @ GopherCon 2016

    https://speakerdeck.com/filosottile/from-cgo-back-to-go-gophercon-2016
    • rustgo: Building your own FFI @ GothamGo 2017

    https://speakerdeck.com/filosottile/calling-rust-from-go-without-cgo-at-gothamgo-2017
    • Why cgo is slow @ CapitalGo 2018

    Hi!

  4. C function call: 2.364 ns
    Java FFI: 9.01 ns
    Rust FFI: 2.386 ns
    LuaJIT FFI: 1.81 ns (!) https://nullprogram.com/blog/2018/05/27/
    Node.js FFI: 18.33 ns
    cgo: 75.95 ns
    https://github.com/dyu/ffi-overhead

  5. name old time/op new time/op delta
    CgoCall-4 63.1ns ± 3% 57.1ns ± 0% -9.43%

  6. C function call: 2.364 ns
    Java FFI: 9.01 ns
    Rust FFI: 2.386 ns
    LuaJIT FFI: 1.81 ns (!) https://nullprogram.com/blog/2018/05/27/
    Node.js FFI: 18.33 ns
    cgo: 68.77 ns
    https://github.com/dyu/ffi-overhead

  7. C function call: 2.364 ns
    Java FFI: 9.01 ns
    Rust FFI: 2.386 ns
    LuaJIT FFI: 1.81 ns (!) https://nullprogram.com/blog/2018/05/27/
    Node.js FFI: 18.33 ns
    cgo: 68.77 ns (29x)
    https://github.com/dyu/ffi-overhead
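
The "29x" figure is the cgo time divided by the plain C function call time from the same slide. A minimal sketch of that arithmetic, where the helper name `slowdown` is purely illustrative:

```go
package main

import "fmt"

// slowdown reports how many times slower a call taking ns nanoseconds is
// than a baseline call taking baselineNs nanoseconds.
func slowdown(ns, baselineNs float64) float64 {
	return ns / baselineNs
}

func main() {
	const cCall = 2.364  // ns per plain C function call (from the slide)
	const cgoCall = 68.77 // ns per cgo call (from the slide)

	fmt.Printf("cgo vs C: %.1fx\n", slowdown(cgoCall, cCall)) // prints "cgo vs C: 29.1x"
}
```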

  8. cgo:
    • cmd/cgo
    • runtime/cgo
    • a sprinkle of cmd/link/internal/ld support

  9. cgo:
    • cmd/cgo
    • runtime/cgo
    • a sprinkle of cmd/link/internal/ld support
    • not a compiler feature!

  10. cgo:
    • cmd/cgo — a code generator
    • runtime/cgo
    • a sprinkle of cmd/link/internal/ld support
    • not a compiler feature!

  11. Reason 1:
    calling conventions

  12. C compiler
    Go compiler

  13. go build -x -work

  14. src/runtime/cgocall.go

  15. rewritten Go function
    calling convention
    trampoline
    arg unpacking
    real C function
    Go
    ASM
    C

  16. Learn more:
    • src/runtime/cgocall.go
    • rustgo: Building your own FFI @ GothamGo 2017

    https://speakerdeck.com/filosottile/calling-rust-from-go-without-cgo-at-gothamgo-2017

  17. Reason 2:
    small stacks

  18. Initial goroutine stack size: 2048 bytes
    System stack size: 1–8 megabytes

  19. function frame

  20. Function preamble

  21. C doesn't call morestack
    C code needs to run on a system stack
    cgocall / asmcgocall
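
The Go side of this can be observed without any C code. The sketch below (function names `depthSum` and `runDeep` are illustrative) recurses far deeper than a 2048-byte stack could hold, inside a fresh goroutine. It works because the compiler-inserted function preamble checks for remaining stack space and lets the runtime grow the stack; C functions have no such check, which is why they must run on a system stack.

```go
package main

import "fmt"

// depthSum recurses n levels deep, keeping a small buffer in every frame so
// that the total stack use is far larger than the initial goroutine stack.
// It returns n+1 (one per frame).
func depthSum(n int) int {
	var pad [128]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return depthSum(n-1) + int(pad[n%len(pad)])
}

// runDeep runs depthSum on a new goroutine, which starts with a small stack
// that the runtime grows on demand.
func runDeep(n int) int {
	result := make(chan int)
	go func() { result <- depthSum(n) }()
	return <-result
}

func main() {
	fmt.Println(runDeep(100000)) // prints 100001
}
```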

  22. Learn more:
    • src/runtime/stack.go
    • src/runtime/cgocall.go
    • How stacks are handled in Go by Daniel Morsing

    https://blog.cloudflare.com/how-stacks-are-handled-in-go/

  23. Reason 3:
    the scheduler

  24. From https://morsmachine.dk/go-scheduler

  25. From https://morsmachine.dk/go-scheduler

  26. The Go scheduler is cooperative.
    It can't preempt running code.
    (ProTip: for {} is never what you want. Use select {}.)
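
A minimal sketch of the tip: instead of spinning in `for {}`, park the goroutine in a `select` (or a channel receive) so the scheduler can run other work. The helper name `waitFor` is illustrative. Note that this slide describes the runtime as of 2018; Go 1.14 later added asynchronous preemption of goroutines, but a busy loop still burns a CPU while a blocked `select` does not.

```go
package main

import "fmt"

// waitFor parks the calling goroutine until done is closed. While parked,
// the goroutine uses no CPU and the scheduler is free to run other goroutines.
func waitFor(done <-chan struct{}) {
	select {
	case <-done:
	}
}

func main() {
	done := make(chan struct{})
	go func() {
		defer close(done)
		fmt.Println("worker running")
	}()

	waitFor(done) // a bare `for {}` here would spin instead of yielding
	fmt.Println("worker finished")
}
```

A bare `select {}` with no cases blocks forever, which is the usual way to keep `main` alive while other goroutines do the work.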

  27. From https://morsmachine.dk/go-scheduler

  28. Learn more:
    • src/runtime/proc.go → reentersyscall
    • The Go scheduler by Daniel Morsing

    https://morsmachine.dk/go-scheduler
    • Performance without the event loop by Dave Cheney

    https://dave.cheney.net/2015/08/08/performance-without-the-event-loop

  29. Reason 4:
    the garbage collector

  30. Go memory C memory
    []byte

  31. Go memory C memory
    GC
    []byte

  32. Go memory C memory
    GC
    []byte

  33. Go memory C memory
    GC
    []byte
    *uint8_t
    C.some_func()

  34. The cgo rules
    You may pass a Go pointer
    … if it doesn’t point to other pointers
    … and C can’t keep a reference to it

  35. The GC must see all the Go pointers.

  36. panic: runtime error: cgo argument
    has Go pointer to Go pointer
    GODEBUG=cgocheck=2

  37. Learn more:
    • From cgo back to Go @ GopherCon 2016

    https://speakerdeck.com/filosottile/from-cgo-back-to-go-gophercon-2016

  38. Thank you!
    [email protected]
    @FiloSottile
    Artwork by Olga Shalakhina, based on work by Renee French,
    licensed under Creative Commons Attribution 3.0.
