Why cgo is slow @ CapitalGo 2018

https://capitalgolang.com/program#filippo_valsorda

This talk uses cgo and its below-average performance as an excuse to look into Go internals and what makes Go different from C.

We learn about calling conventions and the code-generated cgo trampolines; about the small goroutine stacks and how C doesn't know how to grow them; about the Go scheduler and how C doesn't yield to it; and about the garbage collector and how pointers in C memory can't be tracked.

Filippo Valsorda

June 22, 2018

Transcript

  1. Why cgo is slow
    Filippo Valsorda

  3. cgo is a FFI
    (Foreign Function Interface)

  4. I like FFIs.
    • From cgo back to Go @ GopherCon 2016
      https://speakerdeck.com/filosottile/from-cgo-back-to-go-gophercon-2016
    • rustgo: Building your own FFI @ GothamGo 2017
      https://speakerdeck.com/filosottile/calling-rust-from-go-without-cgo-at-gothamgo-2017
    • Why cgo is slow @ CapitalGo 2018
    Hi!

  5. C function call: 2.364 ns
    Java FFI: 9.01 ns
    Rust FFI: 2.386 ns
    LuaJIT FFI: 1.81 ns (!) https://nullprogram.com/blog/2018/05/27/
    Node.js FFI: 18.33 ns
    cgo: 75.95 ns
    https://github.com/dyu/ffi-overhead

  6. name       old time/op  new time/op  delta
    CgoCall-4   63.1ns ± 3%  57.1ns ± 0%  -9.43%

  7. C function call: 2.364 ns
    Java FFI: 9.01 ns
    Rust FFI: 2.386 ns
    LuaJIT FFI: 1.81 ns (!) https://nullprogram.com/blog/2018/05/27/
    Node.js FFI: 18.33 ns
    cgo: 68.77 ns
    https://github.com/dyu/ffi-overhead

  8. C function call: 2.364 ns
    Java FFI: 9.01 ns
    Rust FFI: 2.386 ns
    LuaJIT FFI: 1.81 ns (!) https://nullprogram.com/blog/2018/05/27/
    Node.js FFI: 18.33 ns
    cgo: 68.77 ns (29x)
    https://github.com/dyu/ffi-overhead
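
The 29x figure is the cgo time divided by the plain C function call time. As a small worked example, the sketch below recomputes that ratio for every row using the timings quoted on the slide; the `slowdown` helper and the `timings` table are names made up for this illustration.

```go
package main

import "fmt"

// timings holds the per-call figures quoted on the slide, in nanoseconds.
var timings = []struct {
	name string
	ns   float64
}{
	{"C function call", 2.364},
	{"Java FFI", 9.01},
	{"Rust FFI", 2.386},
	{"LuaJIT FFI", 1.81},
	{"Node.js FFI", 18.33},
	{"cgo", 68.77},
}

// slowdown reports how many times slower ns is than the baseline.
func slowdown(ns, baseline float64) float64 {
	return ns / baseline
}

func main() {
	base := timings[0].ns // the plain C function call is the baseline
	for _, t := range timings {
		fmt.Printf("%-16s %7.3f ns  %5.1fx\n", t.name, t.ns, slowdown(t.ns, base))
	}
}
```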

  9. cgo:
    • cmd/cgo
    • runtime/cgo
    • a sprinkle of cmd/link/internal/ld support

  10. cgo:
    • cmd/cgo
    • runtime/cgo
    • a sprinkle of cmd/link/internal/ld support
    • not a compiler feature!

  11. cgo:
    • cmd/cgo — a code generator
    • runtime/cgo
    • a sprinkle of cmd/link/internal/ld support
    • not a compiler feature!

  12. Reason 1:
    calling conventions

  13. C compiler
    Go compiler

  14. go build -x -work

  19. src/runtime/cgocall.go

  23. rewritten Go function → calling convention trampoline → arg unpacking → real C function
    (layers: Go → ASM → C)

  24. Learn more:
    • src/runtime/cgocall.go
    • rustgo: Building your own FFI @ GothamGo 2017
      https://speakerdeck.com/filosottile/calling-rust-from-go-without-cgo-at-gothamgo-2017

  25. Reason 2:
    small stacks

  26. Initial goroutine stack size: 2048 bytes
    System stack size: 1–8 megabytes

  27. stack

  28. function frame

  34. Function preamble

  35. C doesn't call morestack
    C code needs to run on a system stack
    cgocall / asmcgocall
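
For contrast, here is a minimal pure-Go sketch of what Go code gets for free; `depth` and `runInGoroutine` are names made up for this example. Each frame holds a 128-byte array, so 100,000 nested calls need far more than the initial 2048 bytes. It still works because the preamble of each Go function checks the remaining stack space and calls morestack to grow the stack when needed. C code has no such check, which is why it has to run on a system stack.

```go
package main

import "fmt"

// depth recurses n levels deep. The array keeps every frame well above
// 128 bytes, so the goroutine stack has to grow many times along the way.
func depth(n int) int {
	var pad [128]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return 0
	}
	// Each level adds the 1 stored in its own frame, so depth(n) == n.
	return depth(n-1) + int(pad[n%len(pad)])
}

// runInGoroutine runs depth on a fresh goroutine, which starts with a
// small stack, and returns the result.
func runInGoroutine(n int) int {
	done := make(chan int)
	go func() { done <- depth(n) }()
	return <-done
}

func main() {
	fmt.Println("recursion depth reached:", runInGoroutine(100000))
}
```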

  36. Learn more:
    • src/runtime/stack.go
    • src/runtime/cgocall.go
    • How stacks are handled in Go by Daniel Morsing
      https://blog.cloudflare.com/how-stacks-are-handled-in-go/

  37. Reason 3:
    the scheduler

  38. From https://morsmachine.dk/go-scheduler

  39. From https://morsmachine.dk/go-scheduler

  40. The Go scheduler is cooperative.
    It can't preempt running code.
    (ProTip: for {} is never what you want. Use select {}.)
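
A minimal sketch of the blocking alternative to spinning; `waitOrTimeout` is a hypothetical helper written for this example. Instead of polling in a `for {}` loop, the goroutine blocks in a `select`, which parks it and lets the scheduler run other goroutines in the meantime.

```go
package main

import (
	"fmt"
	"time"
)

// waitOrTimeout blocks until done is closed or d elapses. It reports
// whether done was closed. While blocked, the goroutine is parked and
// does not occupy a thread.
func waitOrTimeout(done <-chan struct{}, d time.Duration) bool {
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	done := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond) // stand-in for real work
		close(done)
	}()
	fmt.Println("worker finished:", waitOrTimeout(done, time.Second))
}
```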

  42. From https://morsmachine.dk/go-scheduler

  43. Learn more:
    • src/runtime/proc.go → reentersyscall
    • The Go scheduler by Daniel Morsing
      https://morsmachine.dk/go-scheduler
    • Performance without the event loop by Dave Cheney
      https://dave.cheney.net/2015/08/08/performance-without-the-event-loop

  44. Reason 4:
    the garbage collector

  45. Go memory C memory
    []byte

  46. Go memory C memory
    GC
    []byte

  48. Go memory C memory
    GC
    []byte
    *uint8_t
    C.some_func()

  49. The cgo rules
    You may pass a Go pointer
    … if it doesn’t point to other pointers
    … and C can’t keep a reference to it

  50. The GC must see all the Go pointers.

  51. panic: runtime error: cgo argument
    has Go pointer to Go pointer
    GODEBUG=cgocheck=2

  52. Learn more:
    • From cgo back to Go @ GopherCon 2016
      https://speakerdeck.com/filosottile/from-cgo-back-to-go-gophercon-2016

  53. Thank you!
    [email protected]
    @FiloSottile
    Artwork by Olga Shalakhina under CC 3.0 license,
    based on work by Renee French under Creative Commons Attribution 3.0.
