Adventures in Cgo Performance

Cgo is a powerful tool in the Go programmer’s arsenal. It allows Go programmers to interoperate with other languages. However, Cgo documentation is scarce and best practices for performance are hard to come by. In this tutorial session, I discuss lessons I've learned working on the Go API for Wallaroo, a high-performance distributed stream processor written in Pony.

I cover hard-won knowledge about using Cgo in performance-sensitive code, including: ways in which Cgo makes interoperation with other languages difficult, how you can work around common sources of performance and scaling problems, and an issue with the Go runtime that can't be worked around.

Sean T Allen

August 29, 2018

Transcript

  1. ADVENTURES IN CGO PERFORMANCE

  2. SEAN T. ALLEN
    VP OF ENGINEERING AT WALLAROO LABS
    AUTHOR OF “STORM APPLIED”
    MEMBER OF THE PONY CORE TEAM
    LOVER OF ARTISANAL STREET ART
    @SEANTALLEN
    @WALLAROOLABS
    @PONYLANG

  3. WHAT’S IN THIS TALK…

  4. CGO AND ME
    GLUING LANGUAGES
    TOGETHER FOR FUN AND
    PROFIT

  5. I’M A C PROGRAMMER

  6. I’M A PONY PROGRAMMER

  7. YOU MIGHT EVEN CALL ME
    A CGO PROGRAMMER

  8. YOU PROBABLY WOULDN’T CALL ME
    A GO PROGRAMMER

  9. I ENDED UP HERE BECAUSE
    WALLAROO LABS NEEDED TO
    CALL GO CODE FROM PONY CODE

  10. WALLAROO
    SCALE-INDEPENDENT COMPUTING
    FOR GO (AND PYTHON)

  11. FRAMEWORK FOR DOING
    “BIG DATA STUFF”
    * think Hadoop, Spark, Storm, Flink, Kafka Streams

  12. FRAMEWORK FOR HORIZONTALLY SCALING
    EVENT-BY-EVENT STREAM PROCESSING
    APPLICATIONS
    * think Storm, Flink, Kafka Streams

  13. TWO-LAYER ARCHITECTURE

  14. scale-independent
    scale-aware
    API

  15. scale-independent
    scale-aware
    API

  16. scale-independent
    scale-aware
    API

  17. scale-independent
    scale-aware
    API

  18. scale-independent
    scale-aware
    API

  19. scale-independent
    scale-aware
    API

  20. WALLAROO & GO
    A MARRIAGE MADE IN CGO

  21. WALLAROO
    Decode Compute Encode
    WALLAROO RUNTIME

  22. WALLAROO: PONY RUNTIME
    Decode Compute Encode
    WALLAROO RUNTIME

  23. WALLAROO: GO COMPUTATIONS
    Decode Compute Encode
    WALLAROO RUNTIME

  24. WALLAROO: CGO BRIDGE
    Decode Compute Encode
    WALLAROO RUNTIME

  25. TWO-LAYER ARCHITECTURE REVISITED
    scale-independent
    scale-aware
    API

  26. SCALE-AWARE PONY RUNTIME
    scale-independent
    scale-aware
    API

  27. SCALE-AWARE PONY RUNTIME
    scale-independent
    Pony
    API

  28. USER SUPPLIED SCALE-INDEPENDENT GO
    scale-independent
    Pony
    API

  29. USER SUPPLIED SCALE-INDEPENDENT GO
    Go
    Pony
    API

  30. CGO BRIDGE BETWEEN GO AND PONY
    Go
    Pony
    API

  31. CGO BRIDGE BETWEEN GO AND PONY
    Go
    Pony
    cgo

  32. AND SO ENDS
    THE SEAN T. ALLEN CGO BACKSTORY

  33. CGO
    WE AREN’T IN KANSAS
    ANYMORE

  34. CALL “C” FROM GO

  35. CALL GO FROM “C”

  36. TO MY EYE,
    CGO IS NOT FFI*
    * except it is according to Wikipedia

  37. AND IT’S NOT GO

  38. CGO
    PERFORMANCE
    IT’S COMPLICATED

  39. CGO
    PERFORMANCE
    CALLING “C” FROM GO

  40. CALLING “C” FROM GO
    METHOD   OPERATIONS COMPLETED   NANOS PER OPERATION
    CGO      10,000,000             171 NS/OP
    GO       2,000,000,000          1.83 NS/OP
    * according to a simple Cockroach Labs benchmark

  41. CGO
    PERFORMANCE
    CALLING GO FROM “C”

  42. CALLING GO FROM C
    TEST MACHINE                   MILLISECONDS PER OPERATION
    AWS (various instance types)   1-2 MS/OP
    2014 MacBook Pro               5-6 MS/OP
    * according to a simple Sean T. Allen benchmark

  43. RUNTIME/PROC.GO
    LINE 1771

  44. RECOMMENDATION:
    BATCH YOUR CGO CALLS

  45. GO => “C”
    DO AS MUCH AS YOU CAN IN A SINGLE “C” CALL

  46. “C” => GO
    DO AS MUCH AS YOU CAN IN A SINGLE GO CALL

  47. THE PROBLEM
    WITH POINTERS

  48. BUT FIRST, LET’S TALK ABOUT
    GARBAGE COLLECTION

  49. THERE ARE MORE TYPES OF GARBAGE COLLECTION
    IN HEAVEN AND EARTH THAN ARE DREAMT OF
    IN YOUR PHILOSOPHY

  50. “COPYING” GARBAGE COLLECTORS
    WILL MOVE OBJECTS
    IN MEMORY

  51. RELOCATING OBJECTS IN MEMORY
    ADDS COMPLEXITY
    TO FFI

  52. C ISN’T ALLOWED TO HOLD ONTO
    GO POINTERS

  53. C ISN’T ALLOWED TO HOLD ONTO
    GO POINTERS
    AND IT’S CHECKED AT RUNTIME

  54. AND BAD THINGS HAPPEN IF YOU DON’T
    FOLLOW THE RULES
    • Go code may pass a Go pointer to C
    provided the Go memory to which it
    points does not contain any Go
    pointers
    • C code may not keep a copy of a Go
    pointer after the call returns
    • A Go function called by C code may
    not return a Go pointer
    • Go code may not store a Go pointer in
    C memory.

  55. BUT…
    WHAT IF I REALLY NEED TO?
    * like Wallaroo for example

  56. “BIG OLD MAP”
    A SOLUTION OF SORTS

  57. GO IS ALLOWED TO RETURN
    NON-POINTERS TO C
    * like a uint64

  58. A “BIG OLD MAP” OF
    INTEGERS TO GO OBJECTS
    SOLVES OUR POINTER PROBLEM
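
One way such a map could look, as a sketch under assumptions: a single mutex guards a `map[uint64]interface{}`, and Go hands C the integer key instead of a pointer. The `registry` type and its `Add`, `Get`, and `Remove` methods are hypothetical names chosen for illustration, not the code shown in the talk.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical "big old map": C holds uint64 handles,
// Go holds the objects, and one lock protects everything.
type registry struct {
	mu     sync.Mutex
	nextID uint64
	items  map[uint64]interface{}
}

func newRegistry() *registry {
	return &registry{items: make(map[uint64]interface{})}
}

// Add stores v and returns an integer handle that is safe to pass to C.
func (r *registry) Add(v interface{}) uint64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	r.items[r.nextID] = v
	return r.nextID
}

// Get resolves a handle that C passed back into the Go object.
func (r *registry) Get(id uint64) (interface{}, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	v, ok := r.items[id]
	return v, ok
}

// Remove drops the entry so the object can be garbage collected.
func (r *registry) Remove(id uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.items, id)
}

func main() {
	r := newRegistry()
	id := r.Add("some Go state")
	v, ok := r.Get(id)
	fmt.Println(id, v, ok)
	r.Remove(id)
}
```

The C side would carry the returned `uint64` and pass it back on later calls. Entries stay reachable until `Remove` is called, so a handle that is never removed keeps its object alive indefinitely.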

  59. (no transcript text)

  60. (no transcript text)

  61. (no transcript text)

  62. (no transcript text)

  63. BONUS PROBLEM SOLVED…

  64. HOLDING OBJECTS IN THE BIG OLD MAP
    KEEPS THEM FROM BEING GARBAGE COLLECTED

  65. “BIG OLD MAP”
    THERE ARE PROBLEMS

  66. PERFORMANCE WILL SUFFER
    UNDER CONTENTION

  67. CONTENDED LOCKS
    DESTROY PERFORMANCE

  68. “BIG OLD MAP” WON’T TAKE YOU VERY FAR

  69. TIME FOR A LITTLE SHARDING

  70. CONCURRENT MAP
    FROM 1 LOCK TO MANY
    LOCKS

  71. (no transcript text)

  72. (no transcript text)

  73. (no transcript text)

  74. (no transcript text)

  75. (no transcript text)

  76. ID TO SHARD WITH 8 SHARDS
    ID SHARD
    0 0
    1 1
    8 0
    12 4
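
The table above is consistent with choosing a shard as the ID modulo the shard count. A minimal sketch of a sharded map built on that assumption follows; the `shardedMap` type, `shardIndex`, and the `Store`/`Load` methods are hypothetical names for illustration and may differ from the concurrent map library used in the talk.

```go
package main

import (
	"fmt"
	"sync"
)

const shardCount = 8

// shard is one slice of the map with its own lock, so operations on
// IDs in different shards do not contend with each other.
type shard struct {
	mu    sync.RWMutex
	items map[uint64]interface{}
}

type shardedMap struct {
	shards [shardCount]*shard
}

func newShardedMap() *shardedMap {
	m := &shardedMap{}
	for i := range m.shards {
		m.shards[i] = &shard{items: make(map[uint64]interface{})}
	}
	return m
}

// shardIndex maps an ID to a shard: 0 -> 0, 1 -> 1, 8 -> 0, 12 -> 4.
func shardIndex(id uint64) uint64 {
	return id % shardCount
}

// Store locks only the shard that owns id.
func (m *shardedMap) Store(id uint64, v interface{}) {
	s := m.shards[shardIndex(id)]
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[id] = v
}

// Load takes a read lock on the owning shard only.
func (m *shardedMap) Load(id uint64) (interface{}, bool) {
	s := m.shards[shardIndex(id)]
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.items[id]
	return v, ok
}

func main() {
	m := newShardedMap()
	for _, id := range []uint64{0, 1, 8, 12} {
		m.Store(id, fmt.Sprintf("object %d", id))
		fmt.Printf("id %d -> shard %d\n", id, shardIndex(id))
	}
}
```

Sequential IDs spread evenly across shards under this scheme, but two IDs that land in the same shard (such as 0 and 8) still share a lock.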

  77. ID GENERATION

  78. (no transcript text)

  79. (no transcript text)

  80. CONCURRENT MAP
    THERE ARE PROBLEMS

  81. THERE’S THAT LOCK IN OUR ID GENERATOR

  82. CAN WE DITCH THAT ID GENERATION LOCK?

  83. ATOMICS
    A MORE CONCURRENCY
    FRIENDLY ALTERNATIVE

  84. (no transcript text)

  85. RECOMMENDATIONS

  86. FOR ID GENERATION:
    USE THE ATOMICS PACKAGE
    OR
    SOMETHING INTRINSIC TO THE VALUE

  87. PICK YOUR “MAP” CAREFULLY…
    CONCURRENT MAP
    PROBABLY ISN’T RIGHT FOR YOU

  88. CONSIDER PERFORMANCE UPFRONT

  89. AVOID LOCKS

  90. CGO AND YOU
    HOW YOU CAN HELP
    IMPROVE CGO

  91. DOCUMENTATION IS NEEDED

  92. IF YOU ARE USING CGO,
    TALK ABOUT THE PAIN
    * and the value

  93. WORK TO MAKE THE CGO EXPERIENCE
    MORE LIKE THE GO EXPERIENCE

  94. WORK ON THAT TODO
    ON LINE 1771
    IN RUNTIME/PROC.GO
    * or otherwise contribute code

  95. THANKS
    BRIAN KETELSEN
    ANDREW TURLEY

  96. SPECIAL THANKS TO
    JEFF WENDLING
    AKA @ZEEBO ON THE GOPHER SLACK

  97. LEARN MORE
    GITHUB.COM/SEANTALLEN/
    ADVENTURES-IN-CGO-
    PERFORMANCE
