Adventures in Cgo Performance

Cgo is a powerful tool in the Go programmer’s arsenal. It allows Go programmers to interoperate with other languages. However, Cgo documentation is scarce and best practices for performance are hard to come by. In this tutorial session, I discuss lessons I've learned working on the Go API for Wallaroo, a high-performance distributed stream processor written in Pony.

I cover hard-won knowledge about using Cgo in performance-sensitive code, including: ways in which Cgo makes interoperation with other languages difficult, how you can work around common sources of performance and scaling problems, and an issue with the Go runtime that can't be worked around.

Sean T Allen

August 29, 2018

Transcript

  1. ADVENTURES IN CGO PERFORMANCE

  2. SEAN T. ALLEN
    VP OF ENGINEERING AT WALLAROO LABS
    AUTHOR OF “STORM APPLIED”
    MEMBER OF THE PONY CORE TEAM
    LOVER OF ARTISANAL STREET ART
    @SEANTALLEN
    @WALLAROOLABS
    @PONYLANG

  3. WHAT’S IN THIS TALK…

  4. CGO AND ME
    GLUING LANGUAGES
    TOGETHER FOR FUN AND
    PROFIT

  5. I’M A C PROGRAMMER

  6. I’M A PONY PROGRAMMER

  7. YOU MIGHT EVEN CALL ME
    A CGO PROGRAMMER

  8. YOU PROBABLY WOULDN’T CALL ME
    A GO PROGRAMMER

  9. I ENDED UP HERE BECAUSE
    WALLAROO LABS NEEDED TO
    CALL GO CODE FROM PONY CODE

  10. WALLAROO
    SCALE-INDEPENDENT COMPUTING
    FOR GO (AND PYTHON)

  11. FRAMEWORK FOR DOING
    “BIG DATA STUFF”
    * think Hadoop, Spark, Storm, Flink, Kafka Streams

  12. FRAMEWORK FOR HORIZONTALLY SCALING
    EVENT-BY-EVENT STREAM PROCESSING
    APPLICATIONS
    * think Storm, Flink, Kafka Streams

  13. TWO-LAYER ARCHITECTURE

  14. scale-independent
    scale-aware
    API

  15. scale-independent
    scale-aware
    API

  16. scale-independent
    scale-aware
    API

  17. scale-independent
    scale-aware
    API

  18. scale-independent
    scale-aware
    API

  19. scale-independent
    scale-aware
    API

  20. WALLAROO & GO
    A MARRIAGE MADE IN CGO

  21. WALLAROO
    Decode Compute Encode
    WALLAROO RUNTIME

  22. WALLAROO: PONY RUNTIME
    Decode Compute Encode
    WALLAROO RUNTIME

  23. WALLAROO: GO COMPUTATIONS
    Decode Compute Encode
    WALLAROO RUNTIME

  24. WALLAROO: CGO BRIDGE
    Decode Compute Encode
    WALLAROO RUNTIME

  25. TWO-LAYER ARCHITECTURE REVISITED
    scale-independent
    scale-aware
    API

  26. SCALE-AWARE PONY RUNTIME
    scale-independent
    scale-aware
    API

  27. SCALE-AWARE PONY RUNTIME
    scale-independent
    Pony
    API

  28. USER SUPPLIED SCALE-INDEPENDENT GO
    scale-independent
    Pony
    API

  29. USER SUPPLIED SCALE-INDEPENDENT GO
    Go
    Pony
    API

  30. CGO BRIDGE BETWEEN GO AND PONY
    Go
    Pony
    API

  31. CGO BRIDGE BETWEEN GO AND PONY
    Go
    Pony
    cgo

  32. AND SO ENDS
    THE SEAN T. ALLEN CGO BACKSTORY

  33. CGO
    WE AREN’T IN KANSAS
    ANYMORE

  34. CALL “C” FROM GO

  35. CALL GO FROM “C”

  36. TO MY EYE,
    CGO IS NOT FFI*
    * except it is according to Wikipedia

  37. AND IT’S NOT GO

  38. CGO
    PERFORMANCE
    IT’S COMPLICATED

  39. CGO
    PERFORMANCE
    CALLING “C” FROM GO

  40. CALLING “C” FROM GO
    METHOD   OPERATIONS COMPLETED   NANOS PER OPERATION
    CGO      10,000,000             171 NS/OP
    GO       2,000,000,000          1.83 NS/OP
    * according to a simple Cockroach Labs benchmark

  41. CGO
    PERFORMANCE
    CALLING GO FROM “C”

  42. CALLING GO FROM C
    TEST MACHINE                   MILLISECONDS PER OPERATION
    AWS (various instance types)   1-2 MS/OP
    2014 MacBook Pro               5-6 MS/OP
    * according to a simple Sean T. Allen benchmark

  43. RUNTIME/PROC.GO
    LINE 1771

  44. RECOMMENDATION:
    BATCH YOUR CGO CALLS

  45. GO => “C”
    DO AS MUCH AS YOU CAN IN A SINGLE “C” CALL

  46. “C” => GO
    DO AS MUCH AS YOU CAN IN A SINGLE GO CALL

  47. THE PROBLEM
    WITH POINTERS

  48. BUT FIRST, LET’S TALK ABOUT
    GARBAGE COLLECTION

  49. THERE ARE MORE TYPES OF GARBAGE COLLECTION
    IN HEAVEN AND EARTH THAN ARE DREAMT OF
    IN YOUR PHILOSOPHY

  50. “COPYING” GARBAGE COLLECTORS
    WILL MOVE OBJECTS
    IN MEMORY

  51. RELOCATING OBJECTS IN MEMORY
    ADDS COMPLEXITY
    TO FFI

  52. C ISN’T ALLOWED TO HOLD ONTO
    GO POINTERS

  53. C ISN’T ALLOWED TO HOLD ONTO
    GO POINTERS
    AND IT’S CHECKED AT RUNTIME

  54. AND BAD THINGS HAPPEN IF YOU DON’T
    FOLLOW THE RULES
    • Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers
    • C code may not keep a copy of a Go pointer after the call returns
    • A Go function called by C code may not return a Go pointer
    • Go code may not store a Go pointer in C memory

  55. BUT…
    WHAT IF I REALLY NEED TO?
    * like Wallaroo for example

  56. “BIG OLD MAP”
    A SOLUTION OF SORTS

  57. GO IS ALLOWED TO RETURN
    NON-POINTERS TO C
    * like a uint64

  58. A “BIG OLD MAP” OF
    INTEGERS TO GO OBJECTS
    SOLVES OUR POINTER PROBLEM

  59. BONUS PROBLEM SOLVED…

  60. HOLDING OBJECTS IN THE BIG OLD MAP
    KEEPS THEM FROM BEING GARBAGE COLLECTED
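
A minimal sketch of the "big old map" idea, assuming a single mutex-guarded map keyed by uint64. The `registry` type and its method names are hypothetical, not Wallaroo's actual code. C would hold only the integer handle; the Go value stays in the map, which also keeps it reachable for the garbage collector until it is removed.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical "big old map": it hands out uint64
// handles that can safely cross into C, and holds the Go values.
type registry struct {
	mu     sync.Mutex
	nextID uint64
	items  map[uint64]interface{}
}

func newRegistry() *registry {
	return &registry{items: make(map[uint64]interface{})}
}

// Add stores v and returns the integer handle to give to C.
func (r *registry) Add(v interface{}) uint64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	r.items[r.nextID] = v
	return r.nextID
}

// Get resolves a handle that C passed back into the Go value.
func (r *registry) Get(id uint64) (interface{}, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	v, ok := r.items[id]
	return v, ok
}

// Remove drops the entry so the value can be garbage collected.
func (r *registry) Remove(id uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.items, id)
}

func main() {
	r := newRegistry()
	id := r.Add("some Go state")
	v, ok := r.Get(id)
	fmt.Println(id, v, ok)
	r.Remove(id)
	_, ok = r.Get(id)
	fmt.Println(ok)
}
```

Every operation here takes the same lock, so all goroutines that touch the map serialize on it.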

  61. “BIG OLD MAP”
    THERE ARE PROBLEMS

  62. PERFORMANCE WILL SUFFER
    UNDER CONTENTION

  63. CONTENDED LOCKS
    DESTROY PERFORMANCE

  64. “BIG OLD MAP” WON’T TAKE YOU VERY FAR

  65. TIME FOR A LITTLE SHARDING

  66. CONCURRENT MAP
    FROM 1 LOCK TO MANY
    LOCKS

  67. ID TO SHARD WITH 8 SHARDS
    ID   SHARD
    0    0
    1    1
    8    0
    12   4
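
The ID-to-shard table above is consistent with taking the ID modulo the shard count. A minimal sketch under that assumption follows; the `shardedMap` type and `shardFor` function are hypothetical names, and each shard carries its own lock so operations on different shards do not contend.

```go
package main

import (
	"fmt"
	"sync"
)

// shardCount matches the 8 shards in the table above.
const shardCount = 8

// shard is one independently locked slice of the key space.
type shard struct {
	mu    sync.Mutex
	items map[uint64]interface{}
}

type shardedMap struct {
	shards [shardCount]shard
}

func newShardedMap() *shardedMap {
	m := &shardedMap{}
	for i := range m.shards {
		m.shards[i].items = make(map[uint64]interface{})
	}
	return m
}

// shardFor maps an ID to a shard index (assumed here: id mod shardCount).
func shardFor(id uint64) uint64 { return id % shardCount }

// Store locks only the shard that owns id.
func (m *shardedMap) Store(id uint64, v interface{}) {
	s := &m.shards[shardFor(id)]
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[id] = v
}

// Load locks only the shard that owns id.
func (m *shardedMap) Load(id uint64) (interface{}, bool) {
	s := &m.shards[shardFor(id)]
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.items[id]
	return v, ok
}

func main() {
	for _, id := range []uint64{0, 1, 8, 12} {
		fmt.Println(id, shardFor(id))
	}
	m := newShardedMap()
	m.Store(12, "twelve")
	v, ok := m.Load(12)
	fmt.Println(v, ok)
}
```

IDs 0 and 8 share a shard and therefore a lock, while 1 and 12 land on different shards and can be accessed in parallel.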

  68. ID GENERATION

  69. CONCURRENT MAP
    THERE ARE PROBLEMS

  70. THERE’S THAT LOCK IN OUR ID GENERATOR

  71. CAN WE DITCH THAT ID GENERATION LOCK?

  72. ATOMICS
    A MORE CONCURRENCY
    FRIENDLY ALTERNATIVE

  73. RECOMMENDATIONS

  74. FOR ID GENERATION:
    USE THE ATOMICS PACKAGE
    OR
    SOMETHING INTRINSIC TO THE VALUE

  75. PICK YOUR “MAP” CAREFULLY…
    CONCURRENT MAP
    PROBABLY ISN’T RIGHT FOR YOU

  76. CONSIDER PERFORMANCE UPFRONT

  77. CGO AND YOU
    HOW YOU CAN HELP
    IMPROVE CGO

  78. DOCUMENTATION IS NEEDED

  79. IF YOU ARE USING CGO,
    TALK ABOUT THE PAIN
    * and the value

  80. WORK TO MAKE THE CGO EXPERIENCE
    MORE LIKE THE GO EXPERIENCE

  81. WORK ON THAT TODO
    ON LINE 1771
    IN RUNTIME/PROC.GO
    * or otherwise contribute code

  82. THANKS
    BRIAN KETELSEN
    ANDREW TURLEY

  83. SPECIAL THANKS TO
    JEFF WENDLING
    AKA @ZEEBO ON THE GOPHER SLACK

  84. LEARN MORE
    GITHUB.COM/SEANTALLEN/ADVENTURES-IN-CGO-PERFORMANCE
