How to port your Python software to Go without people noticing

Success stories about rewriting Python applications in Go are not big news anymore. The pros and cons are well known, best practices are in place, and the standard library is there to help. But what if there’s some Python code you would like to keep or, worse, some you can’t get rid of? When we chose to port the Datadog Agent to Go, we had a requirement to maintain our plugin system in Python. During the talk we will share stories about cgo, the GIL and the quest for performance as we bridge multiple languages in a single application.

Massimiliano Pippi

January 23, 2017

Transcript

  1. How to port your Python software to Go without
    people noticing - a real story.
    Massimiliano Pippi

  2. Hi, I’m Massi!
    ● Software Engineer, Generalist
    ○ C++, Python and Go
    ● OSS fan and contributor
    ● 1 year at Datadog, working on Agent and Integrations

  3. About Datadog
    ● SaaS-based infrastructure and app monitoring
    ● Open Source Agent
    ● Time series data (metrics and events)
    ● Processing nearly a trillion data points per day
    ● Intelligent Alerting and Insightful Dashboards

  4. Monitor everything

  5. Meet the Datadog Agent
    [Diagram: Agent, check]
    ● Written in Python
    ● Open Source, https://github.com/DataDog/dd-agent

  6. The anatomy of a check
    import psutil
    from checks import AgentCheck
    class SystemSwap(AgentCheck):
        def check(self, instance):
            swap_mem = psutil.swap_memory()
            self.rate('system.swap.swapped_in', swap_mem.sin)
            self.rate('system.swap.swapped_out', swap_mem.sout)
            self.gauge('system.swap.total', swap_mem.total)
            self.gauge('system.swap.used', swap_mem.used)

  7. The way to Go - Our Goals
    ● Keep Python as an extension language
    ○ ~75 checks are part of the official package
    ○ Undetermined number of custom checks in the wild
    ● RPC between the core and a standalone
    Python process is not desirable
    ● Stop running all checks sequentially, one after another
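
The last goal, running checks side by side instead of one after another, can be sketched in plain Go. This is only an illustration under stated assumptions: the `Check` interface, the `sleepCheck` type and the `runAll` helper are hypothetical names made up for this example and are not taken from the Agent's code base.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Check is a hypothetical interface used only for this sketch.
type Check interface {
	Name() string
	Run() error
}

// sleepCheck is a stand-in check that just waits for a while.
type sleepCheck struct {
	name  string
	delay time.Duration
}

func (c sleepCheck) Name() string { return c.name }

func (c sleepCheck) Run() error {
	time.Sleep(c.delay)
	return nil
}

// runAll starts every check in its own goroutine, waits for all of them,
// and returns the names of the checks that finished without error.
func runAll(checks []Check) []string {
	var (
		wg sync.WaitGroup
		mu sync.Mutex // guards ok
		ok []string
	)
	for _, c := range checks {
		wg.Add(1)
		go func(c Check) {
			defer wg.Done()
			if err := c.Run(); err == nil {
				mu.Lock()
				ok = append(ok, c.Name())
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return ok
}

func main() {
	checks := []Check{
		sleepCheck{name: "system.swap", delay: 20 * time.Millisecond},
		sleepCheck{name: "disk", delay: 20 * time.Millisecond},
		sleepCheck{name: "network", delay: 20 * time.Millisecond},
	}
	start := time.Now()
	done := runAll(checks)
	fmt.Printf("%d checks finished in %v\n", len(done), time.Since(start))
}
```

Because the stand-in checks only sleep, the total run time should be close to that of the slowest check rather than the sum of all of them. Checks that call into an embedded interpreter would still contend for the GIL.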

  8. Embedding for the win
    Cgo enables the creation of Go packages that call C code.
    ● CPython happens to be C code
    ● Embedding is well documented in C: you
    keep an interpreter in memory and make it run
    Python code at will

  9. Embedding: an example
    package main
    // #cgo pkg-config: python-2.7
    // #include <Python.h>
    import "C"
    func main() {
        C.Py_Initialize()
        cmd := C.CString("print 'Hello, World!'") // cmd must be freed!
        C.PyRun_SimpleString(cmd)
        C.Py_Finalize()
    }
    ● cgo handles the build toolchain pretty nicely: we
    still run `go build` and that’s it.

  10. Embedding: a better example
    ● Use go-python to eliminate boilerplate
    ○ https://github.com/sbinet/go-python
    import "github.com/sbinet/go-python"
    python.Initialize()
    python.PyRun_SimpleString("print 'Hello, World!'")
    python.Finalize()

  11. Demo time!
    Let’s run a Python module from a Go application
    ● https://github.com/masci/golab17/tree/master/01

  12. The dreadful GIL
    ● Embedding CPython means embedding the GIL
    ● Threads in the Python world are ok…
    ● ...but we can’t invoke Python code in parallel!
    ● Rule of thumb: any time you use Python in
    some piece of code that could be executed in a
    separate thread, lock the GIL!

  13. Demo time!
    Run Python in different goroutines and watch the
    world burn
    ● https://github.com/masci/golab17/tree/master/02

  14. The dreadful GIL 2
    ● We lock and unlock the GIL in goroutines, not
    threads
    ● The GIL protects a global thread state, we
    cannot lock/unlock it from different threads!
    ● But the Go scheduler might pause and resume
    goroutines in different threads so goroutines
    must be locked to one OS thread :(
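
A minimal sketch of the pattern these bullets describe: pin the goroutine to its OS thread, then take the lock, then run the Python-facing work. To keep the example runnable without cgo, a `sync.Mutex` named `gil` stands in for the real GIL, and `withInterpreter` and `countCalls` are hypothetical helpers invented for illustration. In a real program the lock and unlock steps would be the CPython GIL calls instead.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// gil is a stand-in for the interpreter lock (assumption: a real program
// would acquire and release the CPython GIL here instead).
var gil sync.Mutex

// withInterpreter pins the calling goroutine to its current OS thread for
// the whole lock/unlock cycle, so the scheduler cannot move it to another
// thread between acquiring and releasing the lock.
func withInterpreter(fn func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	gil.Lock()
	defer gil.Unlock()

	fn()
}

// countCalls runs n goroutines that each enter the interpreter once and
// returns how many calls were made.
func countCalls(n int) int {
	var wg sync.WaitGroup
	calls := 0 // only modified while gil is held
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			withInterpreter(func() { calls++ })
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println(countCalls(100)) // prints 100
}
```

The deferred calls run in reverse order, so the lock is released before the goroutine is unpinned from its thread.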

  15. Demo time!
    See what happens when the Go scheduler
    relocates our Pythonic goroutines (spoiler alert:
    it’ll crash your software)
    ● https://github.com/masci/golab17/tree/master/03

  16. Beyond embedding: extending!
    ● Once you have an embedded interpreter, you
    can extend Python capabilities with Go code
    ● This involves a little bit of C so no demo here
    ● Still very easy to achieve, Python scripts
    import a module that actually lives in memory
    and points to Go instructions

  17. Extending Python: the Go code
    //export MyGoFunc
    func MyGoFunc() {
        fmt.Println("Hello, World!")
    }

  18. Extending Python: the C code
    static PyMethodDef MyMethods[] = {
        {"my_func", (PyCFunction)my_go_func, METH_VARARGS, "YAY!"},
        {NULL, NULL, 0, NULL} // sentinel
    };
    PyObject *m = Py_InitModule("my_module", MyMethods);

  19. Extending Python: the Python code
    # WARNING! This only works on the embedded interpreter
    import my_module
    my_module.my_func() # prints “Hello, World!”

  20. Lessons learned: the good
    ● Embedded Python plays nicely with Go’s
    concurrency model
    ● The overhead of calling into and out of Python is negligible
    ○ BenchmarkCallPyFunc 300000 3606 ns/op
    ● Extending Python is a very powerful tool
    ○ Expose low-level functions, configuration
    management, etc. to the Python world

  21. Lessons learned: the bad
    ● The GIL prevents Python parallel execution
    ○ This was expected
    ● The GIL also interferes with the Go scheduler
    ○ Honestly, didn’t see this coming
    ● Using multiple interpreters doesn’t help
    ○ They all share a single GIL

  22. Lessons learned: the ugly
    ● You must carry some C code along; how much
    depends on the use case
    ● You will likely carry some Python code along too,
    to offer base classes and utilities to external
    modules running in embedded mode

  23. What we have now
    ● Embedded CPython 2.7.12
    ● Linux and OSX, Windows on its way
    ● We now run checks concurrently, knowing that many
    of them will wait for each other...
    ● ...but some of them were ported to Go, so we also
    have some parallelism
    ● At the end of the day, we got a good deal

  24. Thanks for listening!
    ● Try Datadog at https://www.datadoghq.com
    ● Find our OSS on https://github.com/DataDog
    ● Our tech blog is http://engineering.datadoghq.com
    The new Agent will be open sourced soon, stay tuned!

  25. WE’RE HIRING!
    NYC, PARIS, Remote
