How to port your Python software to Go without people noticing

Success stories about rewriting Python applications in Go are not big news anymore. The pros and cons are well known, best practices are in place, the standard library is there to help. But what if there’s some Python code you would like to keep or, worse, some you can’t get rid of? When we chose to port the Datadog Agent to Go, we had a requirement to keep our plugin system in Python. During the talk we will share stories about cgo, the GIL and the quest for performance as we bridge multiple languages in a single application.

Massimiliano Pippi

January 23, 2017
Transcript

  1. How to port your Python software to Go without people noticing - a real story. Massimiliano Pippi
  2. Hi, I’m Massi!
    • Software Engineer, Generalist
      ◦ C++, Python and Go
    • OSS fan and contributor
    • 1 year at Datadog, working on Agent and Integrations
  3. About Datadog
    • SaaS-based infrastructure and app monitoring
    • Open Source Agent
    • Time series data (metrics and events)
    • Processing nearly a trillion data points per day
    • Intelligent Alerting and Insightful Dashboards
  4. Monitor everything

  5. Meet the Datadog Agent
    Agent check
    • Written in Python
    • Open Source, https://github.com/DataDog/dd-agent
  6. The anatomy of a check

    import psutil
    from checks import AgentCheck

    class SystemSwap(AgentCheck):
        def check(self, instance):
            swap_mem = psutil.swap_memory()
            self.rate('system.swap.swapped_in', swap_mem.sin)
            self.rate('system.swap.swapped_out', swap_mem.sout)
            self.gauge('system.swap.total', swap_mem.total)
            self.gauge('system.swap.used', swap_mem.used)
  7. The way to Go - Our Goals
    • Keep Python as an extension language
      ◦ ~75 checks part of the official package
      ◦ Undetermined number of custom checks in the wild
    • RPC between the core and a self-standing Python process is not desirable
    • Stop running all checks serially, one after the other
  8. Embedding for the win
    Cgo enables the creation of Go packages that call C code.
    • CPython happens to be C code
    • Embedding is well documented in C: you keep an interpreter in memory and make it run Python code at will
  9. Embedding: an example

    package main

    // #cgo pkg-config: python-2.7
    // #include <Python.h>
    // #include <stdlib.h>
    import "C"

    import "unsafe"

    func main() {
        C.Py_Initialize()
        cmd := C.CString("print 'Hello, World!'")
        defer C.free(unsafe.Pointer(cmd)) // cmd must be freed!
        C.PyRun_SimpleString(cmd)
        C.Py_Finalize()
    }

    • cgo handles the build toolchain pretty nicely; we still run `go build` and that’s it.
  10. Embedding: a better example
    • Use go-python to eliminate boilerplate
      ◦ https://github.com/sbinet/go-python

    import "github.com/sbinet/go-python"

    python.Initialize()
    python.PyRun_SimpleString("print 'Hello, World!'")
    python.Finalize()
  11. Demo time! Let’s run a Python module from a Go application
    • https://github.com/masci/golab17/tree/master/01
  12. The dreadful GIL
    • Embedding CPython means embedding the GIL
    • Threads in the Python world are ok…
    • ...but we can’t invoke Python code in parallel!
    • Rule of thumb: any time you use Python in some piece of code that could be executed in a separate thread, lock the GIL!
  13. Demo time! Run Python in different goroutines and watch the world burn
    • https://github.com/masci/golab17/tree/master/02
  14. The dreadful GIL 2
    • We lock and unlock the GIL in goroutines, not threads
    • The GIL protects a global thread state, we cannot lock/unlock it from different threads!
    • But the Go scheduler might pause and resume goroutines in different threads so goroutines must be locked to one OS thread :(
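A minimal sketch of the thread-pinning pattern described on this slide, in pure Go so it runs without CPython installed. The `gil` mutex is only a stand-in for the real GIL, and `runPinned` and `runAll` are hypothetical helper names, not part of the Agent or of go-python. In a real embedding the lock and unlock steps would be calls into the CPython C API through cgo.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// gil is a stand-in for CPython's Global Interpreter Lock (assumption:
// a real program would acquire the GIL through the C API instead).
var gil sync.Mutex

// runPinned runs fn on a goroutine that is locked to one OS thread for the
// whole time it holds the stand-in lock, and returns once fn has finished.
func runPinned(fn func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		// Pin the goroutine so that lock and unlock happen on the same
		// OS thread, even if the scheduler would otherwise move it.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()

		gil.Lock()
		defer gil.Unlock()
		fn()
	}()
	<-done
}

// runAll starts n pinned jobs concurrently and returns how many completed.
func runAll(n int) int {
	var wg sync.WaitGroup
	count := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			runPinned(func() { count++ }) // safe: count is guarded by gil
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(runAll(8))
}
```

The jobs are started concurrently but only one at a time gets past the lock, which mirrors how Python code behaves when called from several goroutines.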
  15. Demo time! See what happens when the Go scheduler relocates our Pythonic goroutines (spoiler alert: it’ll crash your software)
    • https://github.com/masci/golab17/tree/master/03
  16. Beyond embedding: extending!
    • Once you have an embedded interpreter, you can extend Python capabilities with Go code
    • This involves a little bit of C so no demo here
    • Still very easy to achieve, Python scripts import a module that actually lives in memory and points to Go instructions
  17. Extending Python: the Go code

    //export MyGoFunc
    func MyGoFunc() {
        fmt.Println("Hello, World!")
    }
  18. Extending Python: the C code

    static PyMethodDef MyMethods[] = {
        {"my_func", (PyCFunction)my_go_func, METH_VARARGS, "YAY!"},
        {NULL, NULL} // guards
    };

    PyObject *m = Py_InitModule("my_module", MyMethods);
  19. Extending Python: the Python code

    # WARNING! This only works on the embedded interpreter
    import my_module
    my_module.my_func()  # prints "Hello, World!"
  20. Lessons learned: the good
    • Embedded Python plays nicely with Go’s concurrency model
    • From/To Python overhead is negligible
      ◦ BenchmarkCallPyFunc 300000 3606 ns/op
    • Extending Python is a very powerful tool
      ◦ Expose low level functions, configuration management, etc. to the Python world
  21. Lessons learned: the bad
    • The GIL prevents Python parallel execution
      ◦ This was expected
    • The GIL also interferes with the Go scheduler
      ◦ Honestly, didn’t see this coming
    • Using multiple interpreters doesn’t help
      ◦ They share a single GIL
  22. Lessons learned: the ugly
    • You must carry around some C code; how much depends on the use case
    • You will likely carry around some Python code too, to offer base classes and utilities to external modules running in embedded mode
  23. What we have now
    • Embedded CPython 2.7.12
    • Linux and OSX, Windows on its way
    • We now run checks concurrently, knowing that many of them will wait for each other...
    • ...although some of them were ported to Go, so we also have some parallelism
    • At the end of the day, we got a good deal
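A rough illustration of the trade-off on this slide, again in pure Go with a mutex standing in for the GIL. The `check` type, its fields and `runChecks` are hypothetical names made up for this sketch and do not reflect the Agent's actual scheduler. Checks marked as Python wait for each other on the shared lock, while the others run without it.

```go
package main

import (
	"fmt"
	"sync"
)

// check is a hypothetical unit of work; python marks checks that would
// need the embedded interpreter (and therefore the GIL).
type check struct {
	name   string
	python bool
}

// runChecks runs every check in its own goroutine. It returns how many
// checks completed and the peak number of Python checks that were ever
// inside the interpreter at the same time.
func runChecks(checks []check) (completed, peakPython int) {
	var (
		gil    sync.Mutex // stand-in for the GIL (assumption)
		stats  sync.Mutex // protects active, completed and peakPython
		active int
		wg     sync.WaitGroup
	)
	for _, c := range checks {
		wg.Add(1)
		go func(c check) {
			defer wg.Done()
			if c.python {
				// Python checks queue up on the shared lock.
				gil.Lock()
				defer gil.Unlock()
				stats.Lock()
				active++
				if active > peakPython {
					peakPython = active
				}
				stats.Unlock()
				// Runs before gil.Unlock, so active drops while the lock is held.
				defer func() {
					stats.Lock()
					active--
					stats.Unlock()
				}()
			}
			// The actual work of the check would go here.
			stats.Lock()
			completed++
			stats.Unlock()
		}(c)
	}
	wg.Wait()
	return completed, peakPython
}

func main() {
	completed, peak := runChecks([]check{
		{name: "disk", python: true},
		{name: "network", python: true},
		{name: "cpu", python: false},
	})
	fmt.Println(completed, peak)
}
```

However many Python checks are scheduled, the peak never exceeds one, while Go checks are free to run in parallel with them.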
  24. Thanks for listening!
    • Try Datadog at https://www.datadoghq.com
    • Find our OSS on https://github.com/DataDog
    • Our tech blog is http://engineering.datadoghq.com
    The new Agent will be open sourced soon, stay tuned!
  25. WE’RE HIRING! NYC, PARIS, Remote