Slide 1

Slide 1 text

How to port your Python software to Go without people noticing - a real story.
Massimiliano Pippi

Slide 2

Slide 2 text

Hi, I’m Massi!
● Software Engineer, Generalist
  ○ C++, Python and Go
● OSS fan and contributor
● 1 year at Datadog, working on Agent and Integrations

Slide 3

Slide 3 text

About Datadog
● SaaS-based infrastructure and app monitoring
● Open Source Agent
● Time series data (metrics and events)
● Processing nearly a trillion data points per day
● Intelligent Alerting and Insightful Dashboards

Slide 4

Slide 4 text

Monitor everything

Slide 5

Slide 5 text

Meet the Datadog Agent
Agent check
● Written in Python
● Open Source, https://github.com/DataDog/dd-agent

Slide 6

Slide 6 text

The anatomy of a check
    import psutil
    from checks import AgentCheck

    class SystemSwap(AgentCheck):
        def check(self, instance):
            swap_mem = psutil.swap_memory()
            # rate: submitted as the change per second between two runs
            self.rate('system.swap.swapped_in', swap_mem.sin)
            self.rate('system.swap.swapped_out', swap_mem.sout)
            # gauge: submitted as the latest sampled value
            self.gauge('system.swap.total', swap_mem.total)
            self.gauge('system.swap.used', swap_mem.used)

Slide 7

Slide 7 text

The way to Go - Our Goals
● Keep Python as an extension language
  ○ ~75 checks part of the official package
  ○ Undetermined number of custom checks in the wild
● RPC between the core and a standalone Python process is not desirable
● Stop running all checks serially, one after the other
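
The last goal can be pictured with a minimal Go sketch: each check runs in its own goroutine instead of waiting for the previous one. The `Check` interface, `sleepCheck` type and `runAll` function are hypothetical names made up for illustration; they are not the Agent's actual API.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Check is a hypothetical interface, not the Agent's real one.
type Check interface {
	Name() string
	Run() error
}

// sleepCheck is a fake check that just waits for a while.
type sleepCheck struct {
	name  string
	delay time.Duration
}

func (c sleepCheck) Name() string { return c.name }

func (c sleepCheck) Run() error {
	time.Sleep(c.delay)
	return nil
}

// runAll starts every check in its own goroutine, waits for all of them
// and returns the errors that occurred, keyed by check name.
func runAll(checks []Check) map[string]error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex // protects errs
		errs = make(map[string]error)
	)
	for _, c := range checks {
		wg.Add(1)
		go func(c Check) {
			defer wg.Done()
			if err := c.Run(); err != nil {
				mu.Lock()
				errs[c.Name()] = err
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return errs
}

func main() {
	checks := []Check{
		sleepCheck{name: "swap", delay: 50 * time.Millisecond},
		sleepCheck{name: "disk", delay: 50 * time.Millisecond},
	}
	start := time.Now()
	errs := runAll(checks)
	// The two waits overlap, so the elapsed time should be close to
	// one delay rather than the sum of both.
	fmt.Println("failed checks:", len(errs), "elapsed:", time.Since(start))
}
```

With checks written in Go this gives real overlap; with checks that call into an embedded interpreter the goroutines still start concurrently but have to take turns on the interpreter.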

Slide 8

Slide 8 text

Embedding for the win
Cgo enables the creation of Go packages that call C code.
● CPython happens to be C code
● Embedding CPython in a C program is well documented: you keep an interpreter in memory and make it run Python code at will

Slide 9

Slide 9 text

Embedding: an example
    package main

    // #cgo pkg-config: python-2.7
    // #include <Python.h>
    // #include <stdlib.h>
    import "C"

    import "unsafe"

    func main() {
        C.Py_Initialize()
        defer C.Py_Finalize()
        cmd := C.CString("print 'Hello, World!'")
        defer C.free(unsafe.Pointer(cmd)) // cmd must be freed!
        C.PyRun_SimpleString(cmd)
    }
● cgo handles the build toolchain pretty nicely: we still do `go build` and that’s it.

Slide 10

Slide 10 text

Embedding: a better example
● Use go-python to eliminate boilerplate
  ○ https://github.com/sbinet/go-python
    import "github.com/sbinet/go-python"

    python.Initialize()
    python.PyRun_SimpleString("print 'Hello, World!'")
    python.Finalize()

Slide 11

Slide 11 text

Demo time!
Let’s run a Python module from a Go application
● https://github.com/masci/golab17/tree/master/01

Slide 12

Slide 12 text

The dreadful GIL
● Embedding CPython means embedding the GIL
● Threads in the Python world are ok…
● ...but we can’t invoke Python code in parallel!
● Rule of thumb: any time you use Python in some piece of code that could be executed in a separate thread, lock the GIL!

Slide 13

Slide 13 text

Demo time!
Run Python in different goroutines and watch the world burn
● https://github.com/masci/golab17/tree/master/02

Slide 14

Slide 14 text

The dreadful GIL 2
● We lock and unlock the GIL in goroutines, not threads
● The GIL protects a global thread state, so we cannot lock it in one thread and unlock it from another!
● But the Go scheduler might pause and resume goroutines in different threads, so goroutines must be locked to one OS thread :(
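
A minimal sketch of the pattern these bullets describe: pin the goroutine to its OS thread first, then take the lock, and release both in reverse order. To keep the example runnable without cgo, a `sync.Mutex` named `gil` stands in for the real GIL; `withPython` and `runChecks` are hypothetical helpers. In real code the lock and unlock would presumably be the CPython calls `PyGILState_Ensure` and `PyGILState_Release`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// gil is only a stand-in for CPython's Global Interpreter Lock.
// Unlike the real GIL state, a sync.Mutex does not care which thread
// unlocks it, so this sketch shows the call order, not the crash.
var gil sync.Mutex

// withPython pins the calling goroutine to its current OS thread, takes
// the stand-in GIL, runs fn, then releases both in reverse order.
func withPython(fn func()) {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	gil.Lock()         // real code: acquire the GIL here
	defer gil.Unlock() // real code: release it on the same thread

	fn() // this is where the Python code would be invoked
}

// runChecks calls withPython from n goroutines and returns how many
// calls were executed.
func runChecks(n int) int {
	var wg sync.WaitGroup
	calls := 0 // only touched while holding gil
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			withPython(func() { calls++ })
		}()
	}
	wg.Wait()
	return calls
}

func main() {
	fmt.Println(runChecks(8))
}
```

While a goroutine is between `runtime.LockOSThread` and `runtime.UnlockOSThread`, the scheduler will not move it to another thread, which is the guarantee the GIL needs.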

Slide 15

Slide 15 text

Demo time!
See what happens when the Go scheduler relocates our Pythonic goroutines (spoiler alert: it’ll crash your software)
● https://github.com/masci/golab17/tree/master/03

Slide 16

Slide 16 text

Beyond embedding: extending!
● Once you have an embedded interpreter, you can extend Python capabilities with Go code
● This involves a little bit of C so no demo here
● Still very easy to achieve: Python scripts import a module that actually lives in memory and points to Go instructions

Slide 17

Slide 17 text

Extending Python: the Go code
    //export MyGoFunc
    func MyGoFunc() {
        fmt.Println("Hello, World!")
    }

Slide 18

Slide 18 text

Extending Python: the C code
    static PyMethodDef MyMethods[] = {
        {"my_func", (PyCFunction)my_go_func, METH_VARARGS, "YAY!"},
        {NULL, NULL} // guards
    };

    PyObject *m = Py_InitModule("my_module", MyMethods);

Slide 19

Slide 19 text

Extending Python: the Python code
    # WARNING! This only works on the embedded interpreter
    import my_module

    my_module.my_func()  # prints "Hello, World!"

Slide 20

Slide 20 text

Lessons learned: the good
● Embedded Python plays nicely with Go’s concurrency model
● From/To Python overhead is negligible
  ○ BenchmarkCallPyFunc 300000 3606 ns/op
● Extending Python is a very powerful tool
  ○ Expose low-level functions, configuration management, etc. to the Python world
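
For readers unfamiliar with where a line like `BenchmarkCallPyFunc 300000 3606 ns/op` comes from, here is a rough sketch of a Go benchmark with that shape. `callPyFunc` is a hypothetical placeholder that only does trivial Go work; the benchmark behind the slide's number would call into the embedded interpreter instead, so the figures this sketch prints say nothing about Python overhead.

```go
package main

import (
	"fmt"
	"testing"
)

// callPyFunc is a hypothetical placeholder for a call that crosses from
// Go into the embedded interpreter. Here it only does trivial Go work.
func callPyFunc() int {
	return len("Hello, World!")
}

// sink keeps the compiler from optimizing the call away.
var sink int

// BenchmarkCallPyFunc has the signature `go test -bench` looks for.
func BenchmarkCallPyFunc(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = callPyFunc()
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(BenchmarkCallPyFunc)
	fmt.Printf("BenchmarkCallPyFunc %d %d ns/op\n", res.N, res.NsPerOp())
}
```

In a test file the same function would normally be run with `go test -bench=CallPyFunc`, which prints the iteration count and the time per operation in the format quoted on the slide.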

Slide 21

Slide 21 text

Lessons learned: the bad
● The GIL prevents Python parallel execution
  ○ This was expected
● The GIL also interferes with the Go scheduler
  ○ Honestly, didn’t see this coming
● Using multiple interpreters doesn’t help
  ○ They share a single GIL

Slide 22

Slide 22 text

Lessons learned: the ugly
● You must carry along some C code; how much depends on the use case
● You will likely carry along some Python code too, to offer base classes and utilities to external modules running in embedded mode

Slide 23

Slide 23 text

What we have now
● Embedded CPython 2.7.12
● Linux and OSX, Windows on its way
● We now run checks concurrently, knowing that many of them will wait for each other...
● ...although some of them were ported to Go, so we also have some parallelism
● At the end of the day, we got a good deal

Slide 24

Slide 24 text

Thanks for listening!
● Try Datadog at https://www.datadoghq.com
● Find our OSS on https://github.com/DataDog
● Our tech blog is http://engineering.datadoghq.com
The new Agent will be open sourced soon, stay tuned!

Slide 25

Slide 25 text

WE’RE HIRING! NYC, PARIS, Remote