How to port your Python software to Go without
people noticing - a real story.
Massimiliano Pippi
Slide 2
Slide 2 text
● Software Engineer, Generalist
○ C++, Python and Go
● OSS fan and contributor
● 1 year at Datadog, working on Agent and Integrations
Hi, I’m Massi!
Slide 3
Slide 3 text
● SaaS based infrastructure and app monitoring
● Open Source Agent
● Time series data (metrics and events)
● Processing nearly a trillion data points per day
● Intelligent Alerting and Insightful Dashboards
About Datadog
Slide 4
Slide 4 text
Monitor everything
Slide 5
Slide 5 text
Meet the Datadog Agent
Agent
check
● Written in Python
● Open Source,
https://github.com/DataDog/dd-agent
Slide 6
Slide 6 text
The anatomy of a check
import psutil
from checks import AgentCheck

class SystemSwap(AgentCheck):
    def check(self, instance):
        swap_mem = psutil.swap_memory()
        self.rate('system.swap.swapped_in', swap_mem.sin)
        self.rate('system.swap.swapped_out', swap_mem.sout)
        self.gauge('system.swap.total', swap_mem.total)
        self.gauge('system.swap.used', swap_mem.used)
Slide 7
Slide 7 text
The way to Go - Our Goals
● Keep Python as an extension language
○ ~75 checks part of the official package
○ Undetermined number of custom checks in the wild
● RPC between the core and a self-standing
Python process is not desirable
● Stop running all checks serially, one after another
Slide 8
Slide 8 text
Embedding for the win
Cgo enables the creation of Go packages that call C code.
● CPython happens to be C code
● Embedding is well documented in C: you
keep an interpreter in memory and make it run
Python code at will
Slide 9
Slide 9 text
Embedding: an example
// #cgo pkg-config: python-2.7
// #include <Python.h>
import "C"
C.Py_Initialize()
cmd := C.CString("print 'Hello, World!'") // cmd must be freed!
C.PyRun_SimpleString(cmd)
C.Py_Finalize()
● cgo handles the build toolchain pretty nicely: we
still run `go build` and that’s it.
Slide 10
Slide 10 text
Embedding: a better example
● Use go-python to eliminate boilerplate
○ https://github.com/sbinet/go-python
import "github.com/sbinet/go-python"
python.Initialize()
python.PyRun_SimpleString("print 'Hello, World!'")
python.Finalize()
Slide 11
Slide 11 text
Demo time!
Let’s run a Python module from a Go application
● https://github.com/masci/golab17/tree/master/01
Slide 12
Slide 12 text
The dreadful GIL
● Embedding CPython means embedding the GIL
● Threads in the Python world are ok…
● ...but we can’t invoke Python code in parallel!
● Rule of thumb: any time you use Python in
some piece of code that could be executed in a
separate thread, lock the GIL!
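The rule of thumb above can be sketched in pure Go. This is a simulation only: a `sync.Mutex` stands in for the GIL so the example runs without CPython, and `withGIL` and `runChecks` are hypothetical names that do not come from the talk. In a real embedding the lock would be acquired and released through the CPython C API (`PyGILState_Ensure` / `PyGILState_Release`) rather than a Go mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// gil is a stand-in for CPython's Global Interpreter Lock. In a real
// embedding the lock is taken with the C API calls PyGILState_Ensure and
// PyGILState_Release, not with a Go mutex.
var gil sync.Mutex

// withGIL (hypothetical helper) runs fn while holding the stand-in lock,
// so at most one caller executes "Python" code at a time.
func withGIL(fn func()) {
	gil.Lock()
	defer gil.Unlock()
	fn()
}

// runChecks starts n goroutines that each touch shared interpreter state
// (simulated by a plain counter) and returns the final counter value.
func runChecks(n int) int {
	var wg sync.WaitGroup
	counter := 0 // simulated interpreter state, not safe without the lock
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			withGIL(func() { counter++ })
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(runChecks(100)) // prints 100
}
```

The sketch only covers mutual exclusion: every entry point that may run on another thread goes through the same lock, so the simulated Python code is never executed in parallel.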
Slide 13
Slide 13 text
Demo time!
Run Python in different goroutines and watch the
world burn
● https://github.com/masci/golab17/tree/master/02
Slide 14
Slide 14 text
The dreadful GIL 2
● We lock and unlock the GIL in goroutines, not
threads
● The GIL protects a global thread state, so we
cannot lock/unlock it from different threads!
● But the Go scheduler might pause and resume
goroutines in different threads, so goroutines
must be locked to one OS thread :(
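One way to keep a goroutine on a single OS thread is `runtime.LockOSThread` from the Go standard library. The sketch below is an assumption about how this could be arranged, not the Agent's actual code: a single pinned goroutine receives work over a channel, and the names `pyTask`, `startInterpreterThread` and `runOnInterpreterThread` are hypothetical. The tasks are plain Go functions so that the example runs without CPython; in a real embedding they would be the only place that calls into the interpreter.

```go
package main

import (
	"fmt"
	"runtime"
)

// pyTask is a hypothetical unit of work that would call into the embedded
// interpreter; here it wraps a plain Go function so the sketch runs anywhere.
type pyTask struct {
	run  func() string
	done chan string
}

// startInterpreterThread starts one goroutine pinned to a single OS thread
// and returns the channel used to submit work to it.
func startInterpreterThread() chan<- pyTask {
	tasks := make(chan pyTask)
	go func() {
		// Pin this goroutine to its current OS thread so the scheduler
		// never resumes it on a different one.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for t := range tasks {
			t.done <- t.run()
		}
	}()
	return tasks
}

// runOnInterpreterThread submits fn to the pinned goroutine and waits for
// its result.
func runOnInterpreterThread(tasks chan<- pyTask, fn func() string) string {
	done := make(chan string)
	tasks <- pyTask{run: fn, done: done}
	return <-done
}

func main() {
	tasks := startInterpreterThread()
	out := runOnInterpreterThread(tasks, func() string { return "Hello, World!" })
	fmt.Println(out) // prints Hello, World!
	close(tasks)
}
```

Funnelling all interpreter calls through one pinned goroutine trades concurrency for safety; pinning each calling goroutine for the duration of its call is another option with the same standard library function.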
Slide 15
Slide 15 text
Demo time!
See what happens when the Go scheduler
relocates our Pythonic goroutines (spoiler alert:
it’ll crash your software)
● https://github.com/masci/golab17/tree/master/03
Slide 16
Slide 16 text
Beyond embedding: extending!
● Once you have an embedded interpreter, you
can extend Python capabilities with Go code
● This involves a little bit of C so no demo here
● Still very easy to achieve, Python scripts
import a module that actually lives in memory
and points to Go instructions
Slide 17
Slide 17 text
Extending Python: the Go code
//export MyGoFunc
func MyGoFunc() {
    fmt.Println("Hello, World!")
}
Extending Python: the Python code
# WARNING! This only works on the embedded interpreter
import my_module
my_module.my_func() # prints "Hello, World!"
Slide 20
Slide 20 text
Lessons learned: the good
● Embedded Python plays nicely with the Go
concurrency model
● From/To Python overhead is negligible
○ BenchmarkCallPyFunc 300000 3606 ns/op
● Extending Python is a very powerful tool
○ Expose low level functions, configuration
management, etc to the Python world
Slide 21
Slide 21 text
Lessons learned: the bad
● The GIL prevents Python parallel execution
○ This was expected
● The GIL also interferes with the Go scheduler
○ Honestly, didn’t see this coming
● Using multiple interpreters doesn’t help
○ They share a single GIL
Slide 22
Slide 22 text
Lessons learned: the ugly
● You must carry along some C code; how much
depends on the use case
● You will likely carry along some Python code too,
to offer base classes and utilities to external
modules running in embedded mode
Slide 23
Slide 23 text
What we have now
● Embedded CPython 2.7.12
● Linux and OSX, Windows on its way
● We now run checks concurrently, knowing that many
of them will wait for each other...
● ...but some of them were ported to Go, so we also
have some parallelism
● At the end of the day, we got a good deal
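Running checks concurrently, as described above, might look roughly like the following sketch. The `Check` interface, the `goCheck` type and `runAll` are hypothetical and only illustrate the idea: each check runs in its own goroutine, checks written in Go can proceed in parallel, and checks that call into Python would still wait for each other on the GIL.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Check is a hypothetical interface for illustration; it is not the
// Agent's real API.
type Check interface {
	Name() string
	Run() error
}

// goCheck is a trivial stand-in for a check implemented in Go.
type goCheck struct{ name string }

func (c goCheck) Name() string { return c.name }
func (c goCheck) Run() error   { return nil }

// runAll runs every check in its own goroutine and returns the sorted
// names of the checks that completed without error.
func runAll(checks []Check) []string {
	var (
		wg sync.WaitGroup
		mu sync.Mutex // protects ok
		ok []string
	)
	for _, c := range checks {
		wg.Add(1)
		go func(c Check) {
			defer wg.Done()
			if err := c.Run(); err == nil {
				mu.Lock()
				ok = append(ok, c.Name())
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	sort.Strings(ok)
	return ok
}

func main() {
	checks := []Check{goCheck{"swap"}, goCheck{"cpu"}, goCheck{"disk"}}
	fmt.Println(runAll(checks)) // prints [cpu disk swap]
}
```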
Slide 24
Slide 24 text
Thanks for listening!
● Try Datadog at https://www.datadoghq.com
● Find our OSS on https://github.com/DataDog
● Our tech blog is http://engineering.datadoghq.com
The new Agent will be open sourced soon, stay tuned!