Slide 1

Slide 1 text

… how Python was shaped by leaky internals
Armin Ronacher
@mitsuhiko

Slide 2

Slide 2 text

Armin Ronacher
@mitsuhiko
Flask / Sentry / Lektor
http://lucumr.pocoo.org/

Slide 3

Slide 3 text

“what is this about”

Slide 4

Slide 4 text

The Leaky Interpreter
• Python is an insanely complex language
• You are being “lied” to in regards to how it works
• People however depend on the little details
• Which makes it very hard to evolve the language

Slide 5

Slide 5 text

“the language you are told”

Slide 6

Slide 6 text

MAGIC = 42

def add_magic(a):
    return a + MAGIC

Slide 7

Slide 7 text

MAGIC = 42

def add_magic(a):
    return a.__add__(MAGIC)

Slide 8

Slide 8 text

“the language that is”

Slide 9

Slide 9 text

 0 LOAD_FAST                0 (a)
 3 LOAD_GLOBAL              0 (MAGIC)
 6 BINARY_ADD
 7 RETURN_VALUE
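A listing like this can be reproduced with the standard `dis` module. The following is a minimal sketch, assuming a current Python 3 interpreter; opcode names and offsets vary between versions (newer CPython releases emit a generic `BINARY_OP` where older ones had `BINARY_ADD`), so no exact output is claimed here.

```python
import dis

MAGIC = 42

def add_magic(a):
    return a + MAGIC

# Collect the opcode names the compiler produced for add_magic.
opnames = [ins.opname for ins in dis.get_instructions(add_magic)]

# Print the full listing, comparable to the one on the slide.
dis.dis(add_magic)
```

Whatever the version, the function body compiles to a handful of stack operations around a single binary-operation opcode, with no trace of a `__add__` attribute lookup.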

Slide 10

Slide 10 text

TARGET_NOARG(BINARY_ADD)
{
    w = POP();
    v = TOP();
    if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
        …
    }
    else if (PyString_CheckExact(v) &&
             PyString_CheckExact(w)) {
        …
    }
    else {
        x = PyNumber_Add(v, w);
    }
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) DISPATCH();
    break;
}

Slide 11

Slide 11 text

PyObject *
PyNumber_Add(PyObject *v, PyObject *w)
{
    PyObject *result = binary_op1(v, w, NB_SLOT(nb_add));
    if (result == Py_NotImplemented) {
        PySequenceMethods *m = v->ob_type->tp_as_sequence;
        Py_DECREF(result);
        if (m && m->sq_concat) {
            return (*m->sq_concat)(v, w);
        }
        result = binop_type_error(v, w, "+");
    }
    return result;
}

Slide 12

Slide 12 text

static PyObject *
binary_op1(PyObject *v, PyObject *w, const int op_slot)
{
    PyObject *x;
    binaryfunc slotv = NULL, slotw = NULL;

    if (v->ob_type->tp_as_number != NULL)
        slotv = NB_BINOP(v->ob_type->tp_as_number, op_slot);
    if (w->ob_type != v->ob_type &&
        w->ob_type->tp_as_number != NULL) {
        slotw = NB_BINOP(w->ob_type->tp_as_number, op_slot);
        if (slotw == slotv)
            slotw = NULL;
    }
    if (slotv) {
        if (slotw && PyType_IsSubtype(w->ob_type, v->ob_type)) {
            …
        }
        x = slotv(v, w);
        if (x != Py_NotImplemented)
            return x;
        Py_DECREF(x); /* can't do it */
    }
    if (slotw) {
        …
    }
    Py_RETURN_NOTIMPLEMENTED;
}

Slide 13

Slide 13 text

So where is __add__?

Slide 14

Slide 14 text

“slots :-(”

Slide 15

Slide 15 text

What's a Slot?
• Slots are struct members in the PyTypeObject
• Each special method is wrapped and stored there
• Foo.__add__ can be FooType.tp_as_number.nb_add

Slide 16

Slide 16 text

Weird Slots
• FooType.tp_as_number.nb_add
• FooType.tp_as_sequence.sq_concat
• Both correspond to a+b (~__add__)
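The two slots are not visible from Python, but the split between number addition and sequence concatenation can be observed through the standard `operator` module. A small sketch, assuming Python 3; how these calls map onto `nb_add` and `sq_concat` is an implementation detail of CPython, so this only shows the visible difference.

```python
import operator

# "+" works for numbers and for sequences alike.
number_sum = 1 + 2
list_sum = [1] + [2]

# operator.concat only accepts sequences, so it works for lists...
joined = operator.concat([1], [2])

# ...but rejects ints, even though "1 + 2" is fine.
try:
    operator.concat(1, 2)
    concat_rejects_ints = False
except TypeError:
    concat_rejects_ints = True
```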

Slide 17

Slide 17 text

“Explaining Operators”

Slide 18

Slide 18 text

Tutorials
• a + b = a.__add__(b)
• slightly more correct: type(a).__add__(a, b)
• Both wrong though

Slide 19

Slide 19 text

a + b
• are a and b integers? Then try fast add
• are a and b strings? Then try fast concat
• number addition:
  • does a implement number slots? resolve nb_add slot
  • does b implement number slots? resolve nb_add slot
  • based on type relationship use callback from a or b
• sequence concatenation:
  • does a implement sequence slots? invoke sq_concat slot
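The "based on type relationship" step is observable from pure Python: when the right operand's type is a subclass of the left operand's type and overrides the reflected method, the right operand is asked first. A minimal sketch, assuming Python 3; the class names `Base` and `Derived` are made up for illustration.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Derived(Base):
    # Overriding the reflected method gives the subclass priority.
    def __radd__(self, other):
        return "Derived.__radd__"


# Same type on both sides: the left operand handles the addition.
same_type = Base() + Base()

# Subclass on the right: its reflected method is tried first.
subclass_right = Base() + Derived()
```

Neither "a.__add__(b)" nor "type(a).__add__(a, b)" predicts the second result.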

Slide 20

Slide 20 text

a.__add__(b)
• Invoke attribute lookup flow on type(a)
• Ask to look up the __add__ attribute
• Invoke the return value of the lookup with b
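One way to see that the operator and the explicit method call take different routes is to shadow `__add__` on a single instance. A minimal sketch, assuming Python 3 (or a new-style class on Python 2); the class `X` here is illustrative.

```python
class X:
    def __add__(self, other):
        return "from the type"


a = X()
# Shadow the method on this one instance only.
a.__add__ = lambda other: "from the instance"

# The operator goes through the type and ignores the instance attribute.
via_operator = a + 1

# The explicit call does a regular attribute lookup and finds the shadow.
via_attribute = a.__add__(1)
```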

Slide 21

Slide 21 text

How do they do similar things?
• Depends on the type of the object
• C types expose slot wrappers to Python
• Python objects place Python functions in type slots

Slide 22

Slide 22 text

they are not equivalent!

Slide 23

Slide 23 text

“one like the other”

Slide 24

Slide 24 text

Python Objects
>>> class X(object):
...     __add__ = lambda *x: 42
...
>>> X.__add__
<unbound method X.<lambda>>

Slide 25

Slide 25 text

C Objects
>>> int.__add__
<slot wrapper '__add__' of 'int' objects>
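The same contrast can be checked programmatically. This sketch assumes CPython 3.7 or later, where `types.WrapperDescriptorType` is available and where unbound methods no longer exist, so the Python-level attribute is a plain function.

```python
import types


class X(object):
    __add__ = lambda *x: 42


python_level = X.__add__  # a plain Python function on Python 3
c_level = int.__add__     # a slot wrapper around the C implementation

is_function = isinstance(python_level, types.FunctionType)
is_slot_wrapper = isinstance(c_level, types.WrapperDescriptorType)

# Both still back the "+" operator for their respective types.
python_result = X() + 1
c_result = 1 + 1
```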

Slide 26

Slide 26 text

python tries to “sync” them up
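The syncing is visible when a special method is assigned to a class after the class already exists. A minimal sketch, assuming Python 3; the class `X` is illustrative.

```python
class X:
    pass


x = X()

# Without __add__ on the class the operator fails.
try:
    x + 1
    worked_before = True
except TypeError:
    worked_before = False

# Assigning on the class afterwards is picked up by the operator...
X.__add__ = lambda self, other: 42
after_assign = x + 1

# ...and deleting it removes the operator support again.
del X.__add__
try:
    x + 1
    worked_after_delete = True
except TypeError:
    worked_after_delete = False
```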

Slide 27

Slide 27 text

“why do we care?”

Slide 28

Slide 28 text

it's complex and canon

Slide 29

Slide 29 text

it makes optimizations impossible

Slide 30

Slide 30 text

PyPy needs to emulate all that

Slide 31

Slide 31 text

“it shapes the language”

Slide 32

Slide 32 text

The C API Leaks
Python 2.6.9 (unknown, Oct 23 2015, 19:19:20)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> x = re.compile('foo')
>>> x.__class__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __class__

Slide 33

Slide 33 text

Once Upon a Time
>>> class X:
...     def __getattr__(self, name):
...         return getattr(42, name)
...
>>> a = X()
>>> a
42
>>> a + 23
65

Slide 34

Slide 34 text

so how did that work?

Slide 35

Slide 35 text

'instance' types forward all calls
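For contrast, here is the same class on Python 3, where old-style classes and the forwarding 'instance' type are gone. This is a sketch of current behavior, not of the Python 2 session shown earlier.

```python
class X:
    def __getattr__(self, name):
        return getattr(42, name)


a = X()

# An explicit attribute lookup still reaches __getattr__.
explicit = a.__add__(23)

# The operator looks at the type only, so nothing is forwarded.
try:
    a + 23
    operator_forwarded = True
except TypeError:
    operator_forwarded = False
```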

Slide 36

Slide 36 text

“Unicode”

Slide 37

Slide 37 text

UCS2 / UCS4 :'(

Slide 38

Slide 38 text

We guaranteed too much
>>> u"foo"[0]
u'f'

Slide 39

Slide 39 text

UCS2 / UCS4 :'(
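The internal storage leaked most visibly for characters outside the Basic Multilingual Plane. The sketch below assumes Python 3.3 or later, where string length always counts code points; on a narrow (UCS2) build of Python 2 the length of the same string was the number of UTF-16 code units instead.

```python
import sys

s = "\U0001F40D"  # one code point outside the Basic Multilingual Plane

# Modern Python 3 counts code points.
code_points = len(s)

# A narrow build effectively exposed this number instead.
utf16_units = len(s.encode("utf-16-le")) // 2

max_code_point = sys.maxunicode
```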

Slide 40

Slide 40 text

“why did we end up here?”

Slide 41

Slide 41 text

Two Pythons
• C Types and Python Classes evolved side-by-side
• Were later unified
• Optimizations always shine through :-(
• When it desyncs, it gets weird

Slide 42

Slide 42 text

“Frames and Locals”

Slide 43

Slide 43 text

Interpreter Internals
>>> import sys
>>> sys._getframe().f_locals['foo'] = 42
>>> foo
42
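The session above can be reproduced outside the interactive prompt by running the same statements as module-level code in a fresh namespace. A minimal sketch, assuming CPython (`sys._getframe` is documented as an implementation detail); the names `namespace` and `result` are illustrative. Inside a function body the effect of writing to `f_locals` differs between versions, so only the module-level case is shown.

```python
namespace = {}
source = (
    "import sys\n"
    "sys._getframe().f_locals['foo'] = 42\n"
    "result = foo\n"
)

# At module level the frame's locals are the namespace dict itself,
# so the write through f_locals creates a real variable.
exec(source, namespace)
```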

Slide 44

Slide 44 text

Who uses getframe anyways
• Zope Interface
• warnings module
• inspect
• logging
• Debug Support (also Sentry)
• getframe and friends are everywhere
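A typical use is finding out who called you, which is roughly what logging and warnings need in order to report a source location. A minimal sketch, assuming CPython; the helper names are made up and this is not how any of the listed libraries is actually implemented.

```python
import sys


def calling_function_name():
    # Frame 0 is this helper, frame 1 is whoever called it.
    return sys._getframe(1).f_code.co_name


def handler():
    return calling_function_name()


caller = handler()
```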

Slide 45

Slide 45 text

“sys.modules?”

Slide 46

Slide 46 text

:'(((
import sys

def import_module(module):
    __import__(module)
    return sys.modules[module]
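The reason for the detour through `sys.modules` is that `__import__` with a dotted name returns the top-level module, not the one that was asked for. A short sketch, assuming Python 3, that also shows the standard library's `importlib.import_module` returning the same object as the `sys.modules` lookup.

```python
import importlib
import sys

# With a dotted name, __import__ hands back the top-level module.
top_level = __import__("os.path")

# The module that was actually requested lives in sys.modules.
looked_up = sys.modules["os.path"]

# importlib.import_module returns the requested module directly.
via_importlib = importlib.import_module("os.path")
```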

Slide 47

Slide 47 text

bad import API and pickle took away our chances of getting versioned modules
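The pickle side of this can be seen in the serialized data: classes are stored by module name and class name, so unpickling depends on one global module of that name being importable. A minimal sketch, assuming Python 3 and using `collections.OrderedDict` purely as an example class.

```python
import collections
import pickle

# Protocol 0 is text based, which makes the stored reference easy to see.
payload = pickle.dumps(collections.OrderedDict, protocol=0)

# Loading resolves the name again through the import system.
restored = pickle.loads(payload)
```

Two versions of the same library loaded side by side would both claim the same module name in that payload.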

Slide 48

Slide 48 text

“static types”

Slide 49

Slide 49 text

type vs class
>>> int
<type 'int'>
>>> class X(int):
...     pass
...
>>> X
<class '__main__.X'>

Slide 50

Slide 50 text

Global Types
PyTypeObject PyInt_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "int",
    sizeof(PyIntObject),
    0,
    (destructor)int_dealloc,
    …
    int_new,
    (freefunc)int_free,
};
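From Python, one visible consequence of a statically allocated type is that it cannot be modified, while a class created at runtime can. A minimal sketch, assuming CPython 3; the attribute name `marker` is made up.

```python
class X(int):
    pass


# Classes created at runtime can be modified freely.
X.marker = "ok"

# The built-in int type is shared and refuses the assignment.
try:
    int.marker = "nope"
    static_type_mutable = True
except TypeError:
    static_type_mutable = False
```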

Slide 51

Slide 51 text

C-Level Type Checks
#define PyInt_CheckExact(op) \
    ((op)->ob_type == &PyInt_Type)
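The Python-level analogue of an exact check is an identity comparison on the type, as opposed to `isinstance`. A small sketch, assuming Python 3, showing that an instance of a subclass fails the exact check while still behaving like an int.

```python
class X(int):
    pass


value = X(1)

# What an exact check asks: is the type this very type object?
exact = type(value) is int

# What a regular check asks: is the type int or a subclass of it?
subclass_ok = isinstance(value, int)

# The subclass instance still supports addition.
still_adds = value + 1
```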

Slide 52

Slide 52 text

“Consequences”

Slide 53

Slide 53 text

hard to modernize:
getting rid of the GIL

Slide 54

Slide 54 text

hard to change internals:
because all internals are exposed

Slide 55

Slide 55 text

can't be node.js:
no multi version libraries

Slide 56

Slide 56 text

can't be fast:
expose interpreter logic too much

Slide 57

Slide 57 text

hard to be concurrent:
refcounts everywhere and exposed

Slide 58

Slide 58 text

hard to be parallel:
static types are shared :(

Slide 59

Slide 59 text

hard to be dynamic:
to be fast the interpreter needs to cheat

Slide 60

Slide 60 text

“Shaped Expectations”

Slide 61

Slide 61 text

What Python Programmers Want
• Refcounting or similar behavior
• Ability to access the interpreter state
• Lots and lots of metaprogramming
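The refcounting expectation usually means that an object disappears the moment its last reference goes away. A minimal sketch, assuming CPython, where this is deterministic; other implementations such as PyPy may keep the object alive until a later garbage collection. The class name `Resource` is made up.

```python
import weakref


class Resource:
    pass


obj = Resource()
ref = weakref.ref(obj)
alive_before = ref() is not None

# Dropping the last strong reference frees the object immediately
# on CPython, so the weak reference goes dead right away.
del obj
alive_after = ref() is not None
```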

Slide 62

Slide 62 text

The Quirks gave birth to
• PDB
• ORMs
• Zope Interface and friends
• Many proxy objects
• Manhole
• Sentry :)

Slide 63

Slide 63 text

?