Slide 1

Slide 1 text

Byterun: A (C)Python interpreter in Python
Allison Kaptur
github.com/akaptur
akaptur.com
@akaptur

Slide 2

Slide 2 text

Byterun
with Ned Batchelder
Based on
# pyvm2 by Paul Swartz (z3p) from http://www.twistedmatrix.com/users/z3p/

Slide 3

Slide 3 text

“Interpreter”

Slide 4

Slide 4 text

1. Lexing
2. Parsing
3. Compiling
4. Interpreting
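
The four stages can be observed from inside CPython using standard-library modules. A minimal sketch, assuming Python 3; the source string and variable names are illustrative and not from the talk:

```python
import ast
import io
import tokenize

source = "answer = 7 + 5\n"

# 1. Lexing: split the source text into tokens (whitespace-only tokens dropped).
tokens = [tok.string
          for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.string.strip()]

# 2. Parsing: build an abstract syntax tree from the source.
tree = ast.parse(source)

# 3. Compiling: turn the tree into a code object that holds bytecode.
code = compile(tree, "<example>", "exec")

# 4. Interpreting: the virtual machine executes the bytecode.
namespace = {}
exec(code, namespace)

print(tokens)
print(type(tree).__name__, type(code).__name__, namespace["answer"])
```

The rest of the talk is about the fourth step only: Byterun takes code objects that CPython has already compiled and interprets them.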

Slide 5

Slide 5 text

The Python virtual machine: A bytecode interpreter

Slide 6

Slide 6 text

Bytecode: the internal representation of a python program in the interpreter

Slide 7

Slide 7 text

Why write an interpreter?

>>> if a or b:
...     pass

Slide 8

Slide 8 text

Testing

def test_for_loop(self):
    self.assert_ok("""\
        out = ""
        for i in range(5):
            out = out + str(i)
        print(out)
        """)
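
The slide does not show what `assert_ok` does. One plausible approach is to run the same source under the real interpreter and under the interpreter being tested, then compare what each printed. The sketch below assumes Python 3; `capture_output` and `outputs_match` are hypothetical helpers, and the `runner` argument stands in for whichever interpreter is under test.

```python
import contextlib
import io
import textwrap


def capture_output(source, runner=exec):
    """Run `source` with `runner` and return everything it printed."""
    code = compile(textwrap.dedent(source), "<test>", "exec")
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        runner(code, {})
    return buffer.getvalue()


def outputs_match(source, runner_under_test):
    """Compare the tested runner's output with the real interpreter's."""
    return capture_output(source, runner_under_test) == capture_output(source)


program = """\
    out = ""
    for i in range(5):
        out = out + str(i)
    print(out)
    """
print(repr(capture_output(program)))
```

Comparing against the real interpreter means each test only needs a snippet of Python, with no hand-written expected output.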

Slide 9

Slide 9 text

A problem

def test_for_loop(self):
    self.assert_ok("""\
        g = (x*x for x in range(5))
        h = (y+1 for y in g)
        print(list(h))
        """)

Slide 10

Slide 10 text

A simple VM
- LOAD_VALUE
- ADD_TWO_VALUES
- PRINT_ANSWER

Slide 11

Slide 11 text

A simple VM

"7 + 5"

["LOAD_VALUE",
 "LOAD_VALUE",
 "ADD_TWO_VALUES",
 "PRINT_ANSWER"]

Slide 12

Slide 12 text

A simple VM

[Diagram: the stack before execution, after LOAD_VALUE (7, then 5), after ADD_TWO_VALUES (12), and after PRINT_ANSWER]

Slide 13

Slide 13 text

A simple VM

what_to_execute = {
    "instructions": [("LOAD_VALUE", 0),
                     ("LOAD_VALUE", 1),
                     ("ADD_TWO_VALUES", None),
                     ("PRINT_ANSWER", None)],
    "numbers": [7, 5]
}

Slide 14

Slide 14 text

class Interpreter(object):
    def __init__(self):
        self.stack = []

    def value_loader(self, number):
        self.stack.append(number)

    def answer_printer(self):
        answer = self.stack.pop()
        print(answer)

    def two_value_adder(self):
        first_num = self.stack.pop()
        second_num = self.stack.pop()
        total = first_num + second_num
        self.stack.append(total)

Slide 15

Slide 15 text

    def run_code(self, what_to_execute):
        instrs = what_to_execute["instructions"]
        numbers = what_to_execute["numbers"]
        for each_step in instrs:
            instruction, argument = each_step
            if instruction == "LOAD_VALUE":
                number = numbers[argument]
                self.value_loader(number)
            elif instruction == "ADD_TWO_VALUES":
                self.two_value_adder()
            elif instruction == "PRINT_ANSWER":
                self.answer_printer()

interpreter = Interpreter()
interpreter.run_code(what_to_execute)
# 12
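
Because the instructions and the numbers are stored separately, the same machine can run a longer program without new instruction types. The following self-contained sketch (Python 3) adds three numbers. The class name `TinyInterpreter`, the dispatch table, and the `printed` list are illustrative choices, not taken from the slides:

```python
class TinyInterpreter:
    """A stack machine with the three instructions from the slides."""

    def __init__(self):
        self.stack = []
        self.printed = []  # answers are recorded here as well as printed

    def load_value(self, number):
        self.stack.append(number)

    def add_two_values(self, _unused):
        second = self.stack.pop()
        first = self.stack.pop()
        self.stack.append(first + second)

    def print_answer(self, _unused):
        answer = self.stack.pop()
        self.printed.append(answer)
        print(answer)

    def run_code(self, what_to_execute):
        numbers = what_to_execute["numbers"]
        # Map each instruction name to the callable that implements it.
        handlers = {
            "LOAD_VALUE": lambda arg: self.load_value(numbers[arg]),
            "ADD_TWO_VALUES": self.add_two_values,
            "PRINT_ANSWER": self.print_answer,
        }
        for instruction, argument in what_to_execute["instructions"]:
            handlers[instruction](argument)


# 7 + 5 + 8: load two values, add, load a third, add again.
program = {
    "instructions": [("LOAD_VALUE", 0), ("LOAD_VALUE", 1),
                     ("ADD_TWO_VALUES", None),
                     ("LOAD_VALUE", 2),
                     ("ADD_TWO_VALUES", None),
                     ("PRINT_ANSWER", None)],
    "numbers": [7, 5, 8],
}
vm = TinyInterpreter()
vm.run_code(program)
```

The intermediate sum stays on the stack between the two additions, which is why no instruction needs to name it.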

Slide 16

Slide 16 text

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans

Slide 17

Slide 17 text

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code

[Annotations on the last line: mod = Function, func_code = Code object, co_code = Bytecode]

Slide 18

Slide 18 text

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code
'|\x00\x00|\x01\x00\x16}\x02\x00|\x02\x00S'

Slide 19

Slide 19 text

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod.func_code.co_code
'|\x00\x00|\x01\x00\x16}\x02\x00|\x02\x00S'
>>> [ord(b) for b in mod.func_code.co_code]
[124, 0, 0, 124, 1, 0, 22, 125, 2, 0, 124, 2, 0, 83]
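
The session above is Python 2. In Python 3 the code object lives at `mod.__code__`, and `co_code` is already a `bytes` object, so no `ord` call is needed. A sketch, assuming Python 3; the exact byte values differ between CPython versions and will generally not match the list on the slide:

```python
def mod(a, b):
    ans = a % b
    return ans


raw = mod.__code__.co_code   # the function's bytecode as a bytes object
as_ints = list(raw)          # iterating over bytes yields integers 0..255

print(type(raw).__name__, len(raw))
print(as_ints)
```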

Slide 20

Slide 20 text

dis, a bytecode disassembler

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Slide 21

Slide 21 text

dis, a bytecode disassembler

>>> dis.dis(mod)
line        ind name                   arg hint
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE
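
The same columns are available programmatically. A sketch assuming Python 3.4 or later, where `dis.get_instructions` yields one record per instruction. Opcode names and offsets vary by CPython version, so the output will not match the Python 2 listing on the slide exactly:

```python
import dis


def mod(a, b):
    ans = a % b
    return ans


rows = []
for instr in dis.get_instructions(mod):
    # offset  = index into the bytecode ("ind" on the slide)
    # opname  = instruction name
    # arg     = numeric argument, or None if the instruction takes none
    # argrepr = human-readable hint for the argument
    rows.append((instr.offset, instr.opname, instr.arg, instr.argrepr))

for row in rows:
    print(row)
```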

Slide 22

Slide 22 text

Bytecode: it’s bytes!

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(7,5)

Slide 23

Slide 23 text

The Python interpreter

[Diagram: the data stack before execution, after LOAD_FAST (7, then 5), after BINARY_MODULO (2), and after STORE_FAST]

Slide 24

Slide 24 text

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(7,5)

>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Slide 25

Slide 25 text

[Diagram: the call stack holds Frame: main and Frame: mod; an arrow labeled "data stack" points at the values 7 and 5]

Slide 26

Slide 26 text

>>> def fact(n):
...     if n < 2: return 1
...     else: return n * fact(n-1)
>>> fact(3)

[Diagram: call stack with Frame: main and Frame: fact; data stack values shown: 3, 3, fact, 1]

Slide 27

Slide 27 text

>>> def fact(n):
...     if n < 2: return 1
...     else: return n * fact(n-1)
>>> fact(3)

[Diagram: call stack with Frame: main and Frame: fact; data stack values shown: 3, 2, fact]

Slide 28

Slide 28 text

>>> def fact(n):
...     if n < 2: return 1
...     else: return n * fact(n-1)
>>> fact(3)

[Diagram: call stack with Frame: main, Frame: fact (3), and a new Frame: fact (2)]

Slide 29

Slide 29 text

[Diagram: call stack with Frame: main, Frame: fact (3), Frame: fact (2), and Frame: fact (1)]

Slide 30

Slide 30 text

>>> def fact(n):
...     if n < 2: return 1
...     else: return n * fact(n-1)
>>> fact(3)

[Diagram: call stack with Frame: main, Frame: fact (3), and Frame: fact (2, 1)]

Slide 31

Slide 31 text

>>> def fact(n):
...     if n < 2: return 1
...     else: return n * fact(n-1)
>>> fact(3)

[Diagram: call stack with Frame: main and Frame: fact (3, 2)]

Slide 32

Slide 32 text

>>> def fact(n):
...     if n < 2: return 1
...     else: return n * fact(n-1)
>>> fact(3)

[Diagram: call stack with Frame: main and Frame: fact (6)]

Slide 33

Slide 33 text

>>> def fact(n):
...     if n < 2: return 1
...     else: return n * fact(n-1)
>>> fact(3)

[Diagram: call stack with Frame: main and Frame: fact (6)]
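
CPython exposes these frames to running code, so the growth of the call stack can be observed directly. The sketch below (Python 3, CPython-specific because it relies on `sys._getframe`) records, for every call to `fact`, how many consecutive `fact` frames sit on the call stack. The helper `fact_frames_on_stack` is hypothetical:

```python
import sys

depths = []


def fact_frames_on_stack():
    """Count consecutive frames running fact, starting at our caller."""
    frame = sys._getframe(1)          # frame of the function that called us
    count = 0
    while frame is not None and frame.f_code.co_name == "fact":
        count += 1
        frame = frame.f_back          # step to the frame that made the call
    return count


def fact(n):
    depths.append((n, fact_frames_on_stack()))
    if n < 2:
        return 1
    else:
        return n * fact(n - 1)


result = fact(3)
print(result, depths)
```

Each recursive call pushes one more `fact` frame, and each frame keeps its own value of `n`.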

Slide 34

Slide 34 text

Python VM:
- A collection of frames
- Data stacks on frames
- A way to run frames

Slide 35

Slide 35 text

Instructions we need

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

/* Main switch on opcode */
READ_TIMESTAMP(inst0);

switch (opcode) {

} /*switch*/

Slide 38

Slide 38 text

#ifdef CASE_TOO_BIG
    default: switch (opcode) {
#endif

/* Turn this on if your compiler chokes on the big switch: */
/* #define CASE_TOO_BIG 1 */

Slide 39

Slide 39 text

Instructions we need

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE
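
Those four instructions are enough to run `mod`. A minimal sketch in Python 3, assuming a hand-written instruction list copied from the `dis` output rather than real bytecode. The `Frame` class and `run_frame` function are illustrative names, not Byterun's API:

```python
class Frame:
    """One function call: its instructions, local variables, and data stack."""

    def __init__(self, instructions, varnames, args):
        self.instructions = instructions
        self.varnames = varnames
        self.locals = dict(zip(varnames, args))  # arguments fill the first slots
        self.stack = []


def run_frame(frame):
    """Execute a frame's instructions until RETURN_VALUE."""
    for opname, arg in frame.instructions:
        if opname == "LOAD_FAST":
            frame.stack.append(frame.locals[frame.varnames[arg]])
        elif opname == "BINARY_MODULO":
            right = frame.stack.pop()
            left = frame.stack.pop()
            frame.stack.append(left % right)
        elif opname == "STORE_FAST":
            frame.locals[frame.varnames[arg]] = frame.stack.pop()
        elif opname == "RETURN_VALUE":
            return frame.stack.pop()
        else:
            raise ValueError("unknown instruction: " + opname)


MOD_INSTRUCTIONS = [
    ("LOAD_FAST", 0),         # a
    ("LOAD_FAST", 1),         # b
    ("BINARY_MODULO", None),
    ("STORE_FAST", 2),        # ans
    ("LOAD_FAST", 2),         # ans
    ("RETURN_VALUE", None),
]

frame = Frame(MOD_INSTRUCTIONS, ("a", "b", "ans"), (7, 5))
result = run_frame(frame)
print(result)
```

The numeric arguments index into `varnames`, mirroring the `0 (a)`, `1 (b)`, `2 (ans)` columns of the disassembly.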

Slide 40

Slide 40 text

case LOAD_FAST:
    x = GETLOCAL(oparg);
    if (x != NULL) {
        Py_INCREF(x);
        PUSH(x);
        goto fast_next_opcode;
    }
    format_exc_check_arg(PyExc_UnboundLocalError,
        UNBOUNDLOCAL_ERROR_MSG,
        PyTuple_GetItem(co->co_varnames, oparg));
    break;

Slide 41

Slide 41 text

case BINARY_MODULO:
    w = POP();
    v = TOP();
    if (PyString_CheckExact(v))
        x = PyString_Format(v, w);
    else
        x = PyNumber_Remainder(v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;

Slide 42

Slide 42 text

Back to our problem

g = (x*x for x in range(5))
h = (y+1 for y in g)
print(list(h))

Slide 43

Slide 43 text

It’s “dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3

Slide 44

Slide 44 text

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Con"))

Slide 45

Slide 45 text

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Con"))
PyCon

Slide 46

Slide 46 text

“Dynamic”

>>> def mod(a, b):
...     ans = a % b
...     return ans
>>> mod(15, 4)
3
>>> mod("%s%s", ("Py", "Con"))
PyCon
>>> print "%s%s" % ("Py", "Con")
PyCon

Slide 47

Slide 47 text

dis, a bytecode disassembler

>>> import dis
>>> dis.dis(mod)
  2           0 LOAD_FAST                0 (a)
              3 LOAD_FAST                1 (b)
              6 BINARY_MODULO
              7 STORE_FAST               2 (ans)

  3          10 LOAD_FAST                2 (ans)
             13 RETURN_VALUE

Slide 48

Slide 48 text

case BINARY_MODULO:
    w = POP();
    v = TOP();
    if (PyString_CheckExact(v))
        x = PyString_Format(v, w);
    else
        x = PyNumber_Remainder(v, w);
    Py_DECREF(v);
    Py_DECREF(w);
    SET_TOP(x);
    if (x != NULL) continue;
    break;

Slide 49

Slide 49 text

>>> class Surprising(object):
...     def __mod__(self, other):
...         print "Surprise!"
>>> s = Surprising()
>>> t = Surprising()
>>> s % t
Surprise!
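
The same experiment in Python 3 syntax, with `__mod__` returning its message so the result can be inspected. It also shows that `a % b` and `operator.mod(a, b)` go through the same dispatch, so the interpreter cannot know what `%` will do until it sees the operands. A sketch; the class mirrors the one on the slide:

```python
import operator


class Surprising:
    def __mod__(self, other):
        return "Surprise!"


s = Surprising()
t = Surprising()

results = [
    15 % 4,                      # integer remainder
    "%s%s" % ("Py", "Con"),      # string formatting
    s % t,                       # user-defined __mod__
    operator.mod(s, t),          # same dispatch, spelled as a function call
]
print(results)
```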

Slide 50

Slide 50 text

“In the general absence of type information, almost every instruction must be treated as INVOKE_ARBITRARY_METHOD.” - Russell Power and Alex Rubinsteyn, “How Fast Can We Make Interpreted Python?”

Slide 51

Slide 51 text

More

Great blogs
http://tech.blog.aknin.name/category/my-projects/pythons-innards/ by @aknin
http://eli.thegreenplace.net/ by Eli Bendersky

Contribute! Find bugs!
https://github.com/nedbat/byterun

Apply to the Recurse Center!
www.recurse.com/apply