Slide 1

Slide 1 text

CPython bytecode evolution 2018 Dmitry Alimov

Slide 2

Slide 2 text

Bytecode

According to https://docs.python.org/3/glossary.html#term-bytecode
- Internal representation of a Python program in the CPython interpreter
- Cached in .pyc files so that executing the same file is faster the second time
- Runs on a VM that executes the machine code corresponding to each bytecode

Slide 3

Slide 3 text

CPython source code to bytecode

Source code:
    def foo(v):
        if not v:
            return 0
        print(v)
    foo(123)

Bytecode (after compilation):
    64 00 64 01 84 00 5A 00 65 00 64 02 83 01 01 00 64 03 53 00

Compilation steps:
1) Parse source code into a parse tree (Parser/pgen.c)
2) Transform parse tree (CST) to an AST (Python/ast.c)
3) Transform AST into a Control Flow Graph (CFG) (Python/compile.c)
4) Emit the bytecode based on the CFG (Python/compile.c)
5) Optimize the bytecode with peephole optimizations (Python/peephole.c)

Slide 4

Slide 4 text

CPython bytecode

.py file:
    def foo(v):
        if not v:
            return 0
        print(v)
    foo(123)

# CPython 2.7
>>> print(foo.__code__.co_code.encode('hex'))
'7c0000730a00640100537c0000474864000053'  # bytecode
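The `encode('hex')` idiom above exists only in Python 2. A minimal sketch of the same inspection on Python 3, assuming `bytes.hex()` as the replacement; the bytes printed depend on the interpreter version, so no expected output is given:

```python
def foo(v):
    if not v:
        return 0
    print(v)

# co_code is a bytes object on Python 3; bytes.hex() replaces
# the Python 2 idiom str.encode('hex').
raw = foo.__code__.co_code
print(raw.hex())
```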

Slide 5

Slide 5 text

Bytecode disassembling

.py file:
    def foo(v):
        if not v:
            return 0
        print(v)
    foo(123)

# CPython 2.7
>>> import dis
>>> dis.dis(foo)
  4           0 LOAD_FAST                0 (v)
              3 POP_JUMP_IF_TRUE        10
  5           6 LOAD_CONST               1 (0)
              9 RETURN_VALUE
  6     >>   10 LOAD_FAST                0 (v)
             13 PRINT_ITEM
             14 PRINT_NEWLINE
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE

Slide 6

Slide 6 text

Bytecode disassembling

.py file:
    def foo(v):
        if not v:
            return 0
        print(v)
    foo(123)

.pyc file (first bytes):
    33 0D 0D 0A DA 80 F4 5A 48 00 00 00 E3 00 00 00 00 00 00 00 00 00 00 00 00 02 00 00

# CPython 3.6
>>> magic, ts, size = f.read(4), f.read(4), f.read(4)
>>> code = marshal.load(f)
>>> dis.disassemble(code)
  1           0 LOAD_CONST               0 (<code object foo ...>)
              2 LOAD_CONST               1 ('foo')
              4 MAKE_FUNCTION            0
              6 STORE_NAME               0 (foo)
  6           8 LOAD_NAME                0 (foo)
             10 LOAD_CONST               2 (123)
             12 CALL_FUNCTION            1
             14 POP_TOP
             16 LOAD_CONST               3 (None)
             18 RETURN_VALUE
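The session above reads a 12-byte header, which matches CPython 3.6. A minimal sketch of the same procedure for whatever interpreter runs it, assuming a 16-byte header from 3.7 on (PEP 552 added a flags word). The helper `load_pyc` and the file name `demo.py` are illustrative, not part of the talk:

```python
import dis
import marshal
import os
import py_compile
import sys
import tempfile

SOURCE = "def foo(v):\n    if not v:\n        return 0\n    print(v)\n"

def load_pyc(path):
    """Return (magic, code) from a .pyc written by the running interpreter."""
    # 12 bytes in CPython 3.3-3.6 (magic, mtime, size);
    # 16 bytes from 3.7 on (magic, flags, then mtime/size or a hash).
    header_size = 16 if sys.version_info >= (3, 7) else 12
    with open(path, "rb") as f:
        header = f.read(header_size)
        code = marshal.load(f)
    return header[:4], code

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as f:
        f.write(SOURCE)
    # py_compile.compile returns the path of the file it wrote.
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"))
    magic, code = load_pyc(pyc)

print(magic.hex())
dis.dis(code)
```

The magic number and the disassembly both vary by version, which is the point of the next slide.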

Slide 7

Slide 7 text

CPython implementation details

https://docs.python.org/3/library/dis.html
No guarantees are made that bytecode will not be added, removed, or changed between versions of Python

https://docs.python.org/3/glossary.html#term-bytecode
Bytecodes are not expected to work between different Python VMs, nor to be stable between Python releases

Slide 8

Slide 8 text

CPython < 3.6

Opcodes without an argument take 1 byte:
    0x53              RETURN_VALUE
Opcodes with an argument take 3 bytes (1-byte opcode + 2-byte little-endian argument):
    0x65 0x05 0x00    LOAD_NAME 5 ('val')

/Include/opcode.h:
    #define HAVE_ARGUMENT 90 // 0x5a
    #define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
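The variable-length layout can be checked against the CPython 2.7 hex dump of `foo` shown on the earlier slide. A minimal sketch, assuming the helper name `decode_pre36` (not a CPython API) and ignoring EXTENDED_ARG, which that dump does not use:

```python
HAVE_ARGUMENT = 90  # from Include/opcode.h, as quoted above

def decode_pre36(code):
    """Yield (offset, opcode, arg) for CPython < 3.6 bytecode.

    arg is None for opcodes below HAVE_ARGUMENT.
    """
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            # 2-byte little-endian argument follows the opcode
            arg = code[i + 1] | (code[i + 2] << 8)
            yield i, op, arg
            i += 3
        else:
            yield i, op, None
            i += 1

raw = bytes.fromhex("7c0000730a00640100537c0000474864000053")
instructions = list(decode_pre36(raw))
for offset, op, arg in instructions:
    print(offset, hex(op), arg)
```

The offsets it prints (0, 3, 6, 9, 10, 13, 14, 15, 18) are the ones in the `dis.dis(foo)` listing for CPython 2.7.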

Slide 9

Slide 9 text

CPython 3.6: Wordcode

WPython project by Cesare Di Mauro

https://bugs.python.org/issue26647
“ceval: use Wordcode, 16-bit bytecode”
Contributed by Demur Rumed, with input and reviews from Serhiy Storchaka and Victor Stinner

Slide 10

Slide 10 text

CPython 3.6: Wordcode

Every instruction has an argument (opcode + arg = 2 bytes), but opcodes < HAVE_ARGUMENT (0x5a) ignore it:
    0x53 0x00    RETURN_VALUE (argument ignored)
    0x65 0x05    LOAD_NAME 5 ('val')

Benchmark summary from https://bugs.python.org/msg266417
- Faster (27): up to 11% faster
- Slower (1): the worst slowdown is only 7%
- Not significant (14)
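With a fixed 2-byte unit, decoding needs no HAVE_ARGUMENT test to find instruction boundaries. A minimal sketch applied to the module-level byte sequence from the compilation slide; `decode_wordcode` is an illustrative name and EXTENDED_ARG is not handled here:

```python
def decode_wordcode(code):
    """Yield (offset, opcode, arg) for 3.6-style wordcode."""
    for i in range(0, len(code), 2):
        yield i, code[i], code[i + 1]

raw = bytes.fromhex("64 00 64 01 84 00 5A 00 65 00 64 02 83 01 01 00 64 03 53 00")
units = list(decode_wordcode(raw))
for offset, op, arg in units:
    print(offset, hex(op), arg)
```

The ten units line up with the ten instructions in the CPython 3.6 disassembly of the .pyc file, ending in `0x53 0x00` (RETURN_VALUE with an ignored argument).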

Slide 11

Slide 11 text

Ideas for research

What happens in CPython < 3.6:
- When an opcode's argument is > 65535

What happens in CPython >= 3.6:
- When an opcode's argument is > 255
- When an opcode's argument is > 65535
- When an opcode's argument is > 16777215

Slide 12

Slide 12 text

Let’s test bytecode

test_pybc.py:
    # comment for profiling
    import dis
    def a():
        a0 = 0
        a1 = 1
        a2 = 2
        a3 = 3
        ...
        a65535 = 65535
    a()
    # comment for profiling
    dis.dis(a)

$ ls -la
-rw-r--r-- 1 user users 1223050 May 7 22:29 test_pybc.py

$ python2.7 -m compileall test_pybc.py
$ mv test_pybc.pyc test_pybc27.pyc
$ ls -la
-rw-r--r-- 1 user users 1562519 May 7 22:32 test_pybc27.pyc

$ python3.5 -m compileall test_pybc.py
$ mv __pycache__/test_pybc.cpython-35.pyc test_pybc35.pyc
$ ls -la
-rw-r--r-- 1 user users 1365898 May 7 22:32 test_pybc35.pyc

$ python3.6 -m compileall test_pybc.py
$ mv __pycache__/test_pybc.cpython-36.pyc test_pybc36.pyc
$ ls -la
-rw-r--r-- 1 user users 1496453 May 7 22:32 test_pybc36.pyc
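A file with 65536 assignments is not written by hand. One way to generate it, as a sketch: the function names below are hypothetical, and the output omits the "comment for profiling" lines, so its size will not match the 1223050 bytes listed above.

```python
def make_function_source(n):
    """Return source for def a() with n sequential local assignments."""
    lines = ["def a():"]
    lines += ["    a{0} = {0}".format(i) for i in range(n)]
    return "\n".join(lines) + "\n"

def make_script(n):
    """Wrap the function in a script that calls and disassembles it."""
    return "import dis\n" + make_function_source(n) + "a()\ndis.dis(a)\n"

# Small demonstration; use make_script(65536) to reproduce the experiment.
print(make_script(3))
```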

Slide 13

Slide 13 text

EXTENDED_ARG opcode

https://github.com/python/cpython/blob/2.7/Include/opcode.h
    #define EXTENDED_ARG 145 // 0x91

https://github.com/python/cpython/blob/3.0/Include/opcode.h
    #define EXTENDED_ARG 143 // 0x8f (new value!)

https://github.com/python/cpython/blob/3.2/Include/opcode.h
    #define EXTENDED_ARG 144 // 0x90 (new value!)

https://github.com/python/cpython/blob/3.7/Include/opcode.h
    #define EXTENDED_ARG 144 // 0x90
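The value can also be read from the running interpreter instead of the header file. A short sketch using `dis.opmap`; no expected output is shown because, as the list above illustrates, the number is version-specific:

```python
import dis
import sys

# dis.opmap maps opcode names to their numeric values for this interpreter.
ext = dis.opmap["EXTENDED_ARG"]
print(sys.version_info[:2], "EXTENDED_ARG =", ext, hex(ext))
```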

Slide 14

Slide 14 text

Eval in CPython 2.7–3.5
https://github.com/python/cpython/blob/3.5/Python/ceval.c#L797

    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
    {
        ...
        unsigned char *next_instr;
        int opcode;        /* Current opcode */
        int oparg;         /* Current opcode argument, if any */
        ...
        for (;;) {
            /* Extract opcode and argument */
            opcode = NEXTOP();
            oparg = 0;
            if (HAS_ARG(opcode))
                oparg = NEXTARG();
        dispatch_opcode:
            ...
            switch (opcode) {
            ...
            TARGET(EXTENDED_ARG) {
                opcode = NEXTOP();
                oparg = oparg<<16 | NEXTARG();
                goto dispatch_opcode;
            }
            ...

Slide 15

Slide 15 text

Eval in CPython 3.6
https://github.com/python/cpython/blob/3.6/Python/ceval.c#L758

    PyObject *
    _PyEval_EvalFrameDefault(PyFrameObject *f, int throwflag)
    {
        ...
        const _Py_CODEUNIT *next_instr;
        int opcode;        /* Current opcode */
        int oparg;         /* Current opcode argument, if any */
        ...
        for (;;) {
            /* Extract opcode and argument */
            NEXTOPARG();
        dispatch_opcode:
            ...
            switch (opcode) {
            ...
            TARGET(EXTENDED_ARG) {
                int oldoparg = oparg;
                NEXTOPARG();
                oparg |= oldoparg << 8;
                goto dispatch_opcode;
            }
            ...

Slide 16

Slide 16 text

CPython 2.7–3.5

Line    Offset   Instruction
5       0        LOAD_CONST 1 (0)
        3        STORE_FAST 0 (a0)
...
260     1530     LOAD_CONST 256 (255)
        1533     STORE_FAST 255 (a255)
...
517     3072     LOAD_CONST 513 (512)
        3075     STORE_FAST 512 (a512)
...
65539   393204   LOAD_CONST 65535 (65534)
        393207   STORE_FAST 65534 (a65534)
65540   393210   EXTENDED_ARG 1
        393213   LOAD_CONST 65536 (65535)
        393216   STORE_FAST 65535 (a65535)

Full bytecode size = 393219
Opcodes number = 131073

Slide 17

Slide 17 text

CPython 2.7–3.5

Offset   Instruction                  Bytes
393204   LOAD_CONST 65535 (65534)     64 ffff
393207   STORE_FAST 65534 (a65534)    7d feff
393210   EXTENDED_ARG 1               91 0100
393213   LOAD_CONST 65536 (65535)     64 0000
393216   STORE_FAST 65535 (a65535)    7d ffff

The bytes shown are from CPython 2.7, where EXTENDED_ARG is 0x91; in 3.2–3.5 it is 0x90.

    TARGET(EXTENDED_ARG) {
        opcode = NEXTOP();
        oparg = oparg << 16 | NEXTARG();
        goto dispatch_opcode;
    }

arg = (1 << 16) | 0 = 65536 | 0 = 65536
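The arithmetic can be replayed in Python on the raw bytes from this slide. A minimal sketch mirroring the C handler for a single EXTENDED_ARG prefix; `read_arg16` and `effective_arg` are illustrative names, not CPython functions:

```python
EXTENDED_ARG_27 = 0x91  # CPython 2.7 value, per Include/opcode.h

def read_arg16(code, i):
    """Read the 2-byte little-endian argument of the instruction at offset i."""
    return code[i + 1] | (code[i + 2] << 8)

def effective_arg(code, i=0):
    """Return (opcode, arg) at offset i, folding in one EXTENDED_ARG prefix."""
    op = code[i]
    arg = read_arg16(code, i)
    if op == EXTENDED_ARG_27:
        # Mirrors: opcode = NEXTOP(); oparg = oparg << 16 | NEXTARG();
        i += 3
        op = code[i]
        arg = (arg << 16) | read_arg16(code, i)
    return op, arg

# EXTENDED_ARG 1 followed by LOAD_CONST with a low half of 0
print(effective_arg(bytes.fromhex("910100640000")))  # (100, 65536)
```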

Slide 18

Slide 18 text

CPython 3.6

Line    Offset   Instruction
5       0        LOAD_CONST 1 (0)
        2        STORE_FAST 0 (a0)
...
260     1020     EXTENDED_ARG 1
        1022     LOAD_CONST 256 (255)
        1024     STORE_FAST 255 (a255)
...
517     3074     EXTENDED_ARG 2
        3076     LOAD_CONST 513 (512)
        3078     EXTENDED_ARG 2
        3080     STORE_FAST 512 (a512)
...
65540   523258   EXTENDED_ARG 1
        523260   EXTENDED_ARG 256
        523262   LOAD_CONST 65536 (65535)
        523264   EXTENDED_ARG 255
        523266   STORE_FAST 65535 (a65535)

Full bytecode size = 523268
Opcodes number = 261634

From source line 517 onward, the bytecode offset in CPython 3.6 is larger than in CPython < 3.6 (3074 > 3072), so the 3.6 bytecode is the bigger one from that point on.

Slide 19

Slide 19 text

CPython 3.6

Offset   Instruction                  Bytes
1020     EXTENDED_ARG 1               90 01
1022     LOAD_CONST 256 (255)         64 00
1024     STORE_FAST 255 (a255)        7d ff

LOAD_CONST: arg = (1 << 8) | 0 = 256 | 0 = 256

...

Offset   Instruction                  Bytes
523258   EXTENDED_ARG 1               90 01
523260   EXTENDED_ARG 256             90 00
523262   LOAD_CONST 65536 (65535)     64 00
523264   EXTENDED_ARG 255             90 ff
523266   STORE_FAST 65535 (a65535)    7d ff

Second EXTENDED_ARG: arg = (1 << 8) | 0 = 256 | 0 = 256
LOAD_CONST: arg = (256 << 8) | 0 = 65536 | 0 = 65536
STORE_FAST: arg = (255 << 8) | 255 = 65280 | 255 = 65535

    TARGET(EXTENDED_ARG) {
        int oldoparg = oparg;
        NEXTOPARG(); // get new oparg
        oparg = oparg | (oldoparg << 8);
        goto dispatch_opcode;
    }
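The chained case is easier to follow when replayed on the ten bytes from this slide. A minimal sketch of the accumulation, assuming 0x90 for EXTENDED_ARG as in 3.6; `effective_args` is an illustrative name:

```python
EXTENDED_ARG_36 = 0x90  # CPython 3.6 value, per Include/opcode.h

def effective_args(code):
    """Yield (opcode, arg) with EXTENDED_ARG prefixes folded in."""
    ext = 0
    for i in range(0, len(code), 2):
        op = code[i]
        arg = code[i + 1] | ext
        if op == EXTENDED_ARG_36:
            # Mirrors: oparg |= oldoparg << 8 on the next instruction
            ext = arg << 8
        else:
            ext = 0
            yield op, arg

raw = bytes.fromhex("90 01 90 00 64 00 90 ff 7d ff")
print(list(effective_args(raw)))  # [(100, 65536), (125, 65535)]
```

Each prefix contributes one more byte, so up to three prefixes are needed to reach a 32-bit argument.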

Slide 20

Slide 20 text

Results

CPython 2.7:
$ time python2.7 test_pybc.py
real 0m0.773s
user 0m0.672s
sys 0m0.096s
$ time python2.7 test_pybc27.pyc
real 0m0.077s
user 0m0.068s
sys 0m0.012s

CPython 3.6:
$ time python3.6 test_pybc.py
real 0m0.698s (10-20% faster than 2.7)
user 0m0.640s
sys 0m0.060s
$ time python3.6 test_pybc36.pyc
real 0m0.094s (15-30% slower than 2.7)
user 0m0.082s
sys 0m0.012s

                 CPython 2.7              CPython 3.6
Opcodes          131073                   261634 (130561 more)
Bytecode size    393219 bytes             523268 bytes (130049 more)
.pyc size        1562519                  1496453
.pyc file        test_pybc27.pyc          test_pybc36.pyc
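The two bytecode sizes can be derived from the encoding rules alone. A sketch, assuming that line i of the function body compiles to LOAD_CONST i+1 and STORE_FAST i, as the disassembly slides show. The totals on the slides match the sum over the 65536 assignment lines only, without the trailing LOAD_CONST None and RETURN_VALUE, so the sketch excludes them too:

```python
def size_pre36(arg):
    """Bytes for one argument-taking instruction in CPython < 3.6."""
    # 3 bytes normally; one EXTENDED_ARG prefix adds 3 more.
    return 3 if arg <= 0xFFFF else 6

def size_36(arg):
    """Bytes for one instruction in CPython 3.6 wordcode."""
    units = 1
    while arg > 0xFF:  # each extra argument byte costs one EXTENDED_ARG unit
        arg >>= 8
        units += 1
    return 2 * units

def total(size_of, n=65536):
    # Constant index 0 is None, so assignment i loads constant i + 1.
    return sum(size_of(i + 1) + size_of(i) for i in range(n))

print(total(size_pre36), total(size_36))  # 393219 523268
```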

Slide 21

Slide 21 text

Results

$ time python3.5 test_pybc.py
real 0m0.955s (20-35% slower than 2.7)
user 0m0.892s
sys 0m0.064s
$ time python3.5 test_pybc.pyc
real 0m0.100s (20-35% slower than 2.7)
user 0m0.084s
sys 0m0.016s

But, timing only the function call:

test_pybc_time.py:
    def a():
        a0 = 0
        ...
        a65535 = 65535
    import timeit
    print(timeit.timeit(a, number=1000))

$ python2.7 test_pybc_time.py
1.11467504501
$ python3.5 test_pybc_time.py
0.8957379780185875 (20-30% faster than 2.7!!!)
$ python3.6 test_pybc_time.py
1.3842481109895743 (20-30% slower than 2.7)

Slide 22

Slide 22 text

Bug in CPython docs >= 3.6

https://docs.python.org/3.6/library/dis.html#opcode-EXTENDED_ARG

EXTENDED_ARG(ext)
Prefixes any opcode which has an argument too big to fit into the default two bytes. ext holds two additional bytes which, taken together with the subsequent opcode’s argument, comprise a four-byte argument, ext being the two most-significant bytes.

This text describes the pre-3.6 format. In 3.6 the default argument is one byte and each EXTENDED_ARG adds one more byte.

https://bugs.python.org/issue32625

Slide 23

Slide 23 text

Questions

https://t.me/spbpython
https://t.me/piterpy_meetup

Slide 24

Slide 24 text

Links

https://devguide.python.org/compiler/
https://github.com/python/cpython/
https://docs.python.org/
https://bugs.python.org/
https://code.google.com/archive/p/wpython2/
https://stupidpythonideas.blogspot.com/2016/02/title.html

Slide 25

Slide 25 text

Bonus slides

Slide 26

Slide 26 text

Marshal PyCodeObject in CPython 2.7
https://github.com/python/cpython/blob/2.7/Python/marshal.c

    PyCodeObject *co = (PyCodeObject *)v;
    w_byte(TYPE_CODE, p);
    w_long(co->co_argcount, p);
    w_long(co->co_nlocals, p);
    w_long(co->co_stacksize, p);
    w_long(co->co_flags, p);
    w_object(co->co_code, p);
    w_object(co->co_consts, p);
    w_object(co->co_names, p);
    w_object(co->co_varnames, p);
    w_object(co->co_freevars, p);
    w_object(co->co_cellvars, p);
    w_object(co->co_filename, p);
    w_object(co->co_name, p);
    w_long(co->co_firstlineno, p);
    w_object(co->co_lnotab, p);

Slide 27

Slide 27 text

Marshal PyCodeObject in CPython >= 3.6
https://github.com/python/cpython/blob/3.6/Python/marshal.c

    PyCodeObject *co = (PyCodeObject *)v;
    W_TYPE(TYPE_CODE, p);
    w_long(co->co_argcount, p);
    w_long(co->co_kwonlyargcount, p);
    w_long(co->co_nlocals, p);
    w_long(co->co_stacksize, p);
    w_long(co->co_flags, p);
    w_object(co->co_code, p);
    w_object(co->co_consts, p);
    w_object(co->co_names, p);
    w_object(co->co_varnames, p);
    w_object(co->co_freevars, p);
    w_object(co->co_cellvars, p);
    w_object(co->co_filename, p);
    w_object(co->co_name, p);
    w_long(co->co_firstlineno, p);
    w_object(co->co_lnotab, p);
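From Python, the same serialization is reachable through the `marshal` module. A minimal round-trip sketch; the demo source string is made up for illustration, and the byte layout it produces is specific to the interpreter that runs it:

```python
import marshal

code = compile(
    "def foo(v):\n    return v or 0\nresult = foo(123)\n", "<demo>", "exec"
)

# dumps() walks the code object field by field, as in the C code above.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

namespace = {}
exec(restored, namespace)
print(len(blob), namespace["result"])
```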

Slide 28

Slide 28 text

Opcode and oparg read macros

CPython < 3.6
https://github.com/python/cpython/blob/3.5/Python/ceval.c#L998

    #define NEXTOP()  (*next_instr++)
    #define NEXTARG() (next_instr += 2, (next_instr[-1]<<8) + next_instr[-2])

CPython >= 3.6
https://github.com/python/cpython/blob/3.6/Include/code.h#L10

    typedef uint16_t _Py_CODEUNIT;
    /* WORDS_BIGENDIAN branch; the little-endian branch swaps the two expressions */
    #define _Py_OPCODE(word) ((word) >> 8)
    #define _Py_OPARG(word) ((word) & 255)

https://github.com/python/cpython/blob/3.6/Python/ceval.c#L905

    #define NEXTOPARG() do { \
            _Py_CODEUNIT word = *next_instr; \
            opcode = _Py_OPCODE(word); \
            oparg = _Py_OPARG(word); \
            next_instr++; \
        } while (0)